snowplow / iglu

Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow
http://www.snowplow.io
Apache License 2.0

Figure out a way of supporting tests in Iglu #1

Open alexanderdean opened 10 years ago

alexanderdean commented 10 years ago

Fred wrote some nice tests for the core Snowplow schemas when these were a part of snowplow/snowplow. These can be seen here:

https://github.com/snowplow/snowplow/tree/40a5037563e729c67a922a3e2e67c4e5bb917809/0-common/schemas/jsonschema/tests

Fundamentally, tests divide into:

  1. good tests - JSON instances which the schema should validate
  2. bad tests - JSON instances which the schema should reject

So we need to think about how to store tests inside of Iglu. Starter for 10 - what about:

com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good
com.snowplowanalytics.self-desc/instance/jsonschema/1-0-0/tests/bad

The idea is that tests that are good for 1-0-0 must also (by definition) be good for 1-0-1, 1-0-2 etc. Tests which are bad for 1-0-0 could be good for 1-0-1, so there's nothing we can reason about there.
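
A minimal sketch of that invariant (not Iglu code; the schemas and instances below are made up for illustration), using the Python jsonschema package: a good test for 1-0-0 must stay valid for every later ADDITION, while a bad test for 1-0-0 carries no such guarantee.

```python
import jsonschema

# Hypothetical 1-0 MODEL-REVISION: 1-0-1 is an ADDITION that only relaxes 1-0-0.
schema_1_0_0 = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,
}
schema_1_0_1 = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "nickname": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,
}

good_for_1_0_0 = {"name": "Fred"}  # would live under 1-0/tests/good
for schema in (schema_1_0_0, schema_1_0_1):
    # by definition of an ADDITION, this holds for every later version too
    assert jsonschema.Draft4Validator(schema).is_valid(good_for_1_0_0)

bad_for_1_0_0 = {"name": "Fred", "nickname": "fb"}  # would live under 1-0-0/tests/bad
assert not jsonschema.Draft4Validator(schema_1_0_0).is_valid(bad_for_1_0_0)
assert jsonschema.Draft4Validator(schema_1_0_1).is_valid(bad_for_1_0_0)
# bad for 1-0-0 but good for 1-0-1: nothing to carry forward automatically
```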

@fblundun thoughts?

alexanderdean commented 10 years ago

I guess it would be like this:

tests/good/1
tests/good/2
tests/bad/1
tests/bad/2
etc.

and tests/good would return all the good tests.

ALSO: once we have tests/good/1-0, we can check, when a new schema is uploaded to jsonschema/1-0-1, that all good/1-0 tests pass against it.
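
A sketch of that pre-registration check (the on-disk layout and function are assumptions, not the Iglu API), again using the Python jsonschema package:

```python
import json
import pathlib

import jsonschema

def good_tests_still_pass(candidate_schema: dict, tests_good_dir: str) -> bool:
    """Return True only if every stored good test validates against the candidate schema."""
    validator = jsonschema.Draft4Validator(candidate_schema)
    for test_file in sorted(pathlib.Path(tests_good_dir).glob("*")):
        instance = json.loads(test_file.read_text())
        if not validator.is_valid(instance):
            return False  # the proposed ADDITION broke an existing good test
    return True

# Hypothetical usage: refuse the 1-0-1 upload if any good/1-0 test now fails.
# good_tests_still_pass(new_schema_1_0_1,
#                       "com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good")
```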

alexanderdean commented 10 years ago

I think we should use valid and invalid rather than good and bad, as per Fred's existing tests

alexanderdean commented 10 years ago

Probably use UUIDs instead of /1, /2 etc. That also allows us to remove existing tests if we want to.

http://stackoverflow.com/questions/7114694/should-i-use-uuids-for-resources-in-my-public-api
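
A sketch of what UUID-keyed test paths could look like (the path layout and helper are hypothetical, not part of Iglu):

```python
import uuid

def new_test_path(vendor: str, name: str, model_revision: str, kind: str) -> str:
    """Mint an opaque identifier so individual tests can later be removed
    without renumbering the rest."""
    return f"{vendor}/{name}/jsonschema/{model_revision}/tests/{kind}/{uuid.uuid4()}"

print(new_test_path("com.snowplowanalytics.self-desc", "instance", "1-0", "valid"))
# e.g. com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/valid/f47ac10b-58cc-4372-a567-0e02b2c3d479
```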

fblundun commented 10 years ago

This all sounds good.

I wonder if it would be worth having a system where whenever a schema's version is bumped (e.g. from 1-0-0 to 1-0-1) we have a place for tests designed specifically to be validated by the new version but not by the old, to highlight the difference between the two.

The structure could be something like this:

com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good/3 would contain JSONs which should be validated by schema 1-0-x where x >= 3

com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/3 would contain JSONs which should be rejected by schema 1-0-x where x <= 3

Then if we want to test schema 1-0-3, we check that it:

  1. validates everything in tests/good/y for y <= 3
  2. rejects everything in tests/bad/y for y >= 3

In fact we could alternatively do away with the "good" and "bad" distinction and just have com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/z containing all examples which should be validated by 1-0-z but not by 1-0-(z-1).

Then to test schema version 1-0-3, we would check that it:

  1. validates everything in tests/z for z <= 3
  2. rejects everything in tests/z for z > 3

The disadvantage of this is that it might involve moving some test JSONs to a new directory when a new version is published, and that it's pretty complicated...
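
A sketch of the single-bucket alternative just described (directory layout assumed, not Iglu code): tests/z holds instances first accepted at ADDITION z, so a candidate schema 1-0-x should accept everything in tests/z for z <= x and reject the rest.

```python
import json
import pathlib

import jsonschema

def check_addition(candidate_schema: dict, tests_root: str, x: int) -> bool:
    """tests_root contains one directory per ADDITION z; an instance in tests/z
    should be accepted by 1-0-x exactly when z <= x."""
    validator = jsonschema.Draft4Validator(candidate_schema)
    for bucket in pathlib.Path(tests_root).iterdir():  # .../tests/0, .../tests/1, ...
        expected_valid = int(bucket.name) <= x
        for test_file in bucket.glob("*"):
            instance = json.loads(test_file.read_text())
            if validator.is_valid(instance) != expected_valid:
                return False
    return True
```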

alexanderdean commented 10 years ago

Hey Fred, lots of great thoughts there. I think what we're saying is that fundamentally, for a given MODEL-REVISION, tests are either:

  1. valid-from-ADDITION
  2. invalid-until-ADDITION
  3. invalid-forever

Is that a helpful taxonomy?
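
A sketch of the taxonomy as data (the structure and boundary conventions are assumptions; the inclusive bound for invalid-until follows the earlier bad/3 proposal):

```python
from enum import Enum
from typing import Optional

class TestKind(Enum):
    VALID_FROM_ADDITION = "valid-from-ADDITION"
    INVALID_UNTIL_ADDITION = "invalid-until-ADDITION"
    INVALID_FOREVER = "invalid-forever"

def expected_outcome(kind: TestKind, addition: int, x: int) -> Optional[bool]:
    """Should schema MODEL-REVISION-x accept a test attached to ADDITION `addition`?
    True = must accept, False = must reject, None = nothing we can assert."""
    if kind is TestKind.VALID_FROM_ADDITION:
        return True if x >= addition else None   # earlier ADDITIONs may still reject it
    if kind is TestKind.INVALID_UNTIL_ADDITION:
        return False if x <= addition else None  # later ADDITIONs may come to accept it
    return False                                 # invalid-forever is rejected by every ADDITION
```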

fblundun commented 10 years ago

I think that's a helpful way to think about it. In terms of file structure, we could group test JSONs into 2 categories:

  1. invalid-forever, located in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/
  2. valid-from-ADDITION, located in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/ADDITION

Then to test schema 1-0-x, we make sure it validates everything in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/y for y <= x, and that it invalidates every other test JSON in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/.
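
A sketch of that two-bucket check (layout assumed, not Iglu code): everything under tests/bad/ must always be rejected, and anything under tests/<y> must be accepted by 1-0-x when y <= x and rejected otherwise.

```python
import json
import pathlib

import jsonschema

def check_schema(candidate_schema: dict, tests_root: str, x: int) -> bool:
    """tests/bad/* must always be rejected; tests/<y>/* must be accepted by
    1-0-x when y <= x and rejected otherwise."""
    validator = jsonschema.Draft4Validator(candidate_schema)
    for bucket in pathlib.Path(tests_root).iterdir():
        expected_valid = bucket.name != "bad" and int(bucket.name) <= x
        for test_file in bucket.glob("*"):
            instance = json.loads(test_file.read_text())
            if validator.is_valid(instance) != expected_valid:
                return False
    return True
```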

alexanderdean commented 10 years ago

Interesting! Possible simplification: we add all tests simply as:

com.snowplowanalytics.self-desc/instance/jsonschema/tests/f47ac10b-58cc-4372-a567-0e02b2c3d479

etc.

Then when you submit a new JSON Schema, all existing tests are run against it, and the response from the new JSON Schema registration contains a listing of all test statuses.

Going further, when you request an individual test, its metadata should list which schemas it succeeds against.

Going even further, there should be the opportunity to run a new potential schema against all tests without actually committing it.
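
A sketch of that flow (the function names, the in-memory repository and the test record shape are all assumptions, not the Iglu API): run every stored test against a candidate schema, report per-test statuses, and support a dry run that skips the commit.

```python
import jsonschema

def run_tests(candidate_schema: dict, tests: dict) -> dict:
    """`tests` maps a test UUID to {"instance": ..., "expected_valid": bool}."""
    validator = jsonschema.Draft4Validator(candidate_schema)
    return {
        test_id: "PASS" if validator.is_valid(t["instance"]) == t["expected_valid"] else "FAIL"
        for test_id, t in tests.items()
    }

def register_schema(candidate_schema: dict, tests: dict,
                    repository: dict, key: str, dry_run: bool = False) -> dict:
    """Run every stored test against the candidate; commit it unless this is a dry run."""
    statuses = run_tests(candidate_schema, tests)
    if not dry_run:
        repository[key] = candidate_schema
    return {"committed": not dry_run, "test_statuses": statuses}
```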

alexanderdean commented 10 years ago

Assigning to @BenFradet