snowplow / iglu

Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow
http://www.snowplow.io
Apache License 2.0

Design: consider version fallback #473

Open chuwy opened 5 years ago

chuwy commented 5 years ago

Right now, if a client requests 1-0-1 but it does not exist, the entity just fails with "schema not found". @dilyand is wondering whether it could instead fall back to 1-0-0.

A counterargument: a client defines a 1-0-0 schema with some keys, then sends an event declared as 1-0-1 that carries an unknown bar key whose value has length 30. It falls back to 1-0-0 and passes. Then the client uploads 1-0-1 with bar maxLength 10. We have a falsely valid entity.

A counterargument to the counterargument: if 1-0-0 validates this unknown bar, it must have additionalProperties: true. In that case, the next schema that defines bar should not be an ADDITION, since constraining bar can invalidate data that was valid against 1-0-0.
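To make the two arguments above concrete, here is a minimal sketch in Python. The validate function is a hypothetical hand-rolled stand-in that understands only properties, maxLength and additionalProperties; it is not a JSON Schema library and not how Iglu validates. The foo key and the schema bodies are also assumptions made up for illustration.

```python
def validate(instance, schema):
    """Check an object against a tiny, hypothetical subset of JSON Schema:
    only properties / maxLength / additionalProperties are understood."""
    props = schema.get("properties", {})
    allow_extra = schema.get("additionalProperties", True)
    for key, value in instance.items():
        if key not in props:
            # Unknown key: accepted only if the schema allows extra properties.
            if not allow_extra:
                return False
            continue
        max_len = props[key].get("maxLength")
        if max_len is not None and isinstance(value, str) and len(value) > max_len:
            return False
    return True


# Assumed schema bodies; only bar and maxLength 10 come from the discussion.
schema_1_0_0 = {"properties": {"foo": {}}, "additionalProperties": True}
strict_1_0_0 = {"properties": {"foo": {}}, "additionalProperties": False}
schema_1_0_1 = {
    "properties": {"foo": {}, "bar": {"maxLength": 10}},
    "additionalProperties": True,
}

# Event declared as 1-0-1, sent before 1-0-1 is uploaded.
event = {"foo": "x", "bar": "b" * 30}

accepted_by_fallback = validate(event, schema_1_0_0)  # True: bar is unconstrained
accepted_by_declared = validate(event, schema_1_0_1)  # False: bar exceeds maxLength 10
accepted_by_strict = validate(event, strict_1_0_0)    # False: bar is not allowed at all
```

With the open 1-0-0 the fallback accepts an event that the real 1-0-1 would reject, which is the falsely valid entity. With the strict 1-0-0 the fallback rejects it, so the problem only arises when 1-0-0 allows additional properties.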

alexanderdean commented 5 years ago

What's the use case for falling back to the earlier schema? If 1-0-8 is missing and we look for 1-0-7 and that is not there, do we keep looking for even earlier versions? When do we stop?
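One possible answer to "when do we stop", sketched in Python purely for discussion: walk ADDITION downwards within the same MODEL and REVISION, optionally bounded by a step limit. The names resolve_with_fallback and max_steps, and the dict used as a registry, are hypothetical and do not describe the Iglu client.

```python
def parse(version):
    """Split a SchemaVer string such as '1-0-8' into three integers."""
    model, revision, addition = (int(part) for part in version.split("-"))
    return model, revision, addition


def resolve_with_fallback(registry, version, max_steps=None):
    """Return the highest available version at or below the requested ADDITION,
    staying within the same MODEL-REVISION, or None if nothing is found.

    max_steps bounds how many earlier ADDITIONs are tried; None means
    "keep going down to ADDITION 0".
    """
    model, revision, addition = parse(version)
    lowest = 0 if max_steps is None else max(0, addition - max_steps)
    for candidate in range(addition, lowest - 1, -1):
        key = f"{model}-{revision}-{candidate}"
        if key in registry:
            return key
    return None


# Hypothetical registry contents: only 1-0-0 has been uploaded.
registry = {"1-0-0": {}}

unbounded = resolve_with_fallback(registry, "1-0-8")             # "1-0-0"
bounded = resolve_with_fallback(registry, "1-0-8", max_steps=1)  # None: tries 1-0-8 and 1-0-7 only
other_model = resolve_with_fallback(registry, "2-0-1")           # None: never crosses MODEL
```

The unbounded variant can cost one lookup per missing ADDITION, which is part of the concern raised in the question.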

dilyand commented 5 years ago

Here is a brief description of the use case from @architgoyal1:

In the following case it seems a reboot of Stream Enrich is needed:

  1. upload schema 1-0-0 to Iglu
  2. send events to validate against 1-0-0
  3. send events to validate against 1-0-1
  4. upload schema 1-0-1 to Iglu
  5. continue sending events to be validated against 1-0-1

(1-0-1 could be 1-1-0 or 2-0-0)

Without a reboot of Stream Enrich, the events in steps 3 and 5 would all fail validation rather than being validated against schema 1-0-0.

(See this issue https://github.com/snowplow/iglu/issues/473)

alexanderdean commented 5 years ago

Isn't that problem just solved by the default expiry of the caching of missing schemas?
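A rough simulation of the five steps with an expiring cache of missing schemas, as a Python sketch. CachingResolver, missing_ttl and the injected clock are hypothetical names chosen for illustration, and the 60-second expiry is an arbitrary assumption, not the actual default.

```python
class CachingResolver:
    """Hypothetical resolver that caches both found and missing schemas.
    Missing entries expire after missing_ttl seconds and are then retried."""

    def __init__(self, registry, missing_ttl, clock):
        self.registry = registry        # dict standing in for the remote registry
        self.missing_ttl = missing_ttl  # seconds a "not found" result is trusted
        self.clock = clock              # callable returning the current time
        self.found = {}
        self.missing = {}               # key -> time the miss was recorded

    def lookup(self, key):
        if key in self.found:
            return self.found[key]
        now = self.clock()
        recorded = self.missing.get(key)
        if recorded is not None and now - recorded < self.missing_ttl:
            return None                 # cached miss is still fresh
        schema = self.registry.get(key)
        if schema is None:
            self.missing[key] = now
            return None
        self.missing.pop(key, None)
        self.found[key] = schema
        return schema


now = [0]                                    # fake clock, in seconds
registry = {"1-0-0": {"title": "v1-0-0"}}    # step 1
resolver = CachingResolver(registry, missing_ttl=60, clock=lambda: now[0])

step2 = resolver.lookup("1-0-0")             # found
step3 = resolver.lookup("1-0-1")             # None, and the miss is cached at t=0
registry["1-0-1"] = {"title": "v1-0-1"}      # step 4
now[0] = 30
step5_before_expiry = resolver.lookup("1-0-1")  # still None: cached miss is 30s old
now[0] = 61
step5_after_expiry = resolver.lookup("1-0-1")   # found: miss expired, registry re-queried
```

Under these assumptions, events in step 5 start passing once the cached miss expires, without a restart. Events sent in step 3, and in step 5 before expiry, still fail.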

dilyand commented 5 years ago

The caching issue -- yes; but people sending events before deploying schemas is rather common.

On the other hand, with Event Recovery 0.1.0 now out, those failures should be easier to recover from...