json-schema-org / json-schema-vocabularies

Experimental vocabularies under consideration for standardization

Schema evolution #5

Open · handrews opened this issue 6 years ago

handrews commented 6 years ago

This was originally filed by @cavanaug as https://github.com/json-schema-org/json-schema-spec/issues/285, where it referenced Avro's "aliases" as a starting point. This is probably the most concise explanation from @cavanaug:

Avro schema evolution isn't perfect, but it basically allows for field addition, deletion, and renaming. So in essence it is only for simple syntactical evolution. But in many cases, having that level of flexibility is pretty useful.

Avro use cases are often outside of a client/server request model, in that they are more apt to exist in data processing flows (à la pub/sub systems like Kafka/Kinesis), where producers and subscribers may evolve at different paces and outside of a content-negotiation style model.

Advanced evolution and semantic evolution, in my mind, fall outside of a declarative syntax model, which is sort of why I said Avro isn't perfect. In data processing systems it gives you a few extra capabilities, but when those are exhausted you are back to writing some type of "normalization" code as part of a data flow.

I've often longed for some of the Avro capabilities in JSON to handle those situations where simple syntactical evolution is all I would need. It provides a declarative model I can use without resorting to custom normalization code.
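To make the three cases (addition, deletion, renaming) concrete, here is a minimal Python sketch of Avro-style record resolution for flat records, loosely following the rules in the Avro specification: a reader field matches a written field by its name or by one of its `aliases`, an unmatched reader field takes its `default`, and written fields the reader does not declare are ignored. The function `resolve_record` and the list-of-field-dicts shape are assumptions for illustration; real Avro libraries also resolve types, nested records, and type promotion.

```python
from typing import Any

# Sentinel so that a default of None (Avro null) is distinguishable from "no default".
_MISSING = object()


def resolve_record(reader_fields: list[dict[str, Any]], datum: dict[str, Any]) -> dict[str, Any]:
    """Resolve a flat record written with an older schema against a reader schema.

    Sketch of Avro-style rules (types and nesting are ignored here):
    - a reader field matches a written field by its name or an alias (rename);
    - an unmatched reader field falls back to its "default" (addition);
    - written fields the reader does not declare are dropped (deletion).
    """
    result: dict[str, Any] = {}
    for field in reader_fields:
        candidates = [field["name"], *field.get("aliases", [])]
        for candidate in candidates:
            if candidate in datum:
                result[field["name"]] = datum[candidate]
                break
        else:
            default = field.get("default", _MISSING)
            if default is _MISSING:
                raise ValueError(f"no value or default for field {field['name']!r}")
            result[field["name"]] = default
    return result


# Reader schema (v2): "name" was renamed to "full_name", "country" was added,
# and "fax" (present in v1 data) is no longer declared.
reader_v2 = [
    {"name": "full_name", "type": "string", "aliases": ["name"]},
    {"name": "country", "type": "string", "default": "unknown"},
]

old_event = {"name": "Ada", "fax": "555-0100"}
print(resolve_record(reader_v2, old_event))
# {'full_name': 'Ada', 'country': 'unknown'}
```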

@cavanaug also said:

I know a lot of people here are focused more on the JSON web angle, but don't discount the heavy usage of JSON in data environments and the potential growth there for JSON Schema usage.

This may fit as part of an "API Documentation" vocabulary, although whether that means web APIs in the specific sense or just "systems where you interact with stuff that changes over time" is an open question. I take a pretty broad view of "API", so any sort of data processing system could fit (maybe the vocabulary needs a different name; we're still sorting out how and where this sort of thing should live and relate to existing vocabularies).

handrews commented 6 years ago

Here's my last significant comment on that issue:

For transforming JSON representations from one version to another, I would look at JSON Patch. It can express such concepts as "rename field X to Y", as well as adding and removing fields, and even limited conditionals by testing field values.

I guess the question then would be whether and how to integrate such a usage of JSON Patch into JSON Schema.

There's really no notion of versioning built into JSON Schema / Hyper-Schema. Each schema exists on its own. A system designer can indicate that a set of schemas form a sequence of versions, but that's an entirely external concept.

lennygran commented 6 years ago

I would like to comment on this issue and the closed issue #285, and to check on the status of this issue.

I feel the conversation in #285 went in an unexpected direction, toward XSLT and transformation, which is unlikely to be the purpose of a field alias. The purposes are more:

  1. To provide a reliable way for a producer to evolve its domain schema without tying the consumer to its language. The producer leaves the consumer the option of having its own schema that is still "compatible" but not exactly equal.
  2. To let the producer evolve its own schema with new language without forcing the consumer domain to change its code, either immediately or ever.

This is especially important when using messaging patterns that integrate multiple domains. There is also the scenario of replaying old events: the producer might no longer support an event (schema), but a consumer may want to replay it after its domain has evolved. I would describe the functionality as "Ingest data AS", rather than just delivering evolutionary changes as JSON Patch does.
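The idea of a consumer keeping its own schema that is compatible with, but not identical to, the producer's can be sketched as a compatibility check between two JSON Schemas. `can_consume` below is a hypothetical, conservative helper for flat object schemas that only inspects `required`, `properties`, each property's `type`, and `additionalProperties: false`; it is not a general schema-subsumption algorithm. The example shows the gap being discussed: a producer adding an optional field passes, but a consumer still requiring a renamed field does not.

```python
from typing import Any

Schema = dict[str, Any]


def can_consume(producer: Schema, consumer: Schema) -> list[str]:
    """Return reasons why data valid under `producer` might fail `consumer`.

    Conservative check for flat object schemas that looks only at "required",
    "properties", each property's "type", and "additionalProperties": false.
    An empty list means no incompatibility was found under those assumptions.
    """
    problems: list[str] = []
    produced_required = set(producer.get("required", []))
    producer_props = producer.get("properties", {})
    consumer_props = consumer.get("properties", {})
    for name in consumer.get("required", []):
        if name not in produced_required:
            problems.append(f"consumer requires {name!r}, producer does not guarantee it")
    for name, sub in consumer_props.items():
        # Compare declared types only where both schemas describe the property.
        if name in producer_props and "type" in sub and producer_props[name].get("type") != sub["type"]:
            problems.append(f"type mismatch for {name!r}")
    if consumer.get("additionalProperties") is False:
        for name in producer_props:
            if name not in consumer_props:
                problems.append(f"consumer forbids extra property {name!r}")
    return problems


# Producer v2 renamed "name" to "full_name" and added an optional "country".
producer_v2 = {
    "type": "object",
    "required": ["full_name"],
    "properties": {"full_name": {"type": "string"}, "country": {"type": "string"}},
}
consumer_ok = {"type": "object", "required": ["full_name"], "properties": {"full_name": {"type": "string"}}}
consumer_old = {"type": "object", "required": ["name"], "properties": {"name": {"type": "string"}}}

print(can_consume(producer_v2, consumer_ok))   # []
print(can_consume(producer_v2, consumer_old))  # ["consumer requires 'name', producer does not guarantee it"]
```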

Currently, JSON Schema can handle field deletion and addition, as aptly pointed out in #285, but unfortunately it does not provide a clean way to rename a field. Adding an alias would solve this problem very cleanly: a payload deserializer (e.g. for JSON) could use the alias information to perform the proper mapping.
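Here is a sketch of how an alias-aware deserializer could ingest a payload "AS" a consumer's schema. The `aliases` keyword is hypothetical (it is not part of any published JSON Schema draft), and `ingest_as` is an illustrative helper: it parses a payload, renames aliased members to their canonical property names, rejects payloads that carry both a name and one of its aliases, and checks `required`. Type and other constraint checks would still be left to a regular JSON Schema validator run on the result.

```python
import json
from typing import Any

# "aliases" is a hypothetical keyword used for illustration only; it is not
# part of any published JSON Schema draft.
CONSUMER_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string", "aliases": ["name", "userName"]},
        "country": {"type": "string"},
    },
    "required": ["full_name"],
}


def ingest_as(payload: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse a JSON payload and rename aliased members to their canonical names.

    Members that match neither a property nor an alias are passed through
    unchanged, so a validator run afterwards can still apply
    "additionalProperties" as usual.
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    properties = schema.get("properties", {})
    alias_to_name: dict[str, str] = {}
    for name, sub in properties.items():
        for alias in sub.get("aliases", []):
            alias_to_name[alias] = name
    result: dict[str, Any] = {}
    for key, value in data.items():
        # A canonical property name wins over an alias with the same spelling.
        target = key if key in properties else alias_to_name.get(key, key)
        if target in result:
            raise ValueError(f"both {target!r} and an alias of it are present")
        result[target] = value
    missing = [name for name in schema.get("required", []) if name not in result]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    return result


# An old event that still uses the pre-rename member "name".
print(ingest_as('{"name": "Ada", "country": "UK"}', CONSUMER_SCHEMA))
# {'full_name': 'Ada', 'country': 'UK'}
```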

@handrews, can you please re-evaluate this option for possible inclusion? It would help many people solve schema evolution problems the way Avro has been solving them for years.

handrews commented 4 years ago

Re-opening this, as it was not being tracked anywhere else; the issue in the other repo was closed.

ottomata commented 3 years ago

FWIW, Wikimedia is doing this manually with some tooling: https://github.com/wikimedia/jsonschema-tools