Closed auphofBSF closed 1 year ago
What you say is completely right. Eventually, a change in the GSMA-commons (or location commons) would alter the referred schemas. But we have very good reasons for not changing it, or for making sure that a change does not affect current models (i.e. adding a new attribute while keeping all the old ones). In fact, our policy is that we never release backward-incompatible versions. Could this happen in the future? Nothing is impossible, but I cannot envision a reason to do it, and of course it would never happen without deep consultation with the users and contributors. Anyhow, we are open to adopting an additional mechanism to fix this.
Besides, we now have a draft database of data model versions, so you can determine the moment when an extract was created. Grabbing the information from model.yaml (which has the $ref attributes) should solve this.
Regarding these two options:
1) https://smart-data-models.github.io/data-models/common-schema.json#/definitions/GSMA-Commons
2) https://raw.githubusercontent.com/smart-data-models/data-models/c4ee5d39bcbacdc30700bcd2d916aaf2c50dc86e/common-schema.json#/definitions/GSMA-Commons
1) In the model.yaml of the data model these references are brought in, so it is like your solution 2: quite deterministic. Is this a solution for you?
Finally, you could point not to the JSON schema (non-deterministic) but to the YAML version, which is deterministic for every version.
I believe there is an issue with being deterministic about subschemas in a FIWARE SmartDataModel (SDM) model. An example is Device, where the subschema
"$ref": "https://smart-data-models.github.io/data-models/common-schema.json#/definitions/GSMA-Commons"
can change without a user being aware.
The issue I will demonstrate and discuss is the effect on resultant uses of a schema.json, where the present version and configuration management of references to subschemas such as GSMA-Commons is not deterministic. This is exemplified by an experience I just had, which I detail further below.
I have an automated process whereby I generate Python pydantic model objects for any FIWARE SDM. This pydantic model is then used to interact with or extend a legacy data structure. Through this mechanism I would like to offer our existing data to 3rd parties or new processes, or consume data (with appropriate access), against a FIWARE SDM standard that is deterministic.
In the process of development and testing I regenerated one of these SDM models as a pydantic object and ended up with a significantly different model object. By contrast, over the prior days the pydantic model had been generated repeatedly and deterministically. The model's schema.json did not change, so I had to figure out what had changed!
I believe it is changes in the subschemas that the $refs point to. These subschema links are snapshots and not deterministic. I cannot identify what changed or when; I can only see that some backend activity must have happened. Unfortunately, as with the subschema commons-GSMA, even looking at the source repo and at all the history
git log -p -- commons-gsma.json
I could not find a commit for that subschema that would have generated the pydantic model that I had committed 3 days prior.
I assumed and believed that the SDM schema.json of a particular commit would deterministically regenerate the pydantic model as long as the generator and schema were the same. I now believe any user of any SDM schema is vulnerable to subschema changes introducing potentially breaking changes.
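To see which external subschemas a given schema.json actually depends on, one approach is to walk the parsed schema and collect every $ref that is not a local pointer. The following is only a minimal sketch: the function names are hypothetical, and the sample schema is an illustrative fragment, not the real Device schema.

```python
from typing import Any, Iterator, List


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every "$ref" string found anywhere in a parsed JSON schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def external_refs(schema: dict) -> List[str]:
    """Return sorted, de-duplicated refs that point outside the document.

    A ref starting with "#" is a pointer into the same document and is
    therefore already covered by the root schema's own commit.
    """
    return sorted({ref for ref in iter_refs(schema) if not ref.startswith("#")})


# Illustrative fragment only (assumption), not the real Device schema.
sample_schema = {
    "allOf": [
        {"$ref": "https://smart-data-models.github.io/data-models/common-schema.json#/definitions/GSMA-Commons"},
        {"$ref": "#/definitions/LocalThing"},
        {"properties": {"id": {"type": "string"}}},
    ]
}

refs = external_refs(sample_schema)
```

Every URL in the resulting list is a dependency whose content is not fixed by the commit of the root schema.json.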
Configuration
Pydantic Model that has shown the problem is using the following Subject and DataModel
The pydantic model generator has had no changes (same GitHub commit)
I am accessing a schema.json for Device from a local instance of the repo https://github.com/smart-data-models/dataModel.Device.git. I pulled this a few days ago, up to the then-latest commit, and have done no updates since. My git log is still at 30th Aug.
Analysis
What changed, and how do we control for change?
The model generator is under git version control, and I confirmed it has no changes. The model generator builds out from schema.json and obviously retrieves definitions from referenced subschemas. In the case of Device, this root schema.json is uniquely versioned from the source repo. Hence my pydantic model object can be versioned to this schema commit: https://github.com/smart-data-models/dataModel.Device/blob/f8c87c97cb8e1add4687a70b1f65bdd5409d706b/Device/schema.json#L3. There is in most cases also a version attribute in the schema, i.e.
"$schemaVersion": "0.0.7",
Example of this
root schema.json
The root schema for Device can be deterministically versioned. An attribute can be assigned to my pydantic model, being
"SDMVersion": "0.0.7.f8c87c97cb8e1add4687a70b1f65bdd5409d706b",
in the form <$schemaVersion>.<GitCommit>, and
"SDMrootSchema": "https://raw.githubusercontent.com/smart-data-models/dataModel.Device/f8c87c97cb8e1add4687a70b1f65bdd5409d706b/Device/schema.json",
in the form <$immutableSourceURL>, linking my pydantic model to a deterministic root schema.
The issue in the case of Device is that the referenced subschemas are not deterministic: there is no versioning in the $ref: subschema URLs. Any change to a referenced schema could introduce a breaking change to a pydantic model derived from the root schema.json.
The issue then becomes where this versioning should be managed. Any change in the nth subschema will necessitate a new, dependency-driven version increment in all associated parent schemas above this nth-tier subschema change.
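For the root schema, the two provenance attributes described in this analysis can be derived mechanically from the schema and the git commit it was read at. This is a sketch under assumptions: the function name and its parameters are hypothetical, and it assumes the schema carries a "$schemaVersion" attribute, which is the case for most but not all schemas.

```python
def provenance(schema: dict, repo: str, commit: str, path: str) -> dict:
    """Build SDMVersion and SDMrootSchema for a root schema.

    repo   -- "<owner>/<name>" of the GitHub repository (assumed layout)
    commit -- full git commit hash the schema was read at
    path   -- path of schema.json inside the repository
    """
    version = schema["$schemaVersion"]  # raises KeyError if the schema has none
    return {
        # <$schemaVersion>.<GitCommit>
        "SDMVersion": f"{version}.{commit}",
        # immutable source URL pinned to the commit
        "SDMrootSchema": f"https://raw.githubusercontent.com/{repo}/{commit}/{path}",
    }


meta = provenance(
    {"$schemaVersion": "0.0.7"},
    repo="smart-data-models/dataModel.Device",
    commit="f8c87c97cb8e1add4687a70b1f65bdd5409d706b",
    path="Device/schema.json",
)
```

This pins only the root schema; the referenced subschemas remain unpinned, which is the gap the options below try to close.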
Options
Option 1: Centrally control dependency management for referenced subschemas. This, I believe, would need to be done in the FIWARE SDM Subject repo, i.e. the $ref changes from
https://smart-data-models.github.io/data-models/common-schema.json#/definitions/GSMA-Commons
to
https://raw.githubusercontent.com/smart-data-models/data-models/c4ee5d39bcbacdc30700bcd2d916aaf2c50dc86e/common-schema.json#/definitions/GSMA-Commons
This would be my preferred option. All parent schemas of a changed subschema need to have the $ref updated, and the parent schemas committed with the new subschema changes.
- Change control and version dependency management are easily visible to all consumers
- Migrations can be planned
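If the $ref rewrite were script managed, it might look roughly like the sketch below. It assumes that the github.io site mirrors the root of the smart-data-models/data-models repository, which is what the two example URLs above suggest; the constant and function names are hypothetical.

```python
from typing import Any

# Assumption: the pages site is served from the root of the data-models repo.
PAGES_PREFIX = "https://smart-data-models.github.io/data-models/"
RAW_PREFIX = "https://raw.githubusercontent.com/smart-data-models/data-models/"


def pin_ref(ref: str, commit: str) -> str:
    """Rewrite a floating github.io $ref to a commit-pinned raw URL."""
    if not ref.startswith(PAGES_PREFIX):
        return ref  # relative refs and other hosts are left untouched
    return f"{RAW_PREFIX}{commit}/{ref[len(PAGES_PREFIX):]}"


def pin_schema(node: Any, commit: str) -> Any:
    """Return a copy of a parsed schema with every matching $ref pinned."""
    if isinstance(node, dict):
        return {
            key: pin_ref(value, commit)
            if key == "$ref" and isinstance(value, str)
            else pin_schema(value, commit)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [pin_schema(item, commit) for item in node]
    return node


pinned = pin_schema(
    {"allOf": [{"$ref": PAGES_PREFIX + "common-schema.json#/definitions/GSMA-Commons"}]},
    "c4ee5d39bcbacdc30700bcd2d916aaf2c50dc86e",
)
```

The sketch pins one level only. Refs inside the pinned subschema that are themselves full URLs would need the same treatment, whereas relative refs resolve against the pinned document automatically.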
Viability depends on whether this is human or script managed. The depth of the tree of referenced subschemas has some bearing, but it is not impossible for an algorithm to process. Without a deep look into all models, I manually viewed the SDM dataModel.Device/Device and particularly definitions/GSMA-Commons. There do not appear to be any schema references external to this $ref. However, some of the $refs are relative and some are full URLs. Ideally they should all be relative; see
From https://raw.githubusercontent.com/smart-data-models/data-models/c4ee5d39bcbacdc30700bcd2d916aaf2c50dc86e/common-schema.json
Option 2: In the absence of Option 1, dependency management is done by the user of a schema. A map is generated for all referenced schemas, including subschemas. These are pulled and versioned. The user then changes all references to the appropriate fixed dependencies. The user maintains version management, and the resultant schema may not be shareable. Potential issues: no central version registry, and incompatibility between objects in 2 different systems that don't have access to common repositories.
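A user-side map of this kind could be as simple as a lock file recording a content hash per referenced document, so that a silent subschema change is at least detected. The sketch below is one possible shape, not an existing SDM tool: the function names are hypothetical, and the fetcher is injected so the example runs against an in-memory stand-in on a placeholder host instead of the network.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def build_lock(refs: Iterable[str], fetch: Callable[[str], bytes]) -> Dict[str, str]:
    """Map each referenced document URL to the SHA-256 of its content."""
    lock: Dict[str, str] = {}
    for ref in refs:
        document_url = ref.split("#", 1)[0]  # drop the JSON pointer fragment
        if document_url and document_url not in lock:
            lock[document_url] = hashlib.sha256(fetch(document_url)).hexdigest()
    return lock


def changed_urls(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List documents that were added, removed, or whose content changed."""
    return sorted(url for url in set(old) | set(new) if old.get(url) != new.get(url))


# In-memory stand-in for the remote repository (assumption, placeholder host).
url = "https://example.invalid/common-schema.json"
store = {url: b'{"definitions": {}}'}

lock_before = build_lock([url + "#/definitions/GSMA-Commons"], store.__getitem__)
store[url] = b'{"definitions": {"added": {}}}'  # simulate an upstream change
lock_after = build_lock([url + "#/definitions/GSMA-Commons"], store.__getitem__)

drift = changed_urls(lock_before, lock_after)
```

Committing the lock next to the generated pydantic model would tie each model to the exact subschema content it was built from, though it does not address the sharing problems noted above.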