open-contracting / standard

Documentation of the Open Contracting Data Standard (OCDS)
http://standard.open-contracting.org/
Other
136 stars 46 forks source link

Remove all ocdsMerge stratagies #287

Closed kindly closed 7 years ago

kindly commented 8 years ago

I think there is a case for removing all OCDS merge stratagies and just documenting how the merging is expected to happen. I think it over complicates the schema and is more confusing to implementers.

Merging can be broken down into the following.

     [{"releaseID": <id>, "releaseTag": [<tag>,...], "releaseDate": <date>, "value": <value at release>}, ...] 
timgdavies commented 8 years ago

Related to #284

kindly commented 7 years ago

I suggest we replace all the merge strategies and just have the following two extra properties as booleans.

wholeListMerge

This will be used for:

omitWhenMerged

This will be used for:

In version 1.1 as organizations have a top level id, they should use normal merge rules.

mpostelnicu commented 7 years ago

are we still using omitWhenMerged and wholeListMerge described above ? I am having a sense that removing both ocdsVersion and arrayMergeById and treating all arrays in the same way may not work for some corner cases, also ocdsOmit for the release.date.

We have a full merging routine implemented for 1.0 that traverses the release trees and performs the merge based on metadata of each field (the type of merge strategy specified for each of the fields). This assumes zero knowledge of the documentation of how a merge needs to be performed as a whole but only knowledge about how each atomic merge operation is handled, thus quite easy to extend to more fields. As the standard can be extensible to allow customized implementation-specific fields, this is a plus because we dont have to change the merging routine, just make sure the new fields have the type of merging specified in their definition.

If we are planning to abandon this and have no field level metainformation about merging strategies, do we have in mind developing a separate detailed documentation page about how to handle the merge, what fields to omit what to include and what are the exceptions to the general merging rules?

kindly commented 7 years ago

@mpostelnicu The plan is to still do this.

The main rationale for this change is to make creating OCDS extensions easier for people who do not understand the merge strategies as they currently stand. It is a fairly large burden for any extension creator to make sure they understand them and use them correctly. To be honest I am surprised and impressed that anyone else outside the OCP team actually understood them at all!

So for clarity there will be only the two field level merge stratagies: omitWhenMerged (this is equivilent to ocdsOmit now) wholeListMerge (this is equivilent to ocdsVersion on an array of objects)

All other merge strategies will in essence be defaulted:

arrayMergeByID: for all arrays of objects that do not have wholeListMerge overwrite: for all id fields on arrays of objects that do not have wholeListMerge
ocdsVersion: for all other fields

So when people are writing extensions they do not have to worry about merge strategies at all for the most common use cases:

This of course should/will be documented.

I can see how this change is slightly harder to implement for people implementing merging (who themselves must be very technical) as it means inferring the merge strategies from the schema and not having every field explicit.

However, I think it is important to make it as easy as possible for domain specialists who are making the extension schemas. It is very difficult to describe what they mean at the moment. Also even for people who know what they are get them wrong, so defaulting the most common cases I think makes sense.

mpostelnicu commented 7 years ago

OK, as long as we are documenting this nicely, it should be fine.

timgdavies commented 7 years ago

I've put together updated merge documentation here: http://standard.open-contracting.org/1.1-dev/en/schema/merging/

kindly commented 7 years ago

After rewriting the script to make the versioned_validation_schema I have found that we need one more rule to say explicitly that we need to version a particular id field. This has no effect on the merge rules as outlined in the new documentation, but it should be mentioned in the versioned release schema section at the bottom.
So even though this is late notice for 1.1, for clarity I think we need to add another rule.

versionId

At the moment this is used in two places:

i.e it is used when an object, that is not in an array of objects, has an "id" field.

This new field is not strictly necessary (as it could be implied from schema by looking to see if the object is a single object or is in a list of objects) BUT this implication is very hard to write for anybody implementing versioning and an implicit marker is much easier.

The changes to the schema are here along with the new code that generates the version schema.

https://github.com/open-contracting/standard/pull/412/files#diff-0eaeaf6a8f844dc3725590c958f7c127

timgdavies commented 7 years ago

Thanks @kindly

I've added some additional documentation updates to the branch - and will look to merge this now.