mpharrigan opened 3 years ago
My thoughts: I can imagine a case where an under-used old class whose compatibility is hard to maintain could eventually lose backwards-compatible loadability, but the goal should be to keep things around where possible. Especially with a results class.
It's annoying when reviewing a PR that renames the existing json test data to _inward and adds a new suffixless version because it's unclear from the diff if they did it right and we can't easily add tooling-based protection.
I would suggest investigating a scheme where we never modify or delete existing test data and just add `Result.2.json`, `Result.3.json`, etc. If we need the inward mark for the testing to work, we can record that in the json testing python file, or try to add some sort of logic where if a `.2.json` exists then the suffixless version is inward-only.
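The "suffixless version becomes inward-only once a versioned file exists" rule could be a small pure function. A minimal sketch, assuming hypothetical file names and the convention that an unsuffixed file counts as version 1:

```python
import re

def classify_test_files(filenames):
    """Classify JSON test files as roundtrip or inward-only.

    Hypothetical rule from the discussion: if a versioned file like
    ``Result.2.json`` exists, then ``Result.json`` (and any lower-numbered
    version) is inward-only -- it must still deserialize, but it is no
    longer what serialization produces.
    """
    # Matches e.g. 'Result.json' (version 1) and 'Result.2.json' (version 2).
    pattern = re.compile(r'^(?P<name>\w+?)(?:\.(?P<ver>\d+))?\.json$')
    latest = {}
    parsed = []
    for fn in filenames:
        m = pattern.match(fn)
        if not m:
            continue
        name = m.group('name')
        ver = int(m.group('ver') or 1)
        parsed.append((fn, name, ver))
        latest[name] = max(latest.get(name, 1), ver)
    return {
        fn: ('roundtrip' if ver == latest[name] else 'inward-only')
        for fn, name, ver in parsed
    }
```

With this, the test harness needs no out-of-band annotation: the presence of a higher-versioned file is the annotation.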
Breaking stored experimental data is 100% unacceptable. We must never ever break backwards compatibility with serialized data. This is not negotiable, it is a hard requirement of serialization to disk. I feel rather strongly about this point. If I come back 30 years from now and try to load my janky old cirq circuits, they had better load.
That being said, the serialized data may get parsed into different equivalent objects (e.g. we may replace stored OldLameMeasurementGate with in-memory FancyNewMeasurementGate, but the resulting circuit must behave identically).
We can add more json test data, but we must never modify or remove json test data.
In a previous project of mine we implemented versioning for configuration data, along with unit-tested migration scripts between each pair of consecutive versions. As truly breaking changes were rare, this allowed 2-3 year old configurations to be seamlessly migrated to the latest version. By default, this upgrade could also happen on the fly.
With the assumption that truly breaking ("non-migratable") changes are super rare (e.g. introduction of a new mandatory field with no default value), we can implement a similar scheme for each type that we support. I would make the mapping explicit though between the different versions.
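An explicit per-type migration registry like the one described above could be sketched as follows. All type names, field names, and version numbers are made up for illustration; each entry upgrades a serialized dict from version N to N + 1, so any stored version can be stepped up to the latest format:

```python
# Hypothetical migration steps for a made-up 'Result' type.

def _result_v1_to_v2(d):
    d = dict(d)
    d['repetitions'] = d.pop('reps')  # field renamed in v2 (illustrative)
    return d

def _result_v2_to_v3(d):
    d = dict(d)
    d['records'] = d.pop('measurements')  # field renamed in v3 (illustrative)
    return d

# Explicit mapping: (type name, from-version) -> migration to from-version + 1.
MIGRATIONS = {
    ('Result', 1): _result_v1_to_v2,
    ('Result', 2): _result_v2_to_v3,
}

LATEST_VERSION = {'Result': 3}

def migrate(type_name, version, data):
    """Apply migrations step by step until ``data`` is at the latest version."""
    while version < LATEST_VERSION[type_name]:
        data = MIGRATIONS[(type_name, version)](data)
        version += 1
    return data
```

Making the mapping an explicit dict keyed by (type, version) means a missing migration step fails loudly with a `KeyError` rather than silently producing a half-migrated object.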
The way the resolver dictionary is set up, the value is effectively the on-the-fly migration you describe. It just so happens it usually is a (renamed) constructor
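As a concrete illustration of the resolver value doubling as an on-the-fly migration (class and field names here are hypothetical, echoing the example from earlier in the thread):

```python
class FancyNewMeasurementGate:
    """Stand-in for the current in-memory class."""
    def __init__(self, num_qubits, key=''):
        self.num_qubits = num_qubits
        self.key = key

def _old_lame_measurement_gate(num_qubits, **kwargs):
    # On-the-fly migration: translate the old serialized fields into
    # the new constructor.  Extra legacy fields are simply dropped.
    return FancyNewMeasurementGate(num_qubits=num_qubits)

RESOLVER = {
    # Current class: the value is simply the constructor.
    'FancyNewMeasurementGate': FancyNewMeasurementGate,
    # Deleted/renamed class: the value is a migration callable.
    'OldLameMeasurementGate': _old_lame_measurement_gate,
}
```

The deserializer never needs to know whether it is calling a constructor or a migration shim; it just looks up the stored type name and calls the value.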
Yes, exactly. However, if you define `_from_json_dict_` on the class, you can have ultimate control over expressiveness. The raw json dict is passed straight to it, and you can construct and return whatever you want from it. I will concede it could be more explicit that a migration is happening. A combination of 1) noticing that the string key in the resolver dict does not match the class name and 2) comments in the `_from_json_dict_` method could go a long way.
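A minimal sketch of the `_from_json_dict_` hook handling a renamed field (the class, field names, and the exact handling of the `cirq_type` metadata key are illustrative assumptions, not cirq's actual gate code):

```python
class MyGate:
    def __init__(self, exponent=1.0):
        self.exponent = exponent

    @classmethod
    def _from_json_dict_(cls, **kwargs):
        # Migration: older serialized files stored 'half_turns' instead
        # of 'exponent'.  A comment like this one is what makes the
        # migration visible to reviewers.
        if 'half_turns' in kwargs:
            kwargs['exponent'] = kwargs.pop('half_turns')
        # The stored type-name metadata key may be passed through by the
        # loader; drop it defensively (assumption about loader behavior).
        kwargs.pop('cirq_type', None)
        return cls(**kwargs)
```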
Great point! That can work in all the cases where you keep evolving the same class - which should be 99.99% of the cases. If a class is refactored into two classes though, then we might still need an external migration piece.
Yes that's true. Just so we're clear about what the current situation is: in the split-a-class migration you'd choose one of the new classes to be the chosen one in the resolver dictionary, and its `_from_json_dict_` classmethod would return the appropriate class, which may not necessarily be the class of the classmethod (if that makes sense). That is legal but probably discouraged. Or it would return e.g. a tuple of the two new classes, if that's how things were split up.
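The split-a-class situation could look like this (all class names hypothetical): `OldCombined` stays as the key in the resolver, but its `_from_json_dict_` returns instances of the two new classes rather than an `OldCombined`:

```python
class PartA:
    def __init__(self, a):
        self.a = a

class PartB:
    def __init__(self, b):
        self.b = b

class OldCombined:
    """Kept only so old serialized data still resolves to something."""

    @classmethod
    def _from_json_dict_(cls, a, b, **kwargs):
        # Legal but probably discouraged: the returned objects are not
        # instances of the classmethod's own class.
        return (PartA(a), PartB(b))
```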
Worth keeping in mind, but I don't think we've run into this yet and I'm not sure if we will in the near future
We should consider adding a documentation blurb about this on the "Import/Export circuits" page or somewhere else before 1.0. If documentation is in place, we can remove pre-1.0.
Confusion about this came up in the 3/30/2022 cirq sync, strengthening the case for documenting this.
Happy to help here. What's the ask? I could:
I personally think keeping and maintaining a set of migration scripts is just as much developer/organizational burden as keeping the migration logic in the resolver so it can be done "on the fly".
If the above is truly what we want: we need to square it with #5483 which did not really follow this policy.
The ask for documentation is a good one and will be easier than the second part of the original post in this issue, which is to have better tooling to enforce the currently de facto policy that documentation would make de jure.
Specifically: we could have a CI check that mandates the `json_test_data` `*.json` files are never modified! This would involve reworking the current `json_inward` scheme, where any given json file can currently be annotated as "inward" without triggering a git diff.
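Such a CI check could be a short script around `git diff --diff-filter`. A minimal sketch, assuming a hypothetical base ref and test-data path; only modifications and deletions of existing `.json` files are flagged, so adding brand-new test files never trips the check:

```python
import subprocess

def filter_json(names):
    """Keep only the .json entries from a list of changed file paths."""
    return [n for n in names if n.endswith('.json')]

def modified_json_files(base_ref='origin/main',
                        path='json_test_data'):
    """Return existing .json test files that this branch modified or deleted.

    base_ref and path are illustrative assumptions about the repo layout.
    """
    # --diff-filter=MD keeps only Modified and Deleted entries, so the
    # check stays silent for purely additive changes.
    out = subprocess.run(
        ['git', 'diff', '--name-only', '--diff-filter=MD', base_ref, '--', path],
        capture_output=True, text=True, check=True,
    ).stdout
    return filter_json(out.splitlines())
```

CI would then fail whenever `modified_json_files()` returns a non-empty list, printing the offending paths.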
Inspired by #4318, @balopat wants a policy on how we support loading json data from deprecated or deleted classes. We expect data to live statically longer than actively maintained code bases.
Similarly, we should revisit our tooling support for maintaining JSON backwards compatibility. Currently, if the structure of a class is changed, the contributor should rename the existing test data to `name.json_inward` and add a new `name.json` with the new structure. This procedure happens somewhat infrequently and isn't well known. It's hard to tell from GitHub diffs if it's been done correctly.