quantumlib / Cirq

A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
Apache License 2.0
4.28k stars 1.02k forks source link

JSON backwards compatibility policy and tooling. #4321

Open mpharrigan opened 3 years ago

mpharrigan commented 3 years ago

Inspired by #4318 @balopat wants a policy on how we support loading json data from deprecated or deleted classes. We expect data to live statically longer than actively maintained code bases.

Similarly, we should revisit our tooling support for maintaining JSON backwards compatibility. Currently, if the structure of a class is changed, the contributor should rename the existing test data to name.json_inward and add a new name.json with the new structure. This procedure happens somewhat infrequently and isn't well known. It's hard to tell from GitHub diffs if it's been done correctly.

mpharrigan commented 3 years ago

My thoughts: I can imagine a case where an under-used hard-to-maintain-compatibility old class could lose backwards-compatible-loadability eventually, but the goal should be keep things around where possible. Especially with a results class.

It's annoying when reviewing a PR that renames the existing json test data to _inward and adds a new suffixless version because it's unclear from the diff if they did it right and we can't easily add tooling-based protection.

I would suggest investigating a scheme where we never modify / delete existing test data and just add Result.2.json Result.3.json, etc. If we need the inward mark for the testing to work, we can record that in the json testing python file or try to add some sort of logic where if a .2.json exists then the suffixless version is inward-only

Strilanc commented 3 years ago

Breaking stored experimental data is 100% unacceptable. We must never ever break backwards compatibility with serialized data. This is not negotiable, it is a hard requirement of serialization to disk. I feel rather strongly about this point. If I come back 30 years from now and try to load my janky old cirq circuits, they had better load.

That being said, the serialized data may get parsed into different equivalent objects (e.g. we may replace stored OldLameMeasurementGate with in-memory FancyNewMeasurementGate, but the resulting circuit must behave identically).

We can add more json test data, but we must never modify or remove json test data.

balopat commented 3 years ago

In a previous project of mine we implemented versioning for configuration data, and we implemented unit tested migration scripts between each version increments. As truly breaking changes were rare, this helped enabling 2-3 year old versions of configurations to be seamlessly migrated to the latest version. This upgrade could happen on the fly as well by default.

With the assumption that truly breaking ("non-migratable") changes are super rare (e.g. introduction of a new mandatory field with no default value), we can implement a similar scheme for each type that we support. I would make the mapping explicit though between the different versions.

mpharrigan commented 3 years ago

The way the resolver dictionary is set up, the value is effectively the on-the-fly migration you describe. It just so happens it usually is a (renamed) constructor

balopat commented 3 years ago

Yes, exactly. However,

mpharrigan commented 3 years ago

If you define _from_json_dict_ on the class, you can have ultimate control over expressiveness. It will just pass the raw json dict to it and you can construct and return whatever you want from it. I will concede it could be more explicit that a migration is happening. A combination of 1) noticing that the string key in the resolver dict does not match the class name and 2) comments in the _from_json_dict_ method could go a long way

balopat commented 3 years ago

Great point! That can work in all the cases where you keep evolving the same class - which should be 99.99% of the cases. If a class is refactored into two classes though, then we might still need an external migration piece.

mpharrigan commented 3 years ago

Yes that's true. Just so we're clear about what the current situation is: in the split-a-class migration you'd choose one of the new classes to be the chosen one in the resolver dictionary and its _from_json_dict_ classmethod would return the appropriate class, which may not necessarily be the class of the classmethod (if that makes sense), which is legal but probably discouraged. Or it would return e.g. a tuple of the two new classes if that's how things were split up.

Worth keeping in mind, but I don't think we've run into this yet and I'm not sure if we will in the near future

dstrain115 commented 2 years ago

We should consider adding a documentation blurb about this on the "Import/Export circuits" or somewhere else before 1.0. If documentation is in place, we can remove pre-1.0.

dstrain115 commented 2 years ago

Confusion about this came up in the 3/30/2022 cirq cync, strengthening the case for documenting this.

mpharrigan commented 2 years ago

Happy to help here. What's the ask? I could:

I personally think keeping and maintaining a set of migration scripts is just as much developer/organizational burden as keeping the migration logic in the resolver so it can be done "on the fly".

If the above is truly what we want: we need to square it with #5483 which did not really follow this policy.

mpharrigan commented 2 years ago

The ask for documentation is a good one and will be easier than the second part of the original post in this issue, which is to have better tooling to enforce the currently de facto policy that documentation would make de jure.

Specifically: we could have a CI check that mandates the json_test_data *.json files are never modified! This would involve reworking the current json_inward scheme where any given json file could be annotated as "inward" without triggering a git diff.