Use protocol buffers for serialization instead of dicts - Githubissues

quantumlib / Cirq

A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.

Apache License 2.0

4.28k stars 1.02k forks source link

Use protocol buffers for serialization instead of dicts #1156

Closed viathor closed 5 years ago

viathor commented 6 years ago

When code generated from protocol buffer defs was removed in #825, a number of serialization/deserialization functions that used them (e.g. sweep_to_proto()) were changed to return/accept dict instead. We have recently undone the removal (see #1085).

Should also undo the serialization/deserialization change? Protocol buffers offer stronger type safety than dict.

maffoo commented 5 years ago

I'd like to revisit this, because I'm currently dealing with some protobuf translation issues but want to get input before I put in too much work. I think it's nice to use the protobuf classes directly for some use cases (and we could even enable stub generation to get type checking of the messages). Also, there's an existing json_format module that provides MessageToDict and ParseDict functions, so that we easily translate between proto messages and python dicts (or json strings) in the canonical format without writing any additional code. Thoughts?

Strilanc commented 5 years ago

What do you see as the advantages of using the protobuf classes, instead of just dictionaries? These are serialized objects; we mostly interact with them in code that is converting into or out of the representation. Is that code noticeably difference if we use the protobuf classes instead of dicts?

I agree that outside of conversion code, when actually operating on the representation, you want more type safety than a dictionary.

maffoo commented 5 years ago

One issue is that right now we are hand-rolling the serialization code to try to be compatible with the canonical json representation of protobuf message, but we differ in some cases. For example, protos represented as json messages can omit fields that have default values, but in GridQubit we currently require that the row and col fields be present which does not agree with the canonical representation. That code is marked as deprecated, but it's symptomatic of trying to roll our own serialization code where we could just use the protobuf libraries.

I agree that from the perspective of a user there's almost no difference because as you say its just serialized data. But when implementing backend logic that receives values as protobuf messages, it's odd to have to convert to dicts first in order to be able to then deserialize into cirq objects. This backend logic does not even live in cirq, which is part of why I seem to be the only one running into issues around this :-)

maffoo commented 5 years ago

Another thing I'll say in favor of writing protobuf (de)serialization functions is there are far fewer magic strings in the code because we don't have to write the field names manually.

maffoo commented 5 years ago

Another wrinkle: the official JSON format specification for protobuf says that field names are mapped to lowerCamelCase unless a field is annotated with a json_name option. We are definitely not following that in our hand-rolled serialization code.

viathor commented 5 years ago

The datapath from cirq on a client's machine to the quantum computers in the cloud should be simple and easy to maintain.

Rolling our own JSON format and shoe-horning it into protobuf-compatible shape somewhere along the cirq-cloud datapath is highly likely to result in bugs, outages and tears.

Therefore, we should avoid hand-rolled serialization code. Since protobufs have a standardized JSON representation that comes with tested and battle-hardened implementation, using protobufs is the simplest choice with lowest maintenance cost down the road.

maffoo commented 5 years ago

Fixed by #1717