Cannot serialize Alias Sampling

mstechly commented 11 months ago

I was trying to serialize Alias Sampling:

import numpy as np
from qualtran.serialization.bloq import bloqs_to_proto
from qualtran.bloqs.state_preparation import StatePreparationAliasSampling

coeffs = np.array([1.0, 1, 3, 2])
mu = 3

state_prep = StatePreparationAliasSampling.from_lcu_probs(
    coeffs, probability_epsilon=2**-mu / len(coeffs)
)
bloqs_to_proto(state_prep)

However I got the following error: ValueError: Cannot serialize (SelectionRegister(name='selection', bitsize=2, iteration_length=4, shape=(), side=<Side.THRU: 3>),) of unknown type <class 'tuple'>

mpharrigan commented 11 months ago

@tanujkhattar is out for a while and would be the best person to investigate this

mpharrigan commented 11 months ago

Each Bloq has class attributes. StatePreparationAliasSampling has an attribute:

    selection_registers: Tuple[SelectionRegister, ...] = attrs.field(
        converter=lambda v: (v,) if isinstance(v, SelectionRegister) else tuple(v)
    )

which isn't actually documented (@tanujkhattar please address when you're back); but it presumably lets you re-configure the Registers used for selection. Crucially, this/these (?) registers contain the iteration_length that gives the range of indices that will appear in the selection register.

It doesn't look like this feature is used within the qualtran codebase: usually the from_lcu_probs factory method is used instead which uses one register named "selection".

Meanwhile: the serialization code only handles certain types when used as class attributes: https://github.com/quantumlib/Qualtran/blob/main/qualtran/protos/args.proto#L37

We'd need to add support for class attributes that are Register (SelectionRegister) types.

How urgent is this? In the meantime: you could patch it ~sortof like

diff --git a/qualtran/bloqs/state_preparation.py b/qualtran/bloqs/state_preparation.py
index 718095af..016fa6b0 100644
--- a/qualtran/bloqs/state_preparation.py
+++ b/qualtran/bloqs/state_preparation.py
@@ -84,9 +84,8 @@ class StatePreparationAliasSampling(PrepareOracle):
         (https://arxiv.org/abs/1805.03662).
         Babbush et. al. (2018). Section III.D. and Figure 11.
     """
-    selection_registers: Tuple[SelectionRegister, ...] = attrs.field(
-        converter=lambda v: (v,) if isinstance(v, SelectionRegister) else tuple(v)
-    )
+    sel_bitsize: int
+    sel_range: int
     alt: NDArray[np.int_]
     keep: NDArray[np.int_]
     mu: int
@@ -109,12 +108,21 @@ class StatePreparationAliasSampling(PrepareOracle):
         )
         N = len(lcu_probabilities)
         return StatePreparationAliasSampling(
-            selection_registers=SelectionRegister('selection', (N - 1).bit_length(), N),
+            sel_bitsize=(N - 1).bit_length(),
+            sel_range=N,
             alt=np.array(alt),
             keep=np.array(keep),
             mu=mu,
         )

+    @property
+    def selection_registers(self) -> Tuple[SelectionRegister, ...]:
+        return (
+            SelectionRegister(
+                'selection', bitsize=self.sel_bitsize, iteration_length=self.sel_range
+            ),
+        )
+
     @cached_property
     def sigma_mu_bitsize(self) -> int:
         return self.mu
@@ -158,7 +166,7 @@ class StatePreparationAliasSampling(PrepareOracle):
     ) -> cirq.OP_TREE:
         selection, less_than_equal = quregs['selection'], quregs['less_than_equal']
         sigma_mu, alt, keep = quregs.get('sigma_mu', ()), quregs['alt'], quregs.get('keep', ())
-        N = self.selection_registers[0].iteration_length
+        N = self.sel_range
         yield PrepareUniformSuperposition(N).on(*selection)
         yield cirq.H.on_each(*sigma_mu)
         qrom_gate = QROM(

mstechly commented 11 months ago

Thank you! I'll test these patches and see if I run into any other issues!

mstechly commented 11 months ago

Unfortunately with the changes you suggested I got the following:

File ~/.../Qualtran/qualtran/serialization/args.py:49, in arg_to_proto(name, val)
     47 if isinstance(val, cirq.Gate):
     48     return args_pb2.BloqArg(name=name, cirq_json_gzip=cirq.to_json_gzip(val))
---> 49 raise ValueError(f"Cannot serialize {val} of unknown type {type(val)}")

ValueError: Cannot serialize () of unknown type <class 'tuple'>

So it turns out that arg_to_proto doesn't handle two types of data which appear in this Bloq:

empty tuple: ()
a list of numpy arrays [array([2, 2, 3, 3]), array([5, 4, 7, 0])]

So I added this extremely unsafe and hacky logic:

    if isinstance(val, tuple):
        return args_pb2.BloqArg(name=name, ndarray=_ndarray_to_proto(np.ndarray(val)))
    if isinstance(val, list) and len(val) != 0 and isinstance(val[0], np.ndarray):
        return args_pb2.BloqArg(name=name, ndarray=_ndarray_to_proto(np.stack(val)))

The first one is wrong because np.ndarray(()) does not create an empty array: it creates a 0-d array with a single element (0 in my case). I guess some version of a protobuf null would be more appropriate here. The second one might be giving correct results; this is what I get when I print out the protobuf object:

  bloq {
    name: "QROM"
    args {
      name: "data"
      ndarray {
        shape: 2
        shape: 4
        dtype: "np.dtype(\'int64\')"
        data: "\002\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\007\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
      }
    }
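
As a sanity check on the data field above, the bytes can be decoded by hand. The sketch below assumes the field holds the raw array buffer as little-endian int64 values in row-major order (an assumption about the encoding, not something confirmed by the serialization code), and uses only the standard library.

```python
import struct

# The 64 bytes from the "data" field, written as eight 8-byte chunks.
data = (
    b"\002\000\000\000\000\000\000\000"
    b"\002\000\000\000\000\000\000\000"
    b"\003\000\000\000\000\000\000\000"
    b"\003\000\000\000\000\000\000\000"
    b"\005\000\000\000\000\000\000\000"
    b"\004\000\000\000\000\000\000\000"
    b"\007\000\000\000\000\000\000\000"
    b"\000\000\000\000\000\000\000\000"
)

# "<8q" = eight little-endian signed 64-bit integers (assumed layout).
values = struct.unpack("<8q", data)
print(values)  # (2, 2, 3, 3, 5, 4, 7, 0)

# Reshape by hand to the serialized shape (2, 4), assuming row-major order.
rows = [list(values[i * 4:(i + 1) * 4]) for i in range(2)]
print(rows)  # [[2, 2, 3, 3], [5, 4, 7, 0]]
```

Under those assumptions the two rows match the alt and keep arrays, so the np.stack hack does appear to preserve the values.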

Also, when I tried to load it back with bloqs_from_proto(my_proto_obj) I got: ValueError: Unable to find a Bloq corresponding to bloq_proto.bloq.name='StatePreparationAliasSampling'

So I added the following entries to the RESOLVER_DICT in serialization/bloq.py:

    'StatePreparationAliasSampling': StatePreparationAliasSampling,
    'PrepareUniformSuperposition': PrepareUniformSuperposition,

and then this hit me:

  File "<attrs generated init qualtran.bloqs.prepare_uniform_superposition.PrepareUniformSuperposition>", line 4, in __init__
    _setattr('cvs', __attr_converter_cvs(cvs))
  File "/.../Qualtran/qualtran/bloqs/prepare_uniform_superposition.py", line 52, in <lambda>
    converter=lambda v: (v,) if isinstance(v, int) else tuple(v), default=()
TypeError: iteration over a 0-d array

So it looks like the fact that cvs has not been serialized properly hits me back. If you could help me out that would be great, as it seems that doing this serde properly would require some protobuf learning. I hope it's helpful :)
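
The "iteration over a 0-d array" error can be reproduced with NumPy alone. The sketch below contrasts np.array(()) with np.ndarray(()); it only illustrates the pitfall and is not a proposed fix for the serializer.

```python
import numpy as np

empty = np.array(())     # array constructor: shape (0,), zero elements
zero_d = np.ndarray(())  # low-level constructor: () is read as the *shape*

print(empty.shape, empty.size)    # (0,) 0
print(zero_d.shape, zero_d.size)  # () 1  (the single element is uninitialized)

# An empty 1-d array converts back to an empty tuple without trouble.
assert tuple(empty) == ()

# A 0-d array cannot be iterated, which is what the cvs converter trips over.
error_message = None
try:
    tuple(zero_d)
except TypeError as err:
    error_message = str(err)
print(error_message)
```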

mstechly commented 11 months ago

Extra comment – the logic with RESOLVER_DICT in bloq_id_to_bloq seems a bit suspicious to me.

First, it will fail for any bloqs which are not on the list. Second, I thought it should handle Alias Sampling just fine because it's a composite bloq; I assumed it was one, since I can use decompose_bloq on it. But then I checked and it actually isn't. So I was wrong and the logic does kind of make sense, but I'm just letting you know that it's a bit confusing for an outside user how these things are structured.

So I wanted to ask: is the first thing by design? I can imagine you might want to restrict the set of available basic bloqs for deserialization. On the other hand, I can also imagine this being a temporary solution that will be replaced by something more robust (e.g. an autogenerated or user-provided RESOLVER_DICT).

mpharrigan commented 11 months ago

We synced offline, but capturing here.

Resolver dict is by design to avoid potentially executing arbitrary code. We do a similar thing in Cirq, but there we have a test to make sure that everything is added to the resolver dict.
AliasSampling is a bloq. Anything with a 'name' (e.g. 'alias sampling') is a bloq. When you ask for its decomposition, you get a collection of bloqs that are all wired up. These are contained in CompositeBloq.
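
To illustrate the allow-list idea behind the resolver dict, here is a minimal sketch. ToySplit, ToyJoin, TOY_RESOLVER_DICT and toy_bloq_from_name are all hypothetical names made up for this example; the real lookup lives in serialization/bloq.py and may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySplit:
    n: int


@dataclass(frozen=True)
class ToyJoin:
    n: int


# Only classes listed here can ever be constructed from serialized input,
# so a crafted message cannot name an arbitrary callable.
TOY_RESOLVER_DICT = {'ToySplit': ToySplit, 'ToyJoin': ToyJoin}


def toy_bloq_from_name(name: str, **kwargs):
    """Look up a class by its serialized name and build it from kwargs."""
    if name not in TOY_RESOLVER_DICT:
        raise ValueError(f"Unable to find a Bloq corresponding to {name!r}")
    return TOY_RESOLVER_DICT[name](**kwargs)


print(toy_bloq_from_name('ToySplit', n=3))  # ToySplit(n=3)
```

The cost of this design is the one noted above: every new class has to be registered, which is why a test that checks the dict for completeness is useful.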

re: serialization: @tanujkhattar would be the best person to fix this properly but he is out for a bit. The original design of the serialization was supposed to be restrictive about what types of values you could include in bloq attributes. But we never actually tested that so there are now bloqs that include additional types of values in their attributes.

I like your hack; but instead of hacking things into ndarray; you could try json.dumps-ing the attributes to the string_val field in the BloqArg proto message. You'd have to deserialize it too. The following is my idea but untested

--- a/qualtran/serialization/args.py
+++ b/qualtran/serialization/args.py
@@ -11,6 +11,7 @@
 #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 #  See the License for the specific language governing permissions and
 #  limitations under the License.
+import json
 from typing import Any, Dict, Union

 import cirq
@@ -38,15 +39,13 @@ def arg_to_proto(*, name: str, val: Any) -> args_pb2.BloqArg:
         return args_pb2.BloqArg(name=name, int_val=val)
     if isinstance(val, float):
         return args_pb2.BloqArg(name=name, float_val=val)
-    if isinstance(val, str):
-        return args_pb2.BloqArg(name=name, string_val=val)
     if isinstance(val, sympy.Expr):
         return args_pb2.BloqArg(name=name, sympy_expr=str(val))
     if isinstance(val, np.ndarray):
         return args_pb2.BloqArg(name=name, ndarray=_ndarray_to_proto(val))
     if isinstance(val, cirq.Gate):
         return args_pb2.BloqArg(name=name, cirq_json_gzip=cirq.to_json_gzip(val))
-    raise ValueError(f"Cannot serialize {val} of unknown type {type(val)}")
+    return args_pb2.BloqArg(name=name, string_val=json.dumps(val))

 def arg_from_proto(arg: args_pb2.BloqArg) -> Dict[str, Any]:
@@ -55,7 +54,7 @@ def arg_from_proto(arg: args_pb2.BloqArg) -> Dict[str, Any]:
     if arg.HasField("float_val"):
         return {arg.name: arg.float_val}
     if arg.HasField("string_val"):
-        return {arg.name: arg.string_val}
+        return {arg.name: json.loads(arg.string_val)}
     if arg.HasField("sympy_expr"):
         return {arg.name: parse_expr(arg.sympy_expr)}
     if arg.HasField("ndarray"):

mstechly commented 11 months ago

This approach immediately throws: TypeError: Object of type ndarray is not JSON serializable.

Ok, so I made the following changes to make it work:

I added a custom NumPy JSON encoder (source: https://pynative.com/python-serialize-numpy-ndarray-into-json/) to deal with np arrays, so the line of interest looks like this: return args_pb2.BloqArg(name=name, string_val=json.dumps(val, cls=NumpyArrayEncoder))
There were still some issues, so in the end, instead of passing coeffs (the input to StatePreparationAliasSampling.from_lcu_probs) as an np.array, I just passed them as a list.
In QROM the __attrs_post_init__ complained a bit, so I fixed it by casting d to a numpy array: shapes = [np.array(d).shape for d in self.data]. I also removed the last two assertions, which check that self.selection_bitsizes and self.target_bitsizes are tuples, because after deserialization they ended up being lists.
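
The changes above can be sketched in isolation. The encoder below is one possible shape for a NumpyArrayEncoder (not necessarily identical to the linked one), and the attribute names are only illustrative. It also shows why the tuple assertions fail after a round trip: JSON has no tuple type, so tuples come back as lists.

```python
import json

import numpy as np


class NumpyArrayEncoder(json.JSONEncoder):
    """Encode NumPy arrays and integer scalars as plain JSON values."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.integer):
            return int(obj)
        return super().default(obj)


attrs_in = {
    'data': [np.array([2, 2, 3, 3]), np.array([5, 4, 7, 0])],
    'selection_bitsizes': (2,),
    'cvs': (),
}

encoded = json.dumps(attrs_in, cls=NumpyArrayEncoder)
decoded = json.loads(encoded)

print(decoded['data'])                # [[2, 2, 3, 3], [5, 4, 7, 0]]
print(decoded['selection_bitsizes'])  # [2]  (a list, no longer a tuple)
print(decoded['cvs'])                 # []
```

A converter such as tuple(v) on the receiving attribute would restore the tuple type without dropping the assertions, at the cost of touching each affected bloq.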

With all those changes when I do:

proto_stuff = bloqs_to_proto(state_prep)
reconstructed = bloqs_from_proto(proto_stuff)

I get the following:

(Pdb) reconstructed
[StatePreparationAliasSampling(sel_bitsize=2, sel_range=4, alt=array([2, 2, 3, 3]), keep=array([5, 4, 7, 0]), mu=3), PrepareUniformSuperposition(n=4, cvs=()), Split(n=3), CirqGateAsBloq(gate=cirq.H), QROM(data=[[2, 2, 3, 3], [5, 4, 7, 0]], selection_bitsizes=[2], target_bitsizes=[2, 3], num_controls=0), Join(n=3), LessThanEqual(x_bitsize=3, y_bitsize=3), CSwap(bitsize=2)]
(Pdb) state_prep
StatePreparationAliasSampling(sel_bitsize=2, sel_range=4, alt=array([2, 2, 3, 3]), keep=array([5, 4, 7, 0]), mu=3)

This is a bit surprising, as I was expecting only one output, but since state_prep == reconstructed[0] yields True, I guess this is fine. Just FYI as another minor unintuitive thing :)

So I can do my stuff and you have a list of minor issues to fix, so I think we can call it a success 🎉 !

mpharrigan commented 11 months ago

Props for powering through.

bloqs_to_proto will construct a BloqLibrary proto message which has multiple bloqs in it. This needs to be documented (https://github.com/quantumlib/Qualtran/issues/333), but when you ask to serialize StatePreparationAliasSampling it will serialize that bloq (i.e. its attributes/signature) and its decomposition (i.e. a DAG of subbloqs). The subbloqs (i.e. their attributes/signatures) will also be serialized in the BloqLibrary as the nodes (or node data, depending on how you think about it) in the decomposition DAG. There's a sneaky argument max_depth that controls how deep we go.
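
A toy model may help explain why one input bloq yields a list of several bloqs. The sketch below is not the Qualtran implementation: ToyBloq and collect_library are hypothetical, and the exact semantics of the real max_depth argument are an assumption here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBloq:
    """Hypothetical bloq: a name plus the subbloqs of its decomposition."""
    name: str
    subbloqs: List["ToyBloq"] = field(default_factory=list)


def collect_library(bloq: ToyBloq, max_depth: int) -> List[str]:
    """Return the bloq's name followed by the names of its subbloqs,
    descending at most max_depth levels and skipping duplicates."""
    library = [bloq.name]
    if max_depth > 0:
        for sub in bloq.subbloqs:
            for name in collect_library(sub, max_depth - 1):
                if name not in library:
                    library.append(name)
    return library


alias = ToyBloq(
    'StatePreparationAliasSampling',
    [ToyBloq('PrepareUniformSuperposition'), ToyBloq('QROM'), ToyBloq('CSwap')],
)

print(collect_library(alias, max_depth=0))  # only the top-level bloq
print(collect_library(alias, max_depth=1))  # top-level bloq plus its subbloqs
```

In this model the requested bloq always comes first, which is consistent with state_prep == reconstructed[0] in the output above.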

mstechly commented 11 months ago

Admittedly, I didn't bother to read the documentation of bloqs_to_proto 😅 But it makes sense 👌

mpharrigan commented 11 months ago

That's probably for the best as there currently isn't any 💀

tanujkhattar commented 9 months ago

Sorry @mstechly, the initial prototype for serialization was added a while ago and it hasn't kept up with all the new things we've added to Qualtran. I appreciate your patience for powering through the rough edges!

Right now, SelectionRegister serialization is not supported because SelectionRegister is supposed to be deprecated and removed soon after we implement the more general Quantum Data Types in Qualtran proposal.

Is full serialization support a priority for you to unblock ongoing work or was it a oneoff experiment? If it's not a priority, I'll hold off adding serialization support for SelectionRegister (and this implies all Unary Iteration derived bloqs would run into an error).

mstechly commented 9 months ago

Not a priority, thanks @tanujkhattar !

mpharrigan commented 2 months ago

Has this been fixed?

tanujkhattar commented 2 months ago

Yes, this is now fixed. Serialization of symbolic alias sampling is blocked on serializing the Shaped object, but that's independent of the bloq serialization overall. Non-symbolic alias sampling bloqs serialize fine now and we have tests that verify this. I think this can be closed. We can track getting rid of the long list of "not yet serializable" bloqs in a separate issue.

quantumlib / Qualtran

Cannot serialize Alias Sampling #518