Training data encoding and classifier

dbcq commented 2 years ago

Hi, I have a set of training data, and for each example I want to "learn embeddings on the fly", as is done in some NLP models. That is, I want a bank of parameters for training examples 1, 2, 3, etc., so that when training example 1 comes up, its parameters are used in the model. I then have a classifier circuit with the same trainable parameters for all examples:

Encoding (different for each example) --> classifier (same for all examples) ( --> readout)

Apologies for what might be a basic question (I'm coming from a quantum rather than an ML background), but I'm struggling to implement this. I've come across tf.nn.embedding_lookup and resolve_parameters, which look like they could do what I want (select the appropriate encoding params for my example and put them into the encoding circuit), but I can't work out how to integrate this into a model. There's also keras.layers.Embedding, which does this sort of thing automatically in the classical case, but it would be quite a job to make a quantum version of that with circuits rather than vectors... Any pointers appreciated!

lockwo commented 2 years ago

I have limited NLP experience, but I can see two possible readings of this (both conditioned on some input):

1) You have a collection of circuits with different structures (and, obviously, different parameters) that you want to use.
2) You have a collection of parameters for the same circuit structure that you want to use.

These both seem very similar in spirit to a lot of the data reuploading structures. Your best bet would probably be to create a custom layer (although it may be possible to hack it together with expectation like in https://github.com/tensorflow/quantum/issues/672#issuecomment-1045877528). If you outline which interpretation is correct (or if they are both wrong, provide more detail), I can probably make a minimal example that might help you.

dbcq commented 2 years ago

Hi, thanks very much for the reply. I'll try and be more clear. I have an encoding circuit with a fixed structure, but where the parameters depend on the (classical) input -- in this case words in a sentence. I'll have a bank of parameters for the different words, so if the first word is "cat", then we look up the parameters for "cat" and put them into the first slot in the encoding circuit. If the second word is "sat" then we look up those parameters to go in slot 2, etc. The aim is to train these parameters as part of the task, in the same way that word embeddings are sometimes trained classically as part of a task. The encoding circuit then feeds into a PQC which has the same structure and parameters for all the inputs, and is trained to correctly classify the input.
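
To make the lookup concrete, here is a minimal NumPy sketch of the parameter bank described above. The names `vocab`, `bank`, and `encode_sentence`, and the choice of three angles per word, are illustrative assumptions and not part of TFQ; in a real model the bank would be a trainable variable rather than a fixed array.

```python
import numpy as np

# Hypothetical vocabulary: each word maps to a row index in the parameter bank.
vocab = {"cat": 0, "sat": 1, "mat": 2}
params_per_word = 3  # assumed number of rotation angles per word slot

rng = np.random.default_rng(0)
# One row of encoding angles per word; these are the values that would be trained.
bank = rng.uniform(0, 2 * np.pi, size=(len(vocab), params_per_word))


def encode_sentence(words):
    """Return the encoding angles for a sentence, one row per slot."""
    ids = np.array([vocab[w] for w in words])
    return bank[ids]  # row lookup: slot k receives the angles of word k


angles = encode_sentence(["cat", "sat", "cat"])
print(angles.shape)  # 3 slots, 3 angles each
```

For a single table, `tf.nn.embedding_lookup(bank, ids)` performs the same row selection, so a repeated word ("cat" in slots 1 and 3 here) shares one set of parameters.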

I hope that's clearer? I will have another look through the data reuploading examples. I think I understood the examples on the tutorials page; what I'm having trouble implementing is specifically having a bank of input parameters that are selected according to the input.

Thanks again!

lockwo commented 2 years ago

I think I see what you are getting at. It bears certain similarities to reuploading, but it's definitely different. I think I have a minimal working demo below. I don't really do NLP, so I'm not sure what the usual pipeline is, but it should give you an example of what you are trying to do that you can build upon. There are probably other ways to implement this, but I went with a custom layer. If you understand the re-uploading implementations, this is pretty similar: I basically just combined the reuploading idea with an embedding params variable that is indexed with tf.nn.embedding_lookup. So it has some embedding parameters that vary depending on the input (for constant-structure embedding circuits), followed by a constant-structure PQC with universal params.

It certainly seems to be learning the correct things for this simple example.

import tensorflow_quantum as tfq
import cirq 
import sympy
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

class Example(tf.keras.layers.Layer):
    def __init__(self, num_q, lays, vocab_size) -> None:
        super(Example, self).__init__()
        self.qubits = [cirq.GridQubit(0, i) for i in range(num_q)]
        self.embedding_output_dim = len(self.qubits)
        self.num_params = 2 * len(self.qubits) * lays
        self.embeddings = tf.Variable(initial_value=np.random.uniform(0, 2 * np.pi, (vocab_size, self.embedding_output_dim)), dtype="float32", trainable=True)  
        self.pqc_params = tf.Variable(initial_value=np.random.uniform(0, 2 * np.pi, (1, self.num_params)), dtype="float32", trainable=True)
        self.total_params = self.num_params + self.embedding_output_dim
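        # Zero-padded names keep alphabetical order equal to index order; ControlledPQC
        # appears to sort symbols by name, and "params10" would otherwise sort before "params2".
        self.params = [sympy.Symbol("params%04d" % i) for i in range(self.total_params)]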
        self.readout_ops = [cirq.Z(self.qubits[0])]
        self.model = tfq.layers.ControlledPQC(self.make_circuit(lays, self.params), self.readout_ops, differentiator=tfq.differentiators.Adjoint())
        self.in_circuit = tfq.convert_to_tensor([cirq.Circuit()])

    def make_circuit(self, lays, params):
        cir = cirq.Circuit()
        cir += self.embedding_circuit(params[:self.embedding_output_dim])

        params_per_layer = 2 * len(self.qubits)
        for i in range(lays):
            cir += self.u_ent(params[i * params_per_layer + self.embedding_output_dim:(i + 1) * params_per_layer + self.embedding_output_dim])

        return cir

    def embedding_circuit(self, ps):
        c = cirq.Circuit()
        for i in range(len(self.qubits)):
            c += cirq.ry(ps[i]).on(self.qubits[i])
        for i in range(len(self.qubits) - 1):
            c += cirq.CNOT(self.qubits[i], self.qubits[i + 1])
        return c

    def u_ent(self, ps):
        c = cirq.Circuit()
        for i in range(len(self.qubits)):
            c += cirq.rz(ps[i]).on(self.qubits[i])
        for i in range(len(self.qubits)):
            c += cirq.ry(ps[i + len(self.qubits)]).on(self.qubits[i])
        for i in range(len(self.qubits) - 1):
            c += cirq.CNOT(self.qubits[i], self.qubits[i+1])
        c += cirq.CNOT(self.qubits[-1], self.qubits[0])
        return c

    # inputs = (batch, in_size)
    def call(self, inputs):
        num_batch = tf.gather(tf.shape(inputs), 0)
        # (1,) -> (batch,)
        input_circuits = tf.repeat(self.in_circuit, repeats=num_batch)
        # -> (batch, 1, embed)
        input_params = tf.nn.embedding_lookup(self.embeddings, inputs)
        # (batch, 1, embed) - > (batch, embed)
        input_params = tf.squeeze(input_params, axis=1)
        # (1, num_param) -> (batch, num_params)
        pqc_params = tf.tile(self.pqc_params, [num_batch, 1])
        # (batch, embed), (batch, num_params) -> (batch, total_params)
        full_params = tf.concat([input_params, pqc_params], axis=1)
        # -> (batch, 1): one expectation value per readout operator
        output = self.model([input_circuits, full_params])
        return (output + 1)/2

import random 

vocab = ["happy", "smile", "sad", "frown"]
v_size = len(vocab)

inputs = tf.keras.layers.Input(shape=(1,), dtype='int32')
outs = Example(3, 10, v_size)(inputs)
vqc = tf.keras.models.Model(inputs=inputs, outputs=outs)
vqc.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))

d_size = 100
# Words are represented by their integer index in vocab; I assume TF has methods for the text -> index step
data = tf.convert_to_tensor([[random.randint(0, v_size-1)] for _ in range(d_size)])
#labels = tf.convert_to_tensor([0 if 2 in data_point else 1 for data_point in data])
labels = tf.convert_to_tensor([0 if i > 1 else 1 for i in data])

X_train = data[:90]
y_train = labels[:90]
X_test = data[90:]
y_test = labels[90:]

reup_hist = vqc.fit(X_train, y_train, epochs=20, batch_size=10, validation_data=(X_test, y_test))

print(vqc(X_test), y_test)

plt.plot(reup_hist.history['loss'], label='Train Loss')
plt.plot(reup_hist.history['val_loss'], label='Val Loss')
plt.legend()
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
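
The shape bookkeeping inside call() can be checked without TFQ. The following is a rough NumPy mirror of those steps, not the layer itself: the batch size and the zero/one fill values are made up for illustration, and `embeddings[inputs]` stands in for tf.nn.embedding_lookup.

```python
import numpy as np

batch, vocab_size, num_q, lays = 4, 4, 3, 10
embed_dim = num_q              # one ry angle per qubit in the embedding circuit
num_params = 2 * num_q * lays  # one rz and one ry angle per qubit per layer

# Stand-ins for the two trainable variables (filled with 0s and 1s for visibility).
embeddings = np.zeros((vocab_size, embed_dim), dtype=np.float32)
pqc_params = np.ones((1, num_params), dtype=np.float32)

inputs = np.array([[0], [3], [1], [3]])          # (batch, 1) word indices
input_params = embeddings[inputs]                # (batch, 1, embed_dim)
input_params = np.squeeze(input_params, axis=1)  # (batch, embed_dim)
shared = np.tile(pqc_params, (batch, 1))         # (batch, num_params)
full_params = np.concatenate([input_params, shared], axis=1)

print(full_params.shape)  # (batch, embed_dim + num_params)
```

The first embed_dim columns differ per example and the remaining columns are identical across the batch. This only lines up with the circuit if the layer consumes the symbols in the same order in which the columns are concatenated.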

dbcq commented 2 years ago

That's brilliant, thanks so much for your help! That's just what I was having trouble with but it all seems quite straightforward when it's done right. One quick question: what's the purpose of return (output + 1)/2 at the end of call()?

lockwo commented 2 years ago

That's just to match the output range to the label range. The labels are 0 or 1, but the Z expectation lies between -1 and 1, so I shifted and rescaled it to lie between 0 and 1. I don't know if it's necessary, it's just something I usually do.
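
As a small illustration of that rescaling, here is a sketch of the map and its inverse. The helper names `z_to_unit` and `unit_to_z` are made up for this example.

```python
import numpy as np


def z_to_unit(expectation):
    """Map a Pauli-Z expectation value in [-1, 1] to [0, 1]."""
    return (expectation + 1) / 2


def unit_to_z(label):
    """Inverse map: take a value in [0, 1] back to [-1, 1]."""
    return 2 * label - 1


z = np.array([-1.0, 0.0, 1.0])
unit = z_to_unit(z)          # values 0, 0.5, 1
round_trip = unit_to_z(unit)  # recovers -1, 0, 1
print(unit, round_trip)
```

An equivalent option, presumably, is to leave the layer output in [-1, 1] and convert the labels with `unit_to_z` instead, so that class 0 becomes -1 and class 1 becomes +1.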

lockwo commented 2 years ago

Any updates on this or should it be closed?

tensorflow / quantum

Training data encoding and classifier #675