philipperemy / cond_rnn

Conditional RNNs for Tensorflow / Keras.
MIT License

CondLSTM with Embedding layer #40

Closed dubovikmaster closed 1 year ago

dubovikmaster commented 1 year ago

Hello, and first of all thank you very much for your work! I want to use an embedding layer for the categorical features before the CondLSTM, as below, but I get an error:

# Assumed imports; `normalizer` and `vectorize_layer` are preprocessing
# layers (e.g. Normalization / TextVectorization) defined elsewhere.
from tensorflow import keras
from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                     Flatten, Dropout, Dense)
from cond_rnn import ConditionalRecurrent

forward_layer = ConditionalRecurrent(LSTM(units=256, return_sequences=True))
backward_layer = ConditionalRecurrent(LSTM(units=256, return_sequences=True, go_backwards=True))

i1 = Input(shape=(24, 14))
ic_1 = Input(shape=(4,))
norm = normalizer(i1)
v = vectorize_layer(ic_1)
embedding = Embedding(49, 4, input_length=4)(v)
inputs = (norm, embedding)
x = Bidirectional(layer=forward_layer,
                  backward_layer=backward_layer)(inputs)
x = Flatten()(x)
x = Dropout(.25)(x)
output = Dense(units=4, activation='linear')(x)
model = keras.Model([i1, ic_1], output)

in user code:

File "/usr/local/lib/python3.8/dist-packages/cond_rnn/cond_rnn.py", line 86, in call  *
    cond = self._standardize_condition(cond[0])
File "/usr/local/lib/python3.8/dist-packages/cond_rnn/cond_rnn.py", line 54, in _standardize_condition  *
    raise Exception('Initial cond should have shape: [2, batch_size, hidden_size] '

Exception: ('Initial cond should have shape: [2, batch_size, hidden_size] or [batch_size, hidden_size]. Shapes do not match.', TensorShape([None, 4, 4]))

Call arguments received by layer 'forward_conditional_recurrent_10' (type ConditionalRecurrent): • inputs=('tf.Tensor(shape=(None, 24, 14), dtype=float32)', 'tf.Tensor(shape=(None, 4, 4), dtype=float32)') • training=None • kwargs=<class 'inspect._empty'>
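
(For reference, the exception says the condition must have shape [2, batch_size, hidden_size] or [batch_size, hidden_size], i.e. rank 2, while the Embedding layer returns a rank-3 tensor of shape (None, 4, 4). A minimal sketch of one possible workaround, assuming it is acceptable to flatten the embedding into a single condition vector; names and sizes mirror the snippet above:)

from tensorflow import keras
from tensorflow.keras.layers import Input, Embedding, Flatten, LSTM, Dense
from cond_rnn import ConditionalRecurrent

i1 = Input(shape=(24, 14))                 # numerical time series
ic_1 = Input(shape=(4,), dtype='int32')    # 4 categorical ids per sample
emb = Embedding(49, 4)(ic_1)               # (None, 4, 4), rank 3: rejected by the check
cond = Flatten()(emb)                      # (None, 16), rank 2: accepted
x = ConditionalRecurrent(LSTM(units=256))([i1, cond])
output = Dense(units=4, activation='linear')(x)
model = keras.Model([i1, ic_1], output)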

philipperemy commented 1 year ago

@dubovikmaster thanks for your feedback :)

Can you share a full snippet that I can run?

dubovikmaster commented 1 year ago

That is all :) You can create random features and random conditional features and run it. I think the problem is in the shape that the embedding layer returns.

I have another question, about your implementation of TCN: can I use cond-rnn with a TCN?

philipperemy commented 1 year ago

@dubovikmaster okay, I'll look into it. To answer your question: I don't think it's possible, since a Temporal Convolutional Network is a stack of causal convolutions and there are no hidden states. The purpose of a Conditional RNN is to learn a representation of the initial hidden state H0 from some external variables. A TCN does not have such an H0.
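
(Conceptually, conditioning an LSTM means deriving its initial hidden and cell states from the external variables through a learned projection, which plain causal convolutions have no equivalent of. A minimal sketch of the idea, not necessarily cond_rnn's exact internals:)

import tensorflow as tf
from tensorflow.keras.layers import Dense, LSTM

units = 32
cond = tf.random.normal((8, 5))        # (batch, num_external_variables)
seq = tf.random.normal((8, 24, 14))    # (batch, time, input_dim)

h0 = Dense(units, activation='tanh')(cond)   # learned representation of H0
c0 = Dense(units, activation='tanh')(cond)   # and of the cell state C0
out = LSTM(units)(seq, initial_state=[h0, c0])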

dubovikmaster commented 1 year ago

And how can I use a TCN with some conditional features?

philipperemy commented 1 year ago

The best approach is to add your conditional features along the input_dim dimension, i.e. the third dimension.

dubovikmaster commented 1 year ago

> The best approach is to add your conditional features along the input_dim dimension, i.e. the third dimension.

Like a one-hot vector? I don't understand. Can you explain, please?

dubovikmaster commented 1 year ago

For example, my numerical features have shape (batch_size, 24, 15) and I have conditional features that are vectors with shape (batch_size, 1, 5). For example, for each sample the cond vector is ['a', 'b', 'c', 'd']. How can I use it with a TCN?

philipperemy commented 1 year ago

Tile your cond features to (batch_size, 24, 5). By tiling I mean duplicating them across the time dimension.

You now have

(batch_size, 24, 15) and (batch_size, 24, 5)

Concatenate them on the input_dim dimension.

You now have

(batch_size, 24, 20)

This is something you can input to a TCN (see the sketch below).
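
(A minimal sketch of this tile-and-concatenate pattern, using the shapes above; the TCN import assumes the keras-tcn package, pip install keras-tcn, and is only illustrative:)

import tensorflow as tf
from tensorflow.keras.layers import Input, RepeatVector, Concatenate, Dense
from tcn import TCN  # assumed: the keras-tcn package

num_in = Input(shape=(24, 15))    # numerical time series
cond_in = Input(shape=(5,))       # static conditional features, squeezed to rank 2

cond_tiled = RepeatVector(24)(cond_in)          # (batch, 5) -> (batch, 24, 5)
x = Concatenate(axis=-1)([num_in, cond_tiled])  # (batch, 24, 20)

x = TCN(nb_filters=64)(x)                       # (batch, 64)
output = Dense(1)(x)
model = tf.keras.Model([num_in, cond_in], output)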

ChrisDelClea commented 1 year ago

Hi @philipperemy ,

I enjoyed reading this issue, as I am working on a similar task myself.

The question I had was whether I can combine your ConditionalRNN layer with an upstream embedding layer, and whether it makes sense at all.

In my case it is a bit more concrete: I have customers who can each shop across four products. I also have relatively static information about the customers.

My time series data is the customers' purchase history over four product categories. My goal is to use the upstream embedding layer to train a general embedding of the customers.

What I wonder now is: a customer is actually also defined by his purchase history, not only by his upstream embedding. Besides, you wrote: "The purpose of a Conditional RNN is to learn a representation of the initial hidden state H0 from some external variables." Would that actually be the better embedding?

Would it be possible to get H0 out of the ConditionalRNN layer? And how would I have to define the "conditions" in general? E.g. place of residence and gender are categorical, while age is numerical (but also changes every year).

philipperemy commented 1 year ago

@ChrisDelClea I guess you want to learn an embedding for a customer based on both the static attributes (gender, location, age) and the purchase history.

However, in practice I don't think that's possible unless you merge all those features into one single tensor. That is a relevant option too, but if you do that you don't need a ConditionalRNN; a regular RNN is fine.

The embedding layer in Keras expects only ONE tensor. That's how Keras was made.

So I'm guessing you can only learn the embedding of a customer from its purchase history (tensor.shape = <num_customers, time_axis, item_axis>).

From there, you can use a ConditionalRNN on top of this Embedding layer and feed the external variables (gender, location, etc.) as one-hot vectors, as sketched below.

But as you might guess the conditioning will only work from the RNN layer and not from the Embedding layer.
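
(A minimal sketch of that layering; every size and name here is hypothetical:)

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from cond_rnn import ConditionalRecurrent

NUM_ITEMS, TIME_STEPS, EMBED_DIM, NUM_COND = 50, 24, 8, 5  # hypothetical sizes

hist_in = Input(shape=(TIME_STEPS,), dtype='int32')  # purchase history as item ids
cond_in = Input(shape=(NUM_COND,))                   # one-hot static attributes

h = Embedding(NUM_ITEMS, EMBED_DIM)(hist_in)      # sees the purchase history only
x = ConditionalRecurrent(LSTM(32))([h, cond_in])  # conditioning enters here, not earlier
output = Dense(1)(x)
model = tf.keras.Model([hist_in, cond_in], output)

Any embeddings later extracted from the Embedding layer for visualisation would therefore only reflect the purchase history.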

ChrisDelClea commented 1 year ago

Hey @philipperemy, yes that's right, I want the time series model with the customer embedding as a side effect. If I understand you correctly, you would feed the customer's static data + purchase history as a single vector to an RNN, right? Can you give me a suggestion of what the model architecture would look like?

> So I'm guessing you can only learn the embedding of a customer from its purchase history (tensor.shape = <num_customers, time_axis, item_axis>).

What do you mean by that?

philipperemy commented 1 year ago

@ChrisDelClea that means, as far as I know, you will only be able to learn an embedding (with the Embedding layer) of the customer's purchase history. The embedding cannot be fed other data like gender and location.

The conditioning (gender, location) will be done in the RNN, at a later stage in the network.

In the end your model "will see all the info", but if you want to extract the embeddings of the Embedding layer to perform some sort of visualisation, they will only contain the purchase history.

    feed this                             feed this
[purchase history]                   [gender, location]
        |                                     |
    Embedding            ---->               RNN