philipperemy / cond_rnn

Conditional RNNs for Tensorflow / Keras.
MIT License

Question on conditions #24

Closed Zatfer17 closed 3 years ago

Zatfer17 commented 3 years ago

Hi, I came across this repo through Stack Overflow and really liked the idea! I'm trying to use it to predict the goals of players in a football league, but I'm a bit confused about how I should structure the inputs.

Let's suppose I'm focusing only on strikers, in a league of 20 teams with around 3 strikers per team. Drawing a parallel with your example, the stations would be the strikers, 20x3=60 in total. The number of timesteps would be (20-1)x2=38 matchdays. The variables I have are:

I would use the first two as continuous inputs and the remaining ones as conditions, but I'm not really confident about this choice. Also, I don't get why there are num_stations conditions and not just one. Probably because I need one condition for each prediction, hence num_stations conditions, but I wanted to be sure about this as well.

Matteo

philipperemy commented 3 years ago

Hey,

An LSTM takes as input a tensor of shape (batch_size, timesteps, input_dim). CondRNN takes this same tensor as input, but it also takes another tensor, or a list of tensors: the conditions (explained below). The conditions DO NOT depend on the time dimension.

You have 60 players. That means 60 time series, each containing N time steps. That gives you a tensor of shape (60, N, 1). That's what you would usually feed to an RNN.

Then for the conditions, you can choose:

You cannot choose conditions that depend on the time step. Conditions are used to initialize the state of the RNN before unrolling across the time dimension.

If they depend on the time steps, then they should be pushed into the usual input tensor (60, N, 1) defined above. That would be the case for the opposing team of the game at timestep t. If you still want to use the opposing team, it translates to a vector of size 20 (one-hot encoding over the 20 teams). You then concatenate it to your (60, N, 1) tensor to get (60, N, 21), which means that at every time step the RNN knows the opposing team. But make sure both share the same time unit: if one is in minutes and the other in number of games, you have to be careful about how you concatenate them.
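A minimal sketch of how this could be wired up, assuming the `ConditionalRecurrent` wrapper exposed by `cond_rnn` (the exact import has changed across versions; older releases expose `ConditionalRNN` instead) and random placeholder data. The striker's own team is used here as a hypothetical time-independent condition, while the opposing team one-hot is concatenated into the time-dependent input:

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input, LSTM
from cond_rnn import ConditionalRecurrent  # assumption: wrapper name may differ by version

num_strikers = 60    # 20 teams x ~3 strikers
num_timesteps = 38   # matchdays
num_teams = 20

# Time-dependent inputs: goals per matchday, concatenated with a one-hot of
# the opposing team -> shape (60, 38, 1 + 20).
goals = np.random.random((num_strikers, num_timesteps, 1))
opponent = np.eye(num_teams)[np.random.randint(num_teams, size=(num_strikers, num_timesteps))]
x = np.concatenate([goals, opponent], axis=-1).astype('float32')

# Time-independent condition: e.g. the striker's own team as a one-hot -> (60, 20).
own_team = np.eye(num_teams)[np.random.randint(num_teams, size=num_strikers)].astype('float32')

inp = Input(shape=(num_timesteps, 1 + num_teams))
cond = Input(shape=(num_teams,))
h = ConditionalRecurrent(LSTM(32))([inp, cond])  # condition initializes the LSTM state
out = Dense(1)(h)                                # e.g. goals in the next game
model = Model(inputs=[inp, cond], outputs=out)
model.compile(optimizer='adam', loss='mse')
model.fit([x, own_team], np.random.random((num_strikers, 1)), epochs=1)
```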

Zatfer17 commented 3 years ago

Hi, thank you so much! I'll immediately try, Matteo

philipperemy commented 3 years ago

Great, let me know!

adaj commented 3 years ago

@philipperemy Thank you for the great package and good tips. I'm wondering how that would fit NLP tasks.

Consider for example a text classification task in which the previous class/labels might help to predict the next. For text features, assume word embeddings. Since you mentioned that conditions should not depend on time, we would have the previous labels as the inputs and the embeddings of the current text as the condition.

Thus, the input shape should be (sample_size, timesteps, n_labels), with n_labels > 1 if there is more than one class (i.e., multiclass). The conditions would be (sample_size, embedding_dim). For example, if my training data are 500 sentences with 4 binary labels, and I want my model to look back at 5 sentences, the input shape for my LSTM will be (500, 5, 4), and conditions (500, embedding_dim).

Is my reasoning correct in order to use effectively your conditional architecture?

philipperemy commented 3 years ago

@adaj yeah it's correct, and it should work if you have an NLP dataset where the label at time t depends on information from time 0 up to t. An example is a sentiment dataset where the sentiment of the sentence can evolve at any time:

"I like this movie. I think it's a great one." -> Positive
"But, after seeing the end, I was incredibly disappointed." -> Negative

That's an oversimplified example, but you end up with a tensor of shape (sample_size, timesteps, n_labels). There's some persistence, e.g. P(positive | positive before) > chance, so an LSTM should help predict a sentiment score (between 0 and 1). After the "but", we should expect the score to jump. For the embeddings, you can fetch pre-trained embeddings from any large NLP model and use them as your conditions. Remember that, by doing so, the embeddings will not be inside the gradient loop: they will not be updated and will be treated as constants.
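A minimal sketch of that setup, again assuming the `ConditionalRecurrent` wrapper and placeholder data: 500 samples, a look-back of 5 previous label vectors, 4 binary labels, and a hypothetical 300-dimensional pre-trained sentence embedding used as the condition:

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input, LSTM
from cond_rnn import ConditionalRecurrent  # assumption: wrapper name may differ by version

sample_size, timesteps, n_labels, embedding_dim = 500, 5, 4, 300

# Time-dependent inputs: the 5 previous label vectors for each sample.
prev_labels = np.random.randint(0, 2, size=(sample_size, timesteps, n_labels)).astype('float32')
# Time-independent condition: a pre-trained embedding of the current sentence
# (kept fixed; it sits outside the gradient loop, as noted above).
sentence_emb = np.random.random((sample_size, embedding_dim)).astype('float32')
# Targets: the labels of the current sentence.
y = np.random.randint(0, 2, size=(sample_size, n_labels)).astype('float32')

inp = Input(shape=(timesteps, n_labels))
cond = Input(shape=(embedding_dim,))
h = ConditionalRecurrent(LSTM(32))([inp, cond])  # embedding initializes the LSTM state
out = Dense(n_labels, activation='sigmoid')(h)   # multi-label prediction
model = Model(inputs=[inp, cond], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit([prev_labels, sentence_emb], y, epochs=1)
```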