philipperemy / cond_rnn

Conditional RNNs for Tensorflow / Keras.
MIT License

Using cond_rnn with different types of static data #11

Closed ykocoglu closed 3 years ago

ykocoglu commented 3 years ago

Hi philipperemy,

I could not figure out how to contact you, so I created this issue.

Thank you for sharing this code.

Before looking into the details I wanted to know if this would be suitable for my problem.

I noticed that you have conditions (one-hot encoded vectors) such as [0 0 1] or [1 0 0]. Referring to the blog you originally built this code for, I understand that these condition vectors represent different cities that might have different weather conditions.

I have static and time series data and I want my static features to affect my predictions (forecast). Output is a sequence (not classification).

Time-series data example: 100, 90, 60, 40, 20, ... (a single feature, with multiple examples of varying sequence length).
Static data: 7 different features for each example at time t=0, e.g. [5, 100, 0.8, 0.5, 10, 3.65, 7]. This is not a one-hot encoded vector such as [1 0 0], and each number has an effect on how the time series progresses for each example.

Can this code be used for a problem like this or does the input have to be a one-hot encoded vector?

Any help is appreciated. If you are interested in my problem and this works out, I'm planning to write a paper using the data I have, and I can include your name in it.

Thank you.

philipperemy commented 3 years ago

@ykocoglu Sorry for the late reply. Conditions need not be a one-hot encoded vector. I made an example here that might be useful for you: https://github.com/philipperemy/cond_rnn/blob/master/examples/dummy_stations_example.py. Let me know how your paper writing goes :)
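For anyone landing here, a minimal sketch of the idea (not taken from the linked example; the shapes and layer names are made up, and the ConditionalRecurrent wrapper name follows recent cond_rnn releases, while older releases expose a ConditionalRNN layer instead):

```python
import numpy as np
import tensorflow as tf
from cond_rnn import ConditionalRecurrent  # recent releases; older ones expose ConditionalRNN

# Hypothetical shapes: 32 examples, 50 time steps, 1 temporal feature,
# and 7 continuous static features (no one-hot encoding needed).
num_samples, time_steps, input_dim, cond_dim, units = 32, 50, 1, 7, 20

x = tf.keras.layers.Input(shape=(time_steps, input_dim), name='time_series')
c = tf.keras.layers.Input(shape=(cond_dim,), name='static_features')

h = ConditionalRecurrent(tf.keras.layers.LSTM(units))([x, c])
y = tf.keras.layers.Dense(1)(h)

model = tf.keras.Model(inputs=[x, c], outputs=y)
model.compile(optimizer='adam', loss='mse')

# Dummy data: the 7 static values can be any real numbers, e.g. [5, 100, 0.8, ...].
model.fit([np.random.rand(num_samples, time_steps, input_dim),
           np.random.rand(num_samples, cond_dim)],
          np.random.rand(num_samples, 1),
          epochs=1, verbose=0)
```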

ykocoglu commented 3 years ago

Thank you so much, Philippe, for the example. It will help a lot. I just have one question on the code itself. I looked into the conditional RNN class and noticed that you are using a dense layer to reshape the conditional (non-temporal) input before assigning it as the initial state. Is this dense layer trained as well (weights updated), or does the input pass through it only once?

Also, this might be a weird question, but if you have multiple time series (let's say 10 of them) that are independent from each other (stateless, each series with its own conditions), would the dimension of the hidden states be (hidden_units, 10) or (hidden_units, 1)? Sorry, I'm kind of new to dealing with hidden states. Thank you again.

philipperemy commented 3 years ago

Yes, this dense layer is trained. The dimension of this matrix depends on the conditions and the hidden_units; it does not depend on how many time series you feed (that is given by the batch size). If your condition has 4 values and you have 20 hidden units, then the matrix has a shape of 4x20. It's really a mapping: conditions → initial states.
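As a plain-Keras illustration of that mapping (a sketch of the idea only, not cond_rnn's exact internals; the same projection is reused for both LSTM states just to keep it short):

```python
import tensorflow as tf

cond_dim, units = 4, 20  # a 4-value condition and 20 hidden units, as above

cond = tf.keras.layers.Input(shape=(cond_dim,), name='condition')
# Trainable projection from the condition to the initial state: its kernel
# is the 4x20 matrix discussed above (plus a bias), updated by backprop.
init_state = tf.keras.layers.Dense(units, name='cond_to_state')(cond)

x = tf.keras.layers.Input(shape=(None, 1), name='time_series')
# A Keras LSTM takes [h, c] as its initial state; reusing one projection
# for both is a simplification for this sketch.
out = tf.keras.layers.LSTM(units)(x, initial_state=[init_state, init_state])

model = tf.keras.Model([x, cond], out)
print(model.get_layer('cond_to_state').kernel.shape)  # (4, 20)
```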

ykocoglu commented 3 years ago

Thank you for your answers again. They cleared up most of the confusion I had. One last question based on your answer: shouldn't the conditions of each batch be used separately to create a unique initial state for each batch (sample) in the case of a stateless LSTM? To be more clear: if I have 2 conditions, [1,0,0,0] and [0,1,0,0], for 2 independent time series with dimension (2,10,1) (stateless), shouldn't init_states have dimension (2,4,20), where each (4,20) is assigned as the init_state between batches? The reason I tend to think of it this way is that each condition is unique to each time series in the case of a stateless LSTM, right?

philipperemy commented 3 years ago

No. If you have 10,000 time series with only 2 conditions, then you will still have (2, num_units), because each time series will fall into one of the two conditions. For example, if cond1=male and cond2=female and you have 10,000 time series modeling blood pressure, then each time series can be conditioned on the gender of the individual. We want to initialize each time series based on this extra parameter, but every time series where cond1=male will be initialized in the same way. Then, of course, the state will change based on the time series itself as the LSTM unrolls across the time dimension (in your example, 10 steps).

ykocoglu commented 3 years ago

I see. Thank you so much. In a way, this makes sense. The only thing I still don't quite understand is how it figures out whether it's male or female between batches (when it resets the state), because you only have a single initial state of shape (2, num_units), and both conditions are represented in that single init_state the way I understand it, so it should reset to the same init_state between batches during training.

Also, I apologize for not making my problem and thought process clearer when I asked my question (I was mainly thinking of the problem I was working on). I will present a medical problem similar to mine, but my actual problem is in oil and gas, and it is a seq2seq forecasting problem (return_state=True, but stateless). I don't have conditions; rather, I have some features that range between 0 and inf. For example (3 features, 2 time series): [5000, 20, 80] and [3000, 15, 100]. Each of the 3 features can technically take any value between 0 and inf (though of course there is a practical range), so this is no longer a condition like [1 0 0] or [0 1 0]. Imagine these features representing [drug dosage (in mg), age of patient, weight of patient (in kg)], where the drug is administered at the beginning of a medical treatment (time=0), and I'm trying to forecast how the patient's heart rate will change over time given the drug dosage, their age, etc. Additional features such as smoker/non-smoker (that one is conditional) could be added to make it more comprehensive.

Using cond_rnn on this problem seems to work fine (training runs), but I now have init_state dimensions of (num_features, num_units), and I believe it is the same across all batches, so it seems to me that it has lost its individuality. I would really appreciate your thoughts on this (whether it is right to use cond_rnn for this type of problem, or whether it can be modified to fit it). Thank you again, and I hope I'm not bothering you. Happy new year.

ykocoglu commented 3 years ago

Ok, I think the answer to my confusion lies in your previous answer: "But each time series where cond1=male will be initialized in the same way. Then of course the state will change based on the time series itself when the LSTM unrolls across the time dimension...". I think I don't understand how the init_state changes between each time series depending on the given condition if it by default has size (num_features, num_hidden_units), assuming it resets to the same init_state between each time series. In my mind, training is continuous with the same init_state between each time series, but is that a wrong assumption? Does the cond_rnn layer actually change the init_state between each time series? I apologize for asking the same thing again; it's just very hard for me to visualize. Thank you.

philipperemy commented 3 years ago

training is continuous with the same init_state between each time series

I'd say this is not correct. In a batch of 50 time series, you have 50 initial states. If you change your batch size to 250, you will have 250 initial states. The network does not depend on the batch size. We have one initial state per time series, and there are N possible initial states, where N is the number of distinct conditions (for example, male/female gives N=2). In terms of shape, your initial states are (batch_size, hidden_units). I think that's what you're missing: the states have a batch dimension as well. And these states are stateless: after the call to fit() or train_on_batch() or predict_on_batch(), the states are flushed, because you don't need them anymore; you have your predictions for the whole sequences.
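A tiny demo of both points, with made-up numbers (an untrained Dense layer stands in for the learned mapping): the initial states carry a batch dimension, and rows that share a condition are identical.

```python
import numpy as np
import tensorflow as tf

units = 20
cond_to_state = tf.keras.layers.Dense(units)  # stand-in for the learned mapping

# One-hot conditions (male/female) for a batch of 4 time series.
conds = np.array([[1., 0.],                    # male
                  [0., 1.],                    # female
                  [1., 0.],                    # male again
                  [0., 1.]], dtype='float32')  # female again

states = cond_to_state(conds).numpy()
print(states.shape)  # (4, 20): one initial state per series in the batch
# Same condition => same initial state, however many series share it:
print(np.allclose(states[0], states[2]), np.allclose(states[1], states[3]))  # True True
```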

Without cond_rnn, the initial states are usually initialized with a vector of zeros, so every time series is initialized the same way. To go back to the male/female example, consider a batch size of 1 (for the sake of simplicity) and 10,000 time series (5,000 female and 5,000 male). The way you do your training is: feed each time series, one at a time, together with its male/female condition, and the initial state is built from that condition.

But for every time series (from your 10K time series), you have only 2 conditions, so you will always have 2 initial states.

At test time, you have to give the condition (male/female), because CondRNN expects a condition to know which initial state to choose.
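An end-to-end sketch of that train/test flow (shapes and names are made up; same hedge about the ConditionalRecurrent wrapper as in the first snippet above):

```python
import numpy as np
import tensorflow as tf
from cond_rnn import ConditionalRecurrent  # recent releases; older ones expose ConditionalRNN

steps, feat, units = 10, 1, 16
x_in = tf.keras.layers.Input(shape=(steps, feat), name='series')
c_in = tf.keras.layers.Input(shape=(2,), name='gender')  # [1,0]=male, [0,1]=female

h = ConditionalRecurrent(tf.keras.layers.LSTM(units))([x_in, c_in])
model = tf.keras.Model([x_in, c_in], tf.keras.layers.Dense(1)(h))
model.compile(optimizer='adam', loss='mse')

# Training: each sample carries its own condition within the batch.
x = np.random.rand(100, steps, feat)
c = np.eye(2)[np.random.randint(0, 2, size=100)]  # random male/female labels
model.fit([x, c], np.random.rand(100, 1), epochs=1, verbose=0)

# Test time: the condition must be supplied so the matching initial state is built.
model.predict([np.random.rand(1, steps, feat), np.array([[0., 1.]])])
```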

ykocoglu commented 3 years ago

Thank you so much, Philippe. It makes a lot of sense now. So, from how I understand it: if batch_size = 1, even if I give all the inputs for 10K time series with 10K conditions, in the stateless case, it picks the 1st sample (time series and condition(s)), sets the initial hidden state from that sample's condition(s), and does the same thing for the next samples, sharing the learned weights between samples and flushing the initial hidden state values from the previous sample each time. All this is done in the background after the call to fit(), right? Sorry for the questions; it was actually not easy to visualize, but your answer helped me picture the process a lot better. Thank you so much. I'll keep you updated on the paper and would like to have your name on it. My email is mrkocoglu@yahoo.com. Please send me an email and let's keep communicating; I'll update you via email. One last thing I want to say is: You are awesome!

philipperemy commented 3 years ago

So, from how I understand it: if batch_size = 1, even if I give all the inputs for 10K time series with 10K conditions, in the stateless case, it picks the 1st sample (time series and condition(s)), sets the initial hidden state from that sample's condition(s), and does the same thing for the next samples, sharing the learned weights between samples and flushing the initial hidden state values from the previous sample each time.

Yes, it's correct.

Thank you so much. I'll keep you updated on the paper and would like to have your name on it.

Great! My email is premy.enseirb@gmail.com. You can contact me at this address.

One last thing I want to say is: You are awesome!

Thank you so much :)