ragulpr / wtte-rnn

WTTE-RNN: a framework for churn and time-to-event prediction
MIT License

Combining data_pipeline and simple_example #7

Closed: hedgy123 closed this issue 7 years ago

hedgy123 commented 7 years ago

Hi Egil,

Thank you so much for making your code available! This is really great stuff.

So, in trying to better understand how it all works, I tried using the data extracted from tensorflow.log (as in your data_pipeline notebook) as inputs to the network (same config as in your simple_example). Unfortunately, I got all NaNs as losses:

Model summary:

 init_alpha:  -785.866918162
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_1 (GRU)                  (None, 101, 1)            18        
_________________________________________________________________
dense_1 (Dense)              (None, 101, 2)            4         
_________________________________________________________________
activation_1 (Activation)    (None, 101, 2)            0         
=================================================================
Total params: 22.0
Trainable params: 22.0
Non-trainable params: 0.0  

Results of running model.fit:

Train on 72 samples, validate on 24 samples
Epoch 1/75
2s - loss: nan - val_loss: nan
....

I was wondering if you've tried doing the same experiment and if so, whether it worked for you? Thanks so much!

ragulpr commented 7 years ago

Hi there, thanks for reaching out! I need to be clearer about this; I haven't had time to join the two scripts together yet. I'll get back to you ASAP with an updated answer, but for now: init_alpha: -785.866918162 is an error, since alpha must be positive (it's < 0 here).

Note that for big magnitudes of alpha, the mean of the TTE is approximately the same as the more complex estimate using the log, etc.
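To make that concrete, here's a minimal sketch of deriving a positive init_alpha from the training data (this mirrors the idea, not the package's exact code, and assumes discrete TTE targets in channel 0 of y_train):

    import numpy as np

    # Geometric/discrete-Weibull motivated initialization: pick alpha so
    # that the implied mean matches the observed mean TTE.
    tte_mean_train = np.nanmean(y_train[:, :, 0])
    init_alpha = -1.0 / np.log(1.0 - 1.0 / (tte_mean_train + 1.0))
    assert init_alpha > 0  # a negative value means the targets are wrong

For large tte_mean_train the log expression is approximately tte_mean_train + 0.5, i.e. roughly the plain mean, which is the approximation mentioned above.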

Furthermore, something like the following might work (OBS: NOT TESTED):

def epoch():
    # Train on one sequence at a time, trimmed to its true length, so
    # that no padded timesteps enter the loss. Note the i:i+1 slice,
    # which keeps the batch dimension that model.fit expects.
    for i in range(n_samples):
        model.fit(x_train[i:i + 1, :seq_length[i], :],
                  y_train[i:i + 1, :seq_length[i], :],
                  epochs=1,
                  batch_size=1,
                  verbose=2)

But an even better initial debug mode is to simply transform the data to [n_non_masked_samples, 1, n_features] (feed in only the observed timesteps), train a simple ANN on that, and once it works, test the RNN.
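For that debug mode, a minimal sketch could look like this (it assumes unobserved timesteps are NaN-padded, which is an assumption about your pipeline):

    import numpy as np

    # Keep only observed (non-masked) timesteps and treat each one as its
    # own length-1 sequence: [n_non_masked_samples, 1, n_features].
    mask = ~np.isnan(x_train[:, :, 0])
    x_flat = x_train[mask].reshape(-1, 1, x_train.shape[-1])
    y_flat = y_train[mask].reshape(-1, 1, y_train.shape[-1])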

Would love to see forks!

ragulpr commented 7 years ago

There are multiple reasons for NaNs to show up, but I just found a very important one:

shift_discrete_padded_features, which is supposed to hide the target, is currently broken and apparently doesn't. This means that if the event indicator is part of the input, the network can make a perfect prediction, causing exploding gradients.

I'm trying to fix it ASAP.
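For intuition, here's a rough sketch of what that shift is supposed to do (a simplified stand-in, not the actual wtte implementation; it assumes a padded [n_samples, n_timesteps, n_features] array):

    import numpy as np

    def shift_features(x, fill=0.0):
        # Push every feature one timestep forward so that at step t the
        # network only sees data up to t-1. Without this, an event
        # indicator in the input reveals the target itself.
        x_shifted = np.roll(x, shift=1, axis=1)
        x_shifted[:, 0, :] = fill  # nothing is known before the first step
        return x_shifted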

NataliaVConnolly commented 7 years ago

Hi Egil,

Thanks for the update! Here's a fork with the notebook Combined_data_pipeline_and_analysis in examples/keras.

  https://github.com/NataliaVConnolly/wtte-rnn-1

The last cell shows an example of training with just one input sequence. It does result in a non-NaN loss, although a very large one (though I didn't optimize the initial alpha or the network config much).

Cheers, Natalia (aka hedgy123 :))

ragulpr commented 7 years ago

@NataliaVConnolly Sorry for the wait. It took me some time to figure out what was wrong!

Check out the new data_pipeline and let me know if you have more questions! :)