ragulpr / wtte-rnn

WTTE-RNN: a framework for churn and time-to-event prediction
MIT License

Combining data_pipeline and simple_example #7

Closed: hedgy123 closed this issue 7 years ago

hedgy123 commented 7 years ago

Hi Egil,

Thank you so much for making your code available! This is really great stuff.

So, in trying to better understand how it all works, I tried using the data extracted from tensorflow.log (as in your data_pipeline notebook) as inputs to the network (same config as in your simple_example). Unfortunately, I got all NaNs as losses:

Model summary:

 init_alpha:  -785.866918162
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_1 (GRU)                  (None, 101, 1)            18        
_________________________________________________________________
dense_1 (Dense)              (None, 101, 2)            4         
_________________________________________________________________
activation_1 (Activation)    (None, 101, 2)            0         
=================================================================
Total params: 22.0
Trainable params: 22.0
Non-trainable params: 0.0  

Results of running model.fit:

Train on 72 samples, validate on 24 samples
Epoch 1/75
2s - loss: nan - val_loss: nan
....

I was wondering if you've tried doing the same experiment and if so, whether it worked for you? Thanks so much!

ragulpr commented 7 years ago

Hi there, thanks for reaching out! I need to be clearer about this; I haven't had time to join the two scripts together yet. I'll get back to you ASAP with an updated answer, but for now: init_alpha: -785.866918162 is an error, since alpha must be positive (it's < 0 here).

Note that for big magnitudes of alpha, the mean of the TTE is approximately the same as the more complex estimate using the log, etc.
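To make that concrete, here's a minimal sketch of deriving a positive init_alpha from the training data (this mirrors the idea, not the package's exact code, and assumes discrete TTE targets in channel 0 of y_train):

    import numpy as np

    # Geometric/discrete-Weibull motivated initialization: pick alpha so
    # that the implied mean matches the observed mean TTE.
    tte_mean_train = np.nanmean(y_train[:, :, 0])
    init_alpha = -1.0 / np.log(1.0 - 1.0 / (tte_mean_train + 1.0))
    assert init_alpha > 0  # a negative value means the targets are wrong

For large tte_mean_train the log expression is approximately tte_mean_train + 0.5, i.e. roughly the plain mean, which is the approximation mentioned above.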

Furthermore, something like the following might work (OBS: NOT TESTED):

def epoch():
    # Train on one sequence at a time, trimmed to its true length, so
    # that no padded timesteps enter the loss. Note the i:i+1 slice,
    # which keeps the batch dimension that model.fit expects.
    for i in range(n_samples):
        model.fit(x_train[i:i + 1, :seq_length[i], :],
                  y_train[i:i + 1, :seq_length[i], :],
                  epochs=1,
                  batch_size=1,
                  verbose=2)

But an even better initial debug mode is to simply transform the data to [n_non_masked_samples, 1, n_features] (feed in only the observed timesteps), train a simple ANN on that, and once it works, test the RNN.
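For that debug mode, a minimal sketch could look like this (it assumes unobserved timesteps are NaN-padded, which is an assumption about your pipeline):

    import numpy as np

    # Keep only observed (non-masked) timesteps and treat each one as its
    # own length-1 sequence: [n_non_masked_samples, 1, n_features].
    mask = ~np.isnan(x_train[:, :, 0])
    x_flat = x_train[mask].reshape(-1, 1, x_train.shape[-1])
    y_flat = y_train[mask].reshape(-1, 1, y_train.shape[-1])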

Would love to see forks!

ragulpr commented 7 years ago

There are multiple reasons for NaNs to show up, but I just found a very important one:

shift_discrete_padded_features, which is supposed to hide the target, is currently broken and apparently doesn't. This means that if the event indicator is part of the input, the network can make a perfect prediction, causing exploding gradients.

I'm trying to fix it ASAP.
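For intuition, here's a rough sketch of what that shift is supposed to do (a simplified stand-in, not the actual wtte implementation; it assumes a padded [n_samples, n_timesteps, n_features] array):

    import numpy as np

    def shift_features(x, fill=0.0):
        # Push every feature one timestep forward so that at step t the
        # network only sees data up to t-1. Without this, an event
        # indicator in the input reveals the target itself.
        x_shifted = np.roll(x, shift=1, axis=1)
        x_shifted[:, 0, :] = fill  # nothing is known before the first step
        return x_shifted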

NataliaVConnolly commented 7 years ago

Hi Egil,

Thanks for the update! Here's a fork with the notebook Combined_data_pipeline_and_analysis in examples/keras.

  https://github.com/NataliaVConnolly/wtte-rnn-1

The last cell shows an example of training with just one input sequence. It does result in a non-NaN loss, although a very large one (though I didn't optimize the initial alpha or the network config much).

Cheers, Natalia (aka hedgy123 :))

ragulpr commented 7 years ago

@NataliaVConnolly Sorry for the wait. It took me some time to figure out what was wrong!

Check out the new data_pipeline and let me know if you have more questions! :)