ragulpr / wtte-rnn

WTTE-RNN a framework for churn and time to event prediction
MIT License

Masking layer doesn't work #50

Open FrancescoFrontino opened 5 years ago

FrancescoFrontino commented 5 years ago

I encountered some problems using the masking layer. Instead of skipping the padded timesteps, the network computes gradients on them and produces NaN values. More specifically, I padded the sequences with the value -1.0 using the pad_sequences function implemented in Keras, then trained the model with the train_on_batch method.

Have you already faced these kinds of problems?

Could this note from the Keras documentation explain such problems? "If any downstream layer does not support masking yet receives such an input mask, an exception will be raised."
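For reference, here is a minimal NumPy sketch of the padding setup described above. This is not the Keras implementation, just an illustration of what pre-padding with -1.0 (the default 'pre' padding style of pad_sequences) does to the batch; the helper name is hypothetical:

```python
import numpy as np

def pad_like_keras(sequences, maxlen, value=-1.0):
    """Pre-pad variable-length sequences to a fixed length with `value`.
    Plain-NumPy sketch mimicking the default behaviour of
    keras pad_sequences ('pre' padding); not the real implementation."""
    out = np.full((len(sequences), maxlen), value, dtype=float)
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype=float)[-maxlen:]  # truncate from the front
        out[i, maxlen - len(seq):] = seq
    return out

padded = pad_like_keras([[1.0, 2.0], [3.0, 4.0, 5.0]], maxlen=4, value=-1.0)
# Each row is left-padded with -1.0; a Masking(mask_value=-1.0) layer is
# then expected to skip exactly those timesteps.
```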

ragulpr commented 5 years ago

Hi, thanks for the comment! Do you have a reproducible example? I've never used pad_sequences myself.

In any case, when it's working, the Masking layer multiplies the loss function by a 0/1 mask, provided every layer above it propagates the mask. So if any of the outputs is NaN, the end result will be NaN after summation.
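The last point can be shown with a toy NumPy calculation (not the actual WTTE loss): a NaN produced at a masked timestep survives multiplication by a 0 weight, because 0 * NaN is NaN, and then poisons the summed loss.

```python
import numpy as np

# Per-timestep losses; the last timestep is padding and its loss came out
# NaN (e.g. from evaluating the loss on the pad value -1.0).
loss = np.array([0.3, 0.5, np.nan])
mask = np.array([1.0, 1.0, 0.0])  # 0/1 mask from the Masking layer

naive = np.sum(loss * mask)  # 0 * NaN is still NaN -> the total is NaN

# Replacing masked losses before summing avoids the poisoning:
safe = np.sum(np.where(mask > 0, loss, 0.0))
```

This is one way a model can produce NaN totals even though the padded timesteps are "masked": masking the loss after the fact doesn't help if the forward pass already produced NaN at those positions.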