ragulpr / wtte-rnn

WTTE-RNN a framework for churn and time to event prediction
MIT License

How does wtte work seasonally? #30

Closed sn3fru closed 6 years ago

sn3fru commented 6 years ago

First of all, congratulations on the great work with wtte.

My question is about seasonality. We know that behavior is not constant over time: in some months, or in the early hours of the morning, activity is naturally lower. Does the model take this behavior into account when it learns its predicted time-to-event curves?

ragulpr commented 6 years ago

Thank you! And thanks for initiating interesting discussions here.

There are a few things to keep in mind when introducing what I call global features. Examples of such features are 'mean number of commits yesterday among the whole population' or 'temperature yesterday'. In short: go for it! But be careful.

  1. Does it help the algo figure out when censoring is likely? I.e., does it let the model overfit censored datapoints by pushing the predicted distribution toward infinity?
  2. Does it help prediction, or are we just learning historical artifacts? Do we overfit?

The first is an issue specific to wtte/survival models: don't disclose to the algo anything that helps it know whether a timestep of a sequence is censored. The second is a more general forecasting problem.
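One rough way to screen a candidate feature for the first problem is to measure how well the feature alone separates censored from uncensored timesteps. The sketch below is only an illustrative heuristic, not something taken from wtte-rnn: the function name `censoring_auc` and the toy arrays are assumptions. It computes a pairwise AUC, where values near 0.5 suggest the feature says little about censoring and values near 0 or 1 suggest it gives censoring away.

```python
import numpy as np


def censoring_auc(feature, censored):
    """AUC of a single feature used as a score for 'this timestep is censored'.

    Hypothetical helper for illustration; not part of wtte-rnn.
    """
    feature = np.asarray(feature, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    pos = feature[censored]    # feature values at censored timesteps
    neg = feature[~censored]   # feature values at uncensored timesteps
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both censored and uncensored timesteps")
    # Compare every censored value against every uncensored value.
    diff = pos[:, None] - neg[None, :]
    # Fraction of pairs ranked correctly, counting ties as half.
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Toy data: 10 timesteps, the last 3 are censored (end of observation window).
censored = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool)

# A raw absolute time index reveals censoring completely.
time_index = np.arange(10)
auc_time = censoring_auc(time_index, censored)

# A periodic (seasonal) feature is far less informative about censoring.
periodic = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
auc_periodic = censoring_auc(periodic, censored)
```

On this toy data the absolute time index scores an AUC of 1.0, while the periodic feature stays close to 0.5. The pairwise comparison is quadratic in the number of timesteps, so on real data a rank-based AUC would be a better fit.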

In my experience, adding this type of feature counterintuitively decreased sequence-specific overfitting, and hence had only a moderate to no effect on problem 1 above. One explanation is that the algo doesn't need to devote a big chunk of the network to inferring the global time from the features when it gets it for free.

Check out the feature mean_commits_global in the data pipeline example.
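For readers who want a feel for what a global feature such as 'mean number of commits yesterday among the whole population' could look like, here is a minimal pandas sketch. It is not the implementation of mean_commits_global from the repository; the column names (`user_id`, `day`, `n_commits`, `mean_commits_yesterday`) and the toy data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical event log: one row per (user, day) with an event count.
events = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2, 2],
    "day":       [0, 1, 2, 0, 1, 2],
    "n_commits": [2, 0, 4, 0, 2, 0],
})

# Mean count per day over the whole population (index is sorted by day).
daily_mean = events.groupby("day")["n_commits"].mean()

# Shift by one row so each day only sees the previous day's value.
# This assumes the days are contiguous; the first day has no history,
# so it is filled with 0.0 here (an arbitrary choice).
lagged_mean = daily_mean.shift(1).fillna(0.0)

# Broadcast the global feature back onto every user's timesteps.
events["mean_commits_yesterday"] = events["day"].map(lagged_mean)
```

Lagging by one day means the feature is available at prediction time. Every user shares the same value on a given day, which is what makes the feature global.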