ragulpr / wtte-rnn

WTTE-RNN, a framework for churn and time-to-event prediction
MIT License

Exogeneity and grouping #31

Closed (sn3fru closed this 6 years ago)

sn3fru commented 6 years ago

Are all IDs treated individually, producing alphas and betas for each ID? If training on one specific ID does not help train any other ID, would it be better for me to group IDs with similar behavior into macro-groups to increase the amount of training data per group (as in a hierarchical regression)?

ragulpr commented 6 years ago

Yes, each ID refers to its own sequence.

I think this is an interesting but very general machine-learning question, applicable to any modeling situation, so I'll give you my 2 highly subjective cents from that perspective. (Let me know if I've misunderstood the question.)

I think it's fruitful to think of hierarchical regression (HR) as a special case of a neural network with a closed-form solution given some assumptions. If you train one neural network for each group, you'd literally have HR. I assume it could be effective if there's no shared information among groups, but there always is (e.g. each network needs to learn how to count down, etc.).

I don't like it, as there are always common patterns among groups and it sounds messy. The nice thing about neural networks is that you can typically just feed in categorical variables and the network will learn the group-specific patterns quite effectively. The nice thing about RNNs is that this holds in the temporal dimension too.
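
To make "feed categorical variables" concrete, here is a minimal sketch of one way to do it. It assumes the features are already a NumPy array shaped `[n_sequences, n_timesteps, n_features]` and that each sequence has one group label. The function name `append_group_one_hot` and the toy data are hypothetical, not part of wtte-rnn. It appends a one-hot group indicator to every timestep so that a single shared network sees which group each sequence belongs to.

```python
import numpy as np


def append_group_one_hot(x, group_ids):
    """Append a one-hot group indicator to every timestep of every sequence.

    x:         array of shape [n_sequences, n_timesteps, n_features]
    group_ids: one group label per sequence (length n_sequences)
    Returns the widened array and the sorted unique group labels.
    """
    # Map each label to an integer index into the sorted unique labels.
    groups, idx = np.unique(np.asarray(group_ids), return_inverse=True)
    # One row per sequence, one column per group.
    one_hot = np.eye(len(groups), dtype=x.dtype)[idx]
    # Repeat the indicator along the time axis so every step carries it.
    tiled = np.repeat(one_hot[:, None, :], x.shape[1], axis=1)
    return np.concatenate([x, tiled], axis=-1), groups


# Toy data (made up): 3 sequences, 4 timesteps, 2 features, 2 groups.
x = np.zeros((3, 4, 2))
group_ids = ["a", "b", "a"]
x_with_group, groups = append_group_one_hot(x, group_ids)
print(x_with_group.shape)
```

With many groups, a learned embedding of the group index is a common alternative to one-hot columns. Either way all groups share one set of network weights, which is the point of the suggestion above.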

My suggestion: increase the network size and depth, and explicitly tell your network about the group (via categorical variables). It would be interesting to hear if you get different results with some hierarchical training scheme, though!