havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

How to deal with data about clicks #101

Open RomanGarayev opened 3 years ago

RomanGarayev commented 3 years ago

Hi guys, I have data about subscribers and their clicks on ads in emails. Obviously, I have a click time only when a subscriber followed the link in the email. For users who did not follow it, this column is just 'NaN'. So, can you give me some advice: what type of censoring should I use, and up to what time? Thank you

havakv commented 3 years ago

I'm not really sure how to approach this, or if it is best approached by survival analysis at all. But I guess you could censor individuals at the end time of your dataset. So the censoring time would be the difference between when they got the email and the point after which you don't have any more information about them?
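A minimal sketch of that idea, using only the standard library. The field names (`sent_at`, `clicked_at`), the example timestamps, and the choice of the latest timestamp in the data as the "end time" are all assumptions for illustration, not something taken from the actual dataset:

```python
from datetime import datetime

# Hypothetical records: clicked_at is None when the subscriber never followed the link.
records = [
    {"sent_at": datetime(2021, 3, 1, 9, 0), "clicked_at": datetime(2021, 3, 1, 12, 0)},
    {"sent_at": datetime(2021, 3, 2, 9, 0), "clicked_at": None},
    {"sent_at": datetime(2021, 3, 5, 9, 0), "clicked_at": datetime(2021, 3, 6, 9, 0)},
]

def to_survival_targets(records, end_of_data):
    """Return (durations_in_hours, events); event is 1 for a click, 0 for censored."""
    durations, events = [], []
    for r in records:
        clicked = r["clicked_at"] is not None
        # Clicked: time until the click. Not clicked: time until observation ends.
        stop = r["clicked_at"] if clicked else end_of_data
        durations.append((stop - r["sent_at"]).total_seconds() / 3600.0)
        events.append(1 if clicked else 0)
    return durations, events

# Assumption: the last timestamp seen anywhere in the data marks the end of observation.
end_of_data = max(
    t for r in records for t in (r["sent_at"], r["clicked_at"]) if t is not None
)
durations, events = to_survival_targets(records, end_of_data)
```

The resulting `durations` and `events` lists have the usual (time, indicator) shape that survival models expect as targets; if the data export date is known, it is probably a better choice for `end_of_data` than the last observed timestamp.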

Could you maybe provide a more detailed explanation of your project, if you are hoping for more ideas from anyone else?

RomanGarayev commented 2 years ago

My project is to develop a model that performs send-time optimization for an SMS ad campaign. I have data about SMS send time and SMS open time, gender, and a lot of binary features that describe the user's profile. My model has to choose the best SMS send time for a given user. Survival analysis is time-to-event analysis, which is why I decided to use this technique. Do you have any recommendations for me?

havakv commented 2 years ago

Hmm, I'm still not sure. As I said before, you can censor your observations if sufficient time has passed since the user received the SMS and the user has still not clicked. However, if you're only interested in whether or not the user clicks and not how long it takes them to click, you should maybe approach this as a classification problem instead?
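To make the two framings concrete, here is a rough sketch. The 48-hour horizon, the helper names, and the example values are hypothetical; what counts as "sufficient time" depends on the campaign:

```python
def censor_at_horizon(durations, events, horizon):
    """Cap follow-up at `horizon`: anything longer is censored at `horizon`."""
    capped_durations, capped_events = [], []
    for d, e in zip(durations, events):
        if d > horizon:
            # Click (or end of follow-up) came after the horizon: treat as censored.
            capped_durations.append(horizon)
            capped_events.append(0)
        else:
            capped_durations.append(d)
            capped_events.append(e)
    return capped_durations, capped_events

def clicked_within(durations, events, horizon):
    """Binary labels for the classification framing.

    1 = clicked by `horizon`, 0 = observed up to `horizon` without a click,
    None = censored before `horizon`, so the label is unknown.
    """
    labels = []
    for d, e in zip(durations, events):
        if e == 1 and d <= horizon:
            labels.append(1)
        elif d >= horizon:
            labels.append(0)
        else:
            labels.append(None)
    return labels

# Hypothetical hours since the SMS was received, with click indicators.
durations = [3.0, 96.0, 72.0, 10.0]
events = [1, 0, 1, 0]
capped_durations, capped_events = censor_at_horizon(durations, events, horizon=48.0)
labels = clicked_within(durations, events, horizon=48.0)
```

In the classification framing, users who were observed for less than the horizon get no label at all, which is one reason the survival framing can make better use of recently sent messages.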

RomanGarayev commented 2 years ago

> Hmm, I'm still not sure. As I said before, you can censor your observations if there has been a sufficient time from receiving the SMS and the user has still not clicked. However, if you're only interested in whether or not the user clicks and not how long it takes them to click, you should maybe approach this as a classification problem instead?

I have to predict the exact send time for the SMS, not just whether the user clicks or not. I understand that in the case where "there has been a sufficient time from receiving the SMS and the user has still not clicked" I have to censor that user, but up to what cutoff? Today? Yesterday? And why? This is the most complicated question for me so far :(