jacobod / Facebook-Metrics-Prediction

Analysis and Modeling of the UCI Facebook Metrics Dataset

Total Interactions #1

Open ladataanalytics opened 5 years ago

ladataanalytics commented 5 years ago

If the purpose is to build a predictive model for the target Lifetime Engaged Users, I would argue that you cannot use Total Interactions as a feature. To count as a Lifetime Engaged User, someone must have interacted with the post, so Total Interactions is only known after the outcome you are trying to predict. Unsurprisingly, the random forest results show that Total Interactions was the most important feature. I'm happy to hear your thoughts.

souhagaa commented 4 years ago

I agree with @ladataanalytics that this is lookahead bias. I also have a question about how the problem is framed: shouldn't this be treated as a time series forecasting problem?

jacobod commented 2 years ago

Hi @ladataanalytics and @souhagaa, thank you for reviewing the project and for your questions.

In terms of Total Interactions leading to leakage, I would argue that it is possible, although since Total Interactions = # Likes + # Comments + # Shares, the same concern would extend to those separate fields as well. Since the users who shared, liked, or commented are a subset of the Lifetime Engaged Users (as best I can understand it), there is definitely a strong positive correlation. In short, I think that if we are doing a backward-looking analysis to understand when to post, we have to control for those fields, but as a predictive tool I agree that it could lead to leakage.
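As a rough illustration of the predictive case, here is a minimal sketch of excluding those fields before fitting a model. The column names (`like`, `comment`, `share`, `Total Interactions`, and the others in the example list) are assumptions about the CSV header, and `split_features` and `interactions_consistent` are hypothetical helpers, not code from this repo.

```python
# Assumed column names; adjust them to match the actual CSV header.
TARGET = "Lifetime Engaged Users"
LEAKY = {"Total Interactions", "like", "comment", "share"}


def split_features(columns, target=TARGET, leaky=LEAKY):
    """Return (usable, dropped) feature names, preserving input order."""
    usable = [c for c in columns if c != target and c not in leaky]
    dropped = [c for c in columns if c in leaky]
    return usable, dropped


def interactions_consistent(row):
    """Check Total Interactions == like + comment + share for one row (a dict)."""
    return row["Total Interactions"] == row["like"] + row["comment"] + row["share"]


# Illustrative column list, not the full dataset schema.
columns = [
    "Type", "Post Month", "Post Hour", "Paid",
    "like", "comment", "share", "Total Interactions",
    "Lifetime Engaged Users",
]
usable, dropped = split_features(columns)
print("usable features:", usable)
print("dropped as leaky:", dropped)
```

For the backward-looking analysis, the same list could instead be kept as controls. The point is only that the choice depends on whether the values would be known at prediction time.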

In terms of a time series problem, @souhagaa, I think it depends on how we define it. Since there is only one row per post, we don't have the granularity to track a post's engagement over time, but I could see how we could forecast something like overall page likes, given the performance of the posts.
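One rough way to get a series out of one-row-per-post data is to aggregate posts by month. The sketch below assumes columns named `Post Month` and `Total Interactions`. `monthly_totals` is a hypothetical helper, and the rows are made up for illustration, not taken from the dataset.

```python
from collections import defaultdict


def monthly_totals(posts, month_key="Post Month", value_key="Total Interactions"):
    """Sum a per-post metric by month and return (month, total) pairs in month order."""
    totals = defaultdict(int)
    for post in posts:
        totals[post[month_key]] += post[value_key]
    return [(month, totals[month]) for month in sorted(totals)]


# Made-up example rows; each dict stands for one post.
posts = [
    {"Post Month": 2, "Total Interactions": 10},
    {"Post Month": 1, "Total Interactions": 5},
    {"Post Month": 2, "Total Interactions": 7},
]
series = monthly_totals(posts)
print(series)
```

A monthly series like this is very short, so it would only support a coarse forecast, but it makes the page-level framing concrete.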