Open DanisAlukaev opened 2 years ago
Moved Maxim from #2 to this issue.
EDA on IMDB dataset: d5759483cf60cae0d2c0f843fb6cf18995659b83
I actually think it's more reasonable to make the process iterative.
1. One guy prepares the main dataset of user-item interactions.
2.1. A second guy prepares the model on this pure data (without features). I think this user-item interaction dataset will be enough to give the second guy some problems to solve (read the docs on the model, configure the env, run a simple training, make a baseline to compare with, spend some time trying to understand why the model works worse than the baseline, etc.).
2.2. In the meantime, a third guy prepares simple extra features to use with more advanced models.
3.1. Somebody adapts the new extra features to the extended feature-incorporating model.
3.2. In the meantime, another guy prepares advanced extra features.
From this perspective, the "EDA" issue is too general. There will be different needs for data exploration, and each of them will require a separate issue =)
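For the "make a baseline to compare with" part of step 2.1, here is a minimal sketch of what such a baseline could look like: recommend the most popular items to everyone and measure how often a held-out interaction falls in that list. The `(user_id, item_id)` pair format, the function names `popularity_baseline` and `hit_rate`, and the toy data are all assumptions for illustration, not something we have agreed on.

```python
from collections import Counter


def popularity_baseline(train, k=2):
    """Return the k most frequently interacted items in train.

    train: iterable of (user_id, item_id) pairs (hypothetical format).
    """
    counts = Counter(item for _, item in train)
    # Sort by count (descending), then by item id for a deterministic order.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:k]


def hit_rate(top_items, test):
    """Share of held-out (user, item) pairs whose item is in top_items."""
    test = list(test)
    if not test:
        return 0.0
    hits = sum(1 for _, item in test if item in top_items)
    return hits / len(test)


# Toy data, made up for illustration only.
train = [("u1", "a"), ("u2", "a"), ("u3", "b"), ("u1", "b"), ("u2", "c")]
test = [("u3", "a"), ("u1", "c")]

top = popularity_baseline(train, k=2)
print(top, hit_rate(top, test))
```

If the trained model cannot beat a number like this on the same split, that is probably the first thing to investigate.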
Thanks for your comments. Agree with you. @homomorfism could you organise our work in such a way, please?
That's fine, let's work with this methodology.
For further discussion, it makes sense to perform a simple exploratory data analysis, so that we can get some insights from the data and perform the necessary manipulations.
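As a rough idea of what a first pass could report, the sketch below computes a few basic statistics over a list of user-item interactions: number of users, number of items, density of the interaction matrix, and the largest number of interactions by a single user. The `(user_id, item_id)` input format and the name `interaction_stats` are assumptions for illustration; the real dataset may need different loading and different columns.

```python
from collections import Counter


def interaction_stats(interactions):
    """Basic statistics for an iterable of (user_id, item_id) pairs."""
    interactions = list(interactions)
    per_user = Counter(user for user, _ in interactions)
    per_item = Counter(item for _, item in interactions)
    # Density = distinct observed pairs / all possible user-item pairs.
    unique_pairs = len(set(interactions))
    density = (
        unique_pairs / (len(per_user) * len(per_item)) if interactions else 0.0
    )
    return {
        "n_users": len(per_user),
        "n_items": len(per_item),
        "n_interactions": len(interactions),
        "density": density,
        "max_per_user": max(per_user.values(), default=0),
    }


# Toy data, made up for illustration only.
sample = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "c")]
stats = interaction_stats(sample)
print(stats)
```

A very low density or a few users with a huge share of the interactions would both be worth knowing before choosing a model.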