homomorfism / data-mining-project


EDA #5

Open DanisAlukaev opened 2 years ago

DanisAlukaev commented 2 years ago

For further discussion, it makes sense to perform a simple exploratory data analysis, so that we can get some insights from the data and perform any necessary manipulations.

DanisAlukaev commented 2 years ago

Moved Maxim from #2 to this issue.

implausibleDeniability commented 2 years ago

EDA on IMDB dataset: d5759483cf60cae0d2c0f843fb6cf18995659b83

implausibleDeniability commented 2 years ago

I actually think it's more reasonable to make the process iterative.

  1. One guy prepares the main dataset of user-item interactions.

  2.1. A second guy prepares the model on this pure data (without features). I think the user-item interaction dataset alone will give the second guy enough problems to solve (read the docs on the model, configure the env, run a simple training, make a baseline to compare with, spend some time trying to understand why the model works worse than the baseline, etc.).

  2.2. In the meantime, a third guy prepares simple extra features to use with advanced models.

  3.1. Somebody adapts the new extra features to the extended feature-incorporating model.

  3.2. In the meantime, another guy prepares advanced extra features.
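
To make the "baseline to compare with" from step 2.1 concrete, here is a minimal sketch of a popularity baseline. It is only an illustration: the `(user_id, item_id)` pair format, the function name `popularity_baseline`, and the toy data are assumptions, not something taken from the project's code or dataset.

```python
from collections import Counter


def popularity_baseline(interactions, k=3):
    """Build a recommender that returns the k most popular unseen items.

    `interactions` is assumed to be an iterable of (user_id, item_id) pairs;
    the real dataset layout may differ.
    """
    interactions = list(interactions)

    # Count how many times each item appears in the interaction log.
    item_counts = Counter(item for _, item in interactions)

    # Remember which items each user has already interacted with.
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)

    # Rank items by descending count; break ties by item id for determinism.
    ranked = [
        item
        for item, _ in sorted(item_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]

    def recommend(user):
        already = seen.get(user, set())
        return [item for item in ranked if item not in already][:k]

    return recommend


# Toy interaction log (hypothetical data, for illustration only).
toy = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u1", "b"), ("u2", "b"),
    ("u3", "c"),
]
recommend = popularity_baseline(toy, k=2)
print(recommend("u3"))        # items u3 has not interacted with yet
print(recommend("new_user"))  # cold-start user gets the global top-k
```

A baseline like this needs no features at all, so whatever model is trained in step 2.1 can be compared against it before the extra features from steps 2.2 and 3.2 exist.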

implausibleDeniability commented 2 years ago

From this perspective, the "EDA" issue is too general. There will be different needs for data exploration, and each of them will require a separate issue =)

DanisAlukaev commented 2 years ago

Thanks for your comments. Agree with you. @homomorfism could you organise our work in such a way, please?

homomorfism commented 2 years ago

That's fine, let's work with this methodology: