homomorfism / data-mining-project


Imdb processing #7

Closed implausibleDeniability closed 2 years ago

implausibleDeniability commented 2 years ago

6 completed

homomorfism commented 2 years ago

Cool EDA, minor fixes:

Suggestions:

Something like that:

Dataset: IMDb

Which data can be collected:

  • film data (title.ratings.tsv and title.basics.tsv)
    • id of film
    • average rating and num votes
    • title
    • film is adult or not
    • start and end year
    • runtime of the film
    • multiple genres per film (there are many of them)
  • data about actors, producers, writers, and composers of films (name.basics.tsv, title.crew.tsv and title.principals.tsv)
    • birth/death year
    • their profession
    • titles of the films they took part in

Useless data: title.episode.tsv and title.akas.tsv

No user ratings, film posters, etc.
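
To illustrate the film data listed above, here is a minimal sketch that joins the basics and ratings tables on the film id. It assumes the column names of the public IMDb TSV dumps (`tconst`, `primaryTitle`, `isAdult`, `startYear`, `endYear`, `runtimeMinutes`, `genres`, `averageRating`, `numVotes`) and the `\N` marker for missing values. The sample rows are made up, and `read_tsv` / `join_films` are hypothetical helper names, not code from this repository.

```python
import csv
import io

# Made-up sample rows in the IMDb TSV layout; the real files are much larger.
BASICS_TSV = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tmovie\tExample Film\tExample Film\t0\t1999\t\\N\t120\tDrama,Crime\n"
    "tt0000002\ttvSeries\tExample Show\tExample Show\t0\t2005\t2010\t\\N\tComedy\n"
)
RATINGS_TSV = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t7.5\t1000\n"
)


def read_tsv(fileobj):
    """Read a tab-separated file into dicts, mapping the '\\N' marker to None."""
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {key: (None if value == "\\N" else value) for key, value in row.items()}
        for row in reader
    ]


def join_films(basics, ratings):
    """Attach rating fields to each film; films without a rating get None."""
    ratings_by_id = {row["tconst"]: row for row in ratings}
    films = []
    for film in basics:
        rating = ratings_by_id.get(film["tconst"])
        films.append({
            "id": film["tconst"],
            "title": film["primaryTitle"],
            "is_adult": film["isAdult"] == "1",
            "start_year": film["startYear"],
            "end_year": film["endYear"],
            "runtime_minutes": film["runtimeMinutes"],
            # Genres are stored as one comma-separated string per film.
            "genres": film["genres"].split(",") if film["genres"] else [],
            "average_rating": float(rating["averageRating"]) if rating else None,
            "num_votes": int(rating["numVotes"]) if rating else None,
        })
    return films


films = join_films(
    read_tsv(io.StringIO(BASICS_TSV)),
    read_tsv(io.StringIO(RATINGS_TSV)),
)
```

With the real dumps, the `io.StringIO` objects would be replaced by opened files; for files of that size a dataframe library may be a better fit than plain dicts.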

implausibleDeniability commented 2 years ago

Another suggestion from Shamil: add sys.path.append('..') to the notebook so it can be run from eda/
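
A minimal sketch of that suggestion, assuming the notebook's working directory is eda/ and the importable code lives one level up. The helper name `add_repo_root` is hypothetical; it resolves the parent directory to an absolute path so the import does not depend on a relative `'..'` entry.

```python
import sys
from pathlib import Path


def add_repo_root(notebook_dir: Path) -> str:
    """Append the parent of notebook_dir to sys.path once and return it."""
    root = str(notebook_dir.resolve().parent)
    if root not in sys.path:  # avoid duplicates when the cell is re-run
        sys.path.append(root)
    return root


# In a notebook started from eda/, Path.cwd() is the eda/ directory.
repo_root = add_repo_root(Path.cwd())
```

After this cell runs, modules in the repository root can be imported from the notebook as usual.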

homomorfism commented 2 years ago

It might be better to rename the eda/ directory to notebooks/

implausibleDeniability commented 2 years ago

I just wanted to remove the review task from Rizvan, but accidentally pinged Danis again, sorry

implausibleDeniability commented 2 years ago

IMPORTANT: Don't merge until Rizvan merges his branch

implausibleDeniability commented 2 years ago

I just realized we're merging branches into dev, not main