karllhughes opened this issue 5 years ago
Another data source with lots of movies and ratings in it: https://grouplens.org/datasets/movielens/latest/
A list of recommendation systems (including open source ones, some in Python) we could use to help this project along: https://github.com/grahamjenson/list_of_recommender_systems/blob/master/README.md
⭐️ A complete walkthrough for building a movie recommendation system in Python: https://www.analyticsvidhya.com/blog/2016/06/quick-guide-build-recommendation-engine-python/
How recommendation systems work: https://blog.statsbot.co/recommendation-system-algorithms-ba67f39ac9a3
We decided to use IMDb's data, since this is a non-commercial project anyway. Here's how it'll work:
title.akas.tsv, title.basics.tsv, and title.crew.tsv and flattens and saves the data to the database. (#59)

Fuzzy matching SQL query:
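```sql
-- levenshtein() comes from PostgreSQL's fuzzystrmatch extension:
--   CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
SELECT id, title, year,
       levenshtein(lower(title), lower('halloween')) AS similarity  -- edit distance: lower = closer
FROM "moviesImporter_movie"
WHERE levenshtein(lower(title), lower('halloween')) < (length('halloween') / 4)  -- integer division: 9 / 4 = 2
ORDER BY similarity ASC, year DESC;
```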
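Here's a rough Python sketch of the import and lookup steps, assuming the column layout IMDb documents for title.basics.tsv (tconst, titleType, primaryTitle, startYear, genres, ...) and title.crew.tsv (tconst, directors, writers), with `\N` marking a missing value. The function names are hypothetical, title.akas.tsv is left out for brevity, only rows whose titleType is `movie` are kept (an assumption), and a plain list stands in for the moviesImporter_movie table. `fuzzy_find` mirrors the SQL query: Levenshtein distance between lowercased titles, kept when below `len(query) // 4`, sorted by distance and then by year, newest first.

```python
from __future__ import annotations

import csv
import io
from typing import Optional

NULL = r"\N"  # IMDb's marker for a missing value


def read_tsv(text: str) -> list[dict[str, Optional[str]]]:
    """Parse an IMDb-style TSV (header row, tab-separated, no quoting)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Replace the null marker with None so later code can test truthiness.
    return [{k: (None if v == NULL else v) for k, v in row.items()} for row in reader]


def flatten(basics: list[dict], crew: list[dict]) -> list[dict]:
    """Join title.basics and title.crew rows on tconst into one flat row per movie."""
    crew_by_id = {row["tconst"]: row for row in crew}
    movies = []
    for row in basics:
        if row["titleType"] != "movie":  # assumption: skip shorts, episodes, etc.
            continue
        c = crew_by_id.get(row["tconst"], {})
        movies.append({
            "id": row["tconst"],
            "title": row["primaryTitle"],
            "year": int(row["startYear"]) if row["startYear"] else None,
            # Multi-valued columns are comma-separated lists.
            "genres": row["genres"].split(",") if row["genres"] else [],
            "directors": c["directors"].split(",") if c.get("directors") else [],
        })
    return movies


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_find(movies: list[dict], query: str) -> list[tuple]:
    """Same rule as the SQL query: distance < len(query) // 4, closest first."""
    q = query.lower()
    max_distance = len(query) // 4
    hits = []
    for m in movies:
        d = levenshtein(m["title"].lower(), q)
        if d < max_distance:
            hits.append((d, m))
    # Distance ascending, then year descending; movies without a year go last.
    hits.sort(key=lambda h: (h[0], -(h[1]["year"] or 0)))
    return [(m["id"], m["title"], m["year"], d) for d, m in hits]
```

One difference from the SQL: PostgreSQL sorts NULL years first under `DESC` by default, while this sketch puts movies without a year last.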
The last step is making a recommendation algorithm. Looks like Surprise might be helpful: http://surpriselib.com/
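Surprise ships ready-made algorithms (for example SVD and several KNN-based ones) that work on a table of user, item, rating triples. To show what the neighborhood-based family does without adding the dependency, here's a hand-rolled item-based collaborative filtering sketch in plain Python. It is not Surprise's API; the nested-dict ratings format and the function names are assumptions for illustration. A user's predicted score for an unseen movie is the similarity-weighted average of their own ratings, with cosine similarity between movies' rating vectors.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Invert ratings[user][movie] = score into vectors[movie][user] = score."""
    vectors = defaultdict(dict)
    for user, movies in ratings.items():
        for movie, score in movies.items():
            vectors[movie][user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors keyed by user."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=3):
    """Return the top-k (movie, predicted score) pairs the user hasn't rated."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    predictions = {}
    for candidate, vec in vectors.items():
        if candidate in seen:
            continue
        num = den = 0.0
        for movie, score in seen.items():
            sim = cosine(vec, vectors[movie])
            if sim > 0:
                num += sim * score  # weight each of the user's ratings by similarity
                den += sim
        if den:
            predictions[candidate] = num / den
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Computing every item pair on the fly is fine for a toy dataset; for something the size of MovieLens, precomputing a similarity matrix (which Surprise's KNN algorithms do internally) would be the natural next step.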
Using the OMDb API, we could get features from the movies in a user's list and then search for other movies that share those features.
Some features might include:
All of these fields are available in the API, but the problem is that it only supports searching by title and release year.
So if we want to search by these fields, we'll need to index a significant portion of their data in our own database, or explore another dataset, like this one on Kaggle: https://www.kaggle.com/tmdb/tmdb-movie-metadata
There's also another API we could try: https://developers.themoviedb.org/3/getting-started/introduction. However, I don't know whether it offers any better searching or filtering either.
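Indexing the data locally turns feature search into lookups in an inverted index that maps each feature value to the movies that have it. A minimal sketch, assuming each movie is a dict with hypothetical `genres`, `directors` and `actors` lists (OMDb returns fields such as Genre, Director and Actors, but the lowercase list-valued names here are made up); candidates are ranked by how many features they share with the movies in a user's list.

```python
from collections import Counter, defaultdict

FEATURE_FIELDS = ("genres", "directors", "actors")  # hypothetical field names


def build_index(movies):
    """Map each (field, value) pair to the set of movie ids that have it."""
    index = defaultdict(set)
    for movie_id, movie in movies.items():
        for field in FEATURE_FIELDS:
            for value in movie.get(field, []):
                index[(field, value)].add(movie_id)
    return index


def similar_movies(movies, index, user_list, k=5):
    """Rank movies outside user_list by how many features they share with it."""
    seen = set(user_list)
    counts = Counter()
    for movie_id in user_list:
        movie = movies[movie_id]
        for field in FEATURE_FIELDS:
            for value in movie.get(field, []):
                for other in index.get((field, value), ()):
                    if other not in seen:
                        counts[other] += 1
    # Most shared features first; ties broken by id so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Raw overlap counts let very common values (a genre like Drama) dominate; weighting each match by how rare the value is, in the spirit of TF-IDF, would be an easy refinement.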