karllhughes / recs-by-text

Recommend movies to your friends and ask them for recommendations all through a convenient SMS interface.
https://recsbytext.com

Make recommendations to users automatically based on their list #55

Open · karllhughes opened this issue 5 years ago

karllhughes commented 5 years ago

Using the OMDb API, we could pull features from the movies in a user's list and then search for other movies that share those features.

Some features might include:

All of these fields are available in the API, but the problem is that it only allows searching by title and release year.

So if we want to search by these fields, we'll need to either index a significant portion of their data in our own database or use another dataset, such as this one on Kaggle: https://www.kaggle.com/tmdb/tmdb-movie-metadata

There's also another API we could try, https://developers.themoviedb.org/3/getting-started/introduction, but I don't know whether it offers any better searching or filtering either.
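If the movie data ends up indexed in our own database, the feature-matching idea above could work roughly like an inverted index: map each feature value (a genre, a director, and so on) to the movies that have it, then rank unseen movies by how many features they share with the movies in the user's list. This is only a minimal sketch under that assumption; the `Movie` type, the `"genre:..."`/`"director:..."` feature strings, and the `m1`…`m4` catalog are hypothetical placeholders, not fields from the OMDb API or any dataset.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    # Hypothetical record; real fields would come from whichever source gets indexed
    movie_id: str
    features: frozenset  # e.g. {"genre:Horror", "director:X"}


def build_index(movies):
    """Inverted index: feature value -> set of movie ids that have it."""
    index = defaultdict(set)
    for movie in movies:
        for feature in movie.features:
            index[feature].add(movie.movie_id)
    return index


def recommend(user_list, movies, limit=5):
    """Rank movies not already in the user's list by shared features."""
    by_id = {m.movie_id: m for m in movies}
    index = build_index(movies)
    seen = set(user_list)
    # One point per (listed movie, feature) pair that a candidate also has
    scores = Counter()
    for movie_id in user_list:
        for feature in by_id[movie_id].features:
            for candidate in index[feature]:
                if candidate not in seen:
                    scores[candidate] += 1
    # Highest overlap first; ties broken by id so the order is deterministic
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [movie_id for movie_id, _ in ranked[:limit]]


# Hypothetical catalog for illustration only
catalog = [
    Movie("m1", frozenset({"genre:Horror", "director:X"})),
    Movie("m2", frozenset({"genre:Horror", "director:Y"})),
    Movie("m3", frozenset({"genre:Comedy", "director:X"})),
    Movie("m4", frozenset({"genre:Comedy", "director:Z"})),
]
print(recommend(["m1"], catalog))  # ['m2', 'm3']
```

Counting raw overlaps is the simplest possible scoring; weighting rarer features more heavily would be a natural next step.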

karllhughes commented 5 years ago

Another data source with lots of movies and ratings in it: https://grouplens.org/datasets/movielens/latest/
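Using a ratings dataset like this mostly means grouping the ratings per user first. A minimal sketch, assuming the CSV layout of the MovieLens `ratings.csv` file (header `userId,movieId,rating,timestamp`); the `load_ratings` name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_ratings(lines):
    """Group MovieLens-style rating rows into {user_id: {movie_id: rating}}.

    `lines` is any iterable of CSV text lines whose header row is
    userId,movieId,rating,timestamp (the timestamp is ignored here).
    """
    ratings = defaultdict(dict)
    for row in csv.DictReader(lines):
        ratings[int(row["userId"])][int(row["movieId"])] = float(row["rating"])
    return dict(ratings)


# Made-up in-memory sample; a real file would be opened with
# open("ratings.csv", newline="") instead
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,964982703\n"
    "1,20,3.5,964981247\n"
    "2,10,5.0,964982224\n"
)
print(load_ratings(sample))  # {1: {10: 4.0, 20: 3.5}, 2: {10: 5.0}}
```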

A list of recommendation systems (including open source ones, some in Python) we could use to help this project along: https://github.com/grahamjenson/list_of_recommender_systems/blob/master/README.md

⭐️ A complete walkthrough for building a movie recommendation system in Python: https://www.analyticsvidhya.com/blog/2016/06/quick-guide-build-recommendation-engine-python/

How recommendation systems work: https://blog.statsbot.co/recommendation-system-algorithms-ba67f39ac9a3

karllhughes commented 5 years ago

We decided to use IMDb's data, since this is a non-commercial project anyway. Here's how it'll work:

karllhughes commented 5 years ago

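Fuzzy-matching SQL query:

```sql
-- Assumes PostgreSQL, where levenshtein() comes from the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
-- "similarity" is an edit distance (lower = closer); the cutoff uses integer division
SELECT id, title, year, levenshtein(lower(title), lower('halloween')) AS similarity
FROM "moviesImporter_movie"
WHERE levenshtein(lower(title), lower('halloween')) < (length('halloween') / 4)
ORDER BY similarity ASC, year DESC;
```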
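To check the matching threshold without a database, the same logic can be mirrored in plain Python: compute the Levenshtein edit distance on lowercased titles, keep rows whose distance is below `len(query) // 4` (the same integer division as the SQL), and sort by distance, then newest year first. A minimal sketch; `fuzzy_match` and the `(id, title, year)` sample rows are hypothetical stand-ins for the table, and every row is assumed to have a year.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(rows, query):
    """Mirror of the SQL query; rows are (id, title, year) tuples."""
    query = query.lower()
    cutoff = len(query) // 4  # integer division, like length('...') / 4 in SQL
    matches = []
    for movie_id, title, year in rows:
        distance = levenshtein(title.lower(), query)
        if distance < cutoff:
            matches.append((movie_id, title, year, distance))
    # Closest match first, then newest year first
    matches.sort(key=lambda m: (m[3], -m[2]))
    return matches


# Hypothetical sample rows, including a misspelled title
rows = [
    (1, "Halloween", 1978),
    (2, "Halloween", 2018),
    (3, "Halloween II", 1981),
    (4, "Hallowen", 2000),
]
print(fuzzy_match(rows, "halloween"))
```

With a nine-letter query the cutoff is 2, so only titles within one edit survive; "Halloween II" (three edits away) is filtered out, which shows how strict the `/ 4` threshold is for short titles.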
karllhughes commented 5 years ago

The last step is building the recommendation algorithm itself. Surprise looks like it might be helpful: http://surpriselib.com/
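Surprise includes neighborhood algorithms such as `KNNBasic`, which, with `sim_options={'name': 'cosine', 'user_based': False}`, predicts a user's rating for a movie from their ratings of similar movies. To show what that item-based approach computes, here is a minimal dependency-free sketch, assuming ratings shaped as `{user: {movie: rating}}`. It illustrates the technique and is not Surprise's implementation; `predict_rating`, `recommend_for`, the arbitrary `k=20` default, and the sample ratings are all hypothetical.

```python
import math


def cosine_similarity(ratings, item_a, item_b):
    """Cosine similarity between two movies over the users who rated both."""
    dot = norm_a = norm_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            ra, rb = user_ratings[item_a], user_ratings[item_b]
            dot += ra * rb
            norm_a += ra * ra
            norm_b += rb * rb
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / math.sqrt(norm_a * norm_b)


def predict_rating(ratings, user, item, k=20):
    """Similarity-weighted average of the user's ratings of the k most similar movies."""
    neighbours = sorted(
        ((cosine_similarity(ratings, item, other), rating)
         for other, rating in ratings[user].items() if other != item),
        reverse=True,
    )[:k]
    # Only positively similar neighbours contribute
    neighbours = [(sim, r) for sim, r in neighbours if sim > 0]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no overlap with anything this user rated
    return sum(sim * r for sim, r in neighbours) / total


def recommend_for(ratings, user, limit=3):
    """Rank movies the user hasn't rated by predicted rating."""
    all_items = {item for user_ratings in ratings.values() for item in user_ratings}
    predictions = []
    for item in all_items - set(ratings[user]):
        score = predict_rating(ratings, user, item)
        if score is not None:
            predictions.append((score, item))
    predictions.sort(key=lambda p: (-p[0], p[1]))
    return [item for _, item in predictions[:limit]]


# Hypothetical ratings for illustration only
ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "carol": {"m2": 2, "m3": 1, "m4": 5},
}
print(recommend_for(ratings, "alice"))  # ['m3', 'm4']
```

Item-based similarity tends to suit a catalog where movies far outnumber active users' ratings per movie, but with Surprise the same idea would come from configuring `KNNBasic` rather than hand-rolling it.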