Closed mlaugharn closed 3 years ago
Yep, exported CSVs of the movie and user collections live in data_processing/data
and could be used to start populating a local Mongo database. The reviews collection is enormous, so I couldn't include its export in this remote repo, but there are definitely a few better ways to share it.
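For anyone who wants to try that in the meantime, here is a rough sketch of loading those CSV exports into a local Mongo instance. It is not the repo's own loader. The connection string, the `letterboxd` database name, and the choice to name each collection after its CSV file are all assumptions, and it needs the third-party `pymongo` package plus a Mongo server running locally.

```python
import csv
from itertools import islice
from pathlib import Path


def iter_csv_batches(csv_path, batch_size=1000):
    """Yield lists of row dicts from a CSV file, at most batch_size rows each."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # uses the header row as dict keys
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch


def load_csv_into_mongo(csv_path, collection, batch_size=1000):
    """Insert every CSV row into `collection`; return the number of rows inserted.

    `collection` only needs an insert_many(list_of_dicts) method, which a
    pymongo Collection provides. All values are inserted as strings, since
    csv.DictReader does no type conversion.
    """
    total = 0
    for batch in iter_csv_batches(csv_path, batch_size):
        collection.insert_many(batch)
        total += len(batch)
    return total


def main():
    # Third-party dependency: pip install pymongo
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed local server
    db = client["letterboxd"]  # hypothetical database name
    data_dir = Path("data_processing/data")
    for csv_file in sorted(data_dir.glob("*.csv")):
        # Assumption: one collection per CSV, named after the file.
        inserted = load_csv_into_mongo(csv_file, db[csv_file.stem])
        print(f"{csv_file.name}: inserted {inserted} documents")
```

Calling `main()` from the repo root would load every CSV it finds in data_processing/data. Batching keeps memory use flat, which matters more for something the size of the reviews collection than for the movie and user exports.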
Let me get back to you on this after the next data update (I've been updating the data for the live site's model monthly), and I'll add something to the README, too.
awesome thank u :)
All set! You can find data up to the latest crawl here: https://www.kaggle.com/samlearner/letterboxd-movie-ratings-data
I've also added some instructions to the README for using this data and running the rest of the code on your own, though obviously you're free to do whatever you want with the data.
Would it be possible to redistribute snapshots of the dataset so that there wouldn't need to be duplicated scraping? e.g. via automated torrents or something