noisebridge / MediaBridge


Store matched (and unmatched) movies in MongoDB #11

Open audiodude opened 1 week ago

audiodude commented 1 week ago

We have a MongoDB instance set up here:

mongodb+srv://noisebridgeproject.audlswx.mongodb.net/

user: noisebridge password: same as sfpythonlab.com
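Rather than putting the password in code, the script could read the credentials from the environment and build the connection string from them. PyMongo requires a username or password that contains characters such as `@`, `:` or `/` to be percent-escaped, and `urllib.parse.quote_plus` does that. This is a minimal sketch. The host is the one above, but the `MONGO_USER` and `MONGO_PASSWORD` variable names are hypothetical.

```python
import os
from urllib.parse import quote_plus

# Host taken from the issue; the environment variable names below are hypothetical.
MONGO_HOST = "noisebridgeproject.audlswx.mongodb.net"


def build_mongo_uri(user: str, password: str, host: str = MONGO_HOST) -> str:
    """Build a mongodb+srv:// URI with percent-escaped credentials."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


def mongo_uri_from_env() -> str:
    # Raises KeyError if either (hypothetical) variable is unset.
    return build_mongo_uri(os.environ["MONGO_USER"], os.environ["MONGO_PASSWORD"])
```

With PyMongo installed, the result would be passed straight to `pymongo.MongoClient(mongo_uri_from_env())`.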

Let's set up our wiki_to_netflix.py code to push the data into Mongo. We need to think about the objects we want to put in, specifically their shape: probably something like the CSV output, but with a key for each column name. So for this set of data:

processed_data.append([title, year, netflix_id, wiki_movie_ids_list[index], wiki_genres_list[index], wiki_directors_list[index]])

We should have:

{
  title: "foo title",
  year: ...,
  netflix_id: ...,
  wikidata_id: ...,
  ...etc...
}
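The mapping from a `processed_data` row to a document could live in one helper, so the field names are defined in a single place. This sketch assumes the column order from the `append` call above. `genres` and `directors` are hypothetical names for the fields the issue leaves as "etc.".

```python
from typing import Any, Sequence

# Field names in the same order as the processed_data.append(...) call.
# "genres" and "directors" are hypothetical; the issue leaves them as "etc."
FIELDS = ("title", "year", "netflix_id", "wikidata_id", "genres", "directors")


def row_to_document(row: Sequence[Any]) -> dict[str, Any]:
    """Turn one processed_data row into a dict keyed by column name."""
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(row)}")
    return dict(zip(FIELDS, row))
```

The length check is there so that a change to the CSV columns fails loudly instead of silently shifting values into the wrong keys.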
audiodude commented 1 week ago

Some sample/pseudo code.

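import pymongo

def insert_into_mongo(processed_data):
    client = pymongo.MongoClient('mongodb+srv://...')  # credentials elided
    movies = client.mediabridge.movies
    for movie in processed_data:
        # Upsert keyed on netflix_id so re-runs update rather than duplicate
        movies.update_one({'netflix_id': movie[2]}, {'$set': {
            'title': movie[0],
            'year': movie[1],
            'netflix_id': movie[2],
            'wikidata_id': movie[3],
            # ...
        }}, upsert=True)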
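Here is a variant of the sample above that takes the collection as a parameter, so the upsert logic can be exercised without a live cluster. It assumes `netflix_id` is unique per movie and uses it as the upsert key. The `InMemoryCollection` class is a hypothetical stand-in that mimics only the `update_one(filter, update, upsert=...)` call. It does not reproduce real MongoDB behavior such as `_id` generation.

```python
from typing import Any, Iterable, Protocol


class UpsertCollection(Protocol):
    """The one pymongo Collection method this sketch relies on."""

    def update_one(self, filter: dict, update: dict, upsert: bool = False) -> Any: ...


def upsert_movies(collection: UpsertCollection, documents: Iterable[dict[str, Any]]) -> int:
    """Upsert each movie keyed on netflix_id (assumed unique); return how many were sent."""
    count = 0
    for doc in documents:
        collection.update_one({"netflix_id": doc["netflix_id"]}, {"$set": doc}, upsert=True)
        count += 1
    return count


class InMemoryCollection:
    """Hypothetical stand-in for a pymongo collection, for local testing only."""

    def __init__(self) -> None:
        self.docs: dict[Any, dict[str, Any]] = {}

    def update_one(self, filter: dict, update: dict, upsert: bool = False) -> None:
        key = filter["netflix_id"]
        if key in self.docs:
            self.docs[key].update(update["$set"])  # existing movie: overwrite fields
        elif upsert:
            self.docs[key] = dict(update["$set"])  # new movie: insert it
```

In the script, a real PyMongo collection (for example `client.mediabridge.movies`) would be passed in place of the stand-in. Each `update_one` is a separate round trip, so a large `processed_data` list might be better served by PyMongo's `bulk_write` with a list of `UpdateOne` operations. The upsert key would stay the same.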
cocomittens commented 4 days ago

@audiodude Should we also store unmatched movies? The current output only includes matched movies. It could be worth adding them as well, but I assume they wouldn't actually be used to make recommendations, so I'm unsure whether we need them or whether I'm missing something.

audiodude commented 4 days ago

Yes, I think we should store unmatched movies with a wikidata_id of null (None in Python). This will make it easier for the recommendation part to look up movies by their Netflix ID or title.
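With that approach, storing an unmatched movie changes only the value of one field. The sketch below shows one way to build those documents, along with the filters the recommendation side might use to look movies up by Netflix ID or title. The helper names are hypothetical, the Netflix IDs are shown as strings only for illustration, and the unique index assumes each `netflix_id` appears once.

```python
from typing import Any, Optional


def unmatched_document(title: str, year: Optional[int], netflix_id: str) -> dict[str, Any]:
    """A Netflix movie with no Wikidata match: wikidata_id is stored as null (None)."""
    return {"title": title, "year": year, "netflix_id": netflix_id, "wikidata_id": None}


def lookup_filter(netflix_id: Optional[str] = None, title: Optional[str] = None) -> dict[str, Any]:
    """Build a find() filter by Netflix ID, by title, or both."""
    query: dict[str, Any] = {}
    if netflix_id is not None:
        query["netflix_id"] = netflix_id
    if title is not None:
        query["title"] = title
    if not query:
        raise ValueError("pass netflix_id and/or title")
    return query


# Filter for every unmatched movie. In MongoDB, {"wikidata_id": None} matches
# documents where the field is null *or* missing entirely.
UNMATCHED_FILTER: dict[str, Any] = {"wikidata_id": None}


def ensure_indexes(collection: Any) -> None:
    """Index the lookup fields (assumes netflix_id is unique per movie)."""
    collection.create_index("netflix_id", unique=True)
    collection.create_index("title")
```

Against a real collection these would be used as, e.g., `movies.find_one(lookup_filter(netflix_id="42"))` or `movies.count_documents(UNMATCHED_FILTER)`.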