shanyachaubey / Whats_Happenin-A_big_data_project


[Mongo] FastAPI data pipeline reduces the number of articles stored under key "articles" #40

Closed codingminions closed 6 months ago

codingminions commented 6 months ago

Steps to Reproduce:

  1. Start your local MongoDB server with the database 'userquery' and the collection 'sessions'.
  2. Fetch 500 articles for a given location and date range from the newscatcher API. (I have attached a file with article data for Los Angeles, California between 03/02/2024 and 03/20/2024.)
  3. Create an entry in MongoDB in the database and collection mentioned above.
  4. Send a PUT request to the FastAPI "/articles" endpoint with the id of the entry created above.
  5. Once the API returns a 200 OK response, open MongoDB Compass and view the documents stored in 'userquery' -> 'sessions'.
  6. Check the number of articles present in the entry.
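
The count in step 6 can also be checked in code instead of Compass. Below is a minimal sketch, assuming the session document keeps its list under the "articles" key (as the issue title suggests). The helper name `count_articles` and the sample document are hypothetical, not taken from the project.

```python
from typing import Any, Mapping


def count_articles(session_doc: Mapping[str, Any], key: str = "articles") -> int:
    """Return how many articles are stored under `key` in a session document."""
    # A missing key is treated as an empty list (assumption for this sketch).
    return len(session_doc.get(key, []))


# Stand-in for a document read from 'userquery' -> 'sessions'.
# With pymongo, the document could be loaded via collection.find_one(...)
# using whatever id the entry was created with.
sample_doc = {
    "_id": "example-id",  # hypothetical id
    "articles": [{"title": "A"}, {"title": "B"}, {"title": "A"}],
}

print(count_articles(sample_doc))
```

Running the same check on the entry before and after the PUT request shows how many articles the pipeline dropped.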

Output: The entry contains 399 articles after processing, although 500 articles were originally fetched.


Expected Output: The number of articles should remain 500.

The attached file rawMongoDocumentEntry.txt contains a sample MongoDB entry as it looks before processing starts. Please feel free to close the bug if this is the expected behavior.

shanyachaubey commented 6 months ago

Hey Prateek, this happens because the pipeline removes duplicate articles from the 500 that are fetched, so the list gets shorter. I will experiment to find the right number of articles to fetch so that we end up with 24 in each category. No promises yet.
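
To illustrate why the count drops, here is a minimal deduplication sketch. It assumes duplicates are identified by matching "title" values. The actual criterion used in the pipeline is not stated in this thread, so the `drop_duplicates` helper and its `key_fields` parameter are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def drop_duplicates(
    articles: List[Dict[str, Any]],
    key_fields: Sequence[str] = ("title",),  # assumed duplicate criterion
) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each article, comparing only `key_fields`."""
    seen = set()
    unique = []
    for article in articles:
        # Build a hashable fingerprint from the chosen fields.
        fingerprint = tuple(article.get(field) for field in key_fields)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(article)
    return unique


fetched = [
    {"title": "Storm hits LA", "source": "a.com"},
    {"title": "Storm hits LA", "source": "b.com"},  # same title, syndicated copy
    {"title": "Metro expansion approved", "source": "a.com"},
]
unique = drop_duplicates(fetched)
print(len(fetched), len(unique))
```

Any list that contains repeated articles comes out shorter than it went in, which matches the 500 to 399 drop reported above.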