Closed: codingminions closed this issue 6 months ago
The issue was not related to the FastAPI pipeline but to the way we were fetching articles from the newscatcher API in the backend code. That's fixed, and we no longer see duplicates in the top_24_by_cat key. Closing this issue.
Steps to Reproduce:
Output:
top_24_by_topics: Object { Sports: Array(4) [57, 57, 57, 57] }
Expected Output: As you can see, the indices are repeated. The list of article indices for each topic should not contain duplicate indices, ensuring that only unique articles are sent to the frontend interface.
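To make the expectation above concrete, here is a minimal sketch of how duplicate indices could be detected and removed per topic. It assumes the payload is a plain mapping from topic name to a list of integer article indices, as in the output shown; the helper names `find_duplicates` and `dedupe_indices` are hypothetical and are not taken from the project code.

```python
from collections import Counter
from typing import Dict, List


def find_duplicates(top_by_topic: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Map each affected topic to the sorted indices that appear more than once."""
    duplicates = {}
    for topic, indices in top_by_topic.items():
        repeated = sorted(i for i, n in Counter(indices).items() if n > 1)
        if repeated:
            duplicates[topic] = repeated
    return duplicates


def dedupe_indices(top_by_topic: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return a copy where each topic keeps only the first occurrence of each index."""
    # dict.fromkeys preserves insertion order, so the original ranking is kept.
    return {topic: list(dict.fromkeys(indices)) for topic, indices in top_by_topic.items()}


# Sample shaped like the output reported in this issue.
sample = {"Sports": [57, 57, 57, 57]}
print(find_duplicates(sample))
print(dedupe_indices(sample))
```

Note that deduplicating at this stage only hides the symptom: a topic that should list four articles would end up with one, so the underlying fetch logic still needs to supply distinct articles.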
The "/articles" endpoint might be making some assumptions about the kind of data fetched from the newscatcher API. Going through the raw article data (attached file), I see some fields that don't have a consistent value across all the fetched articles. This needs to be fixed on the data pipeline end.
mongodb document entry text: rawMongoDocumentEntry.txt
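One way to check the field inconsistency mentioned above is to count, for every field seen in any article, how many articles are missing it or carry an empty value. This is only a sketch: it assumes the raw articles can be loaded as a list of dictionaries, and the function name `inconsistent_fields` and the sample field names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def inconsistent_fields(articles: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count, per field, the articles where it is absent or empty."""
    # Union of every field name seen across all articles.
    all_fields = set().union(*(article.keys() for article in articles))
    missing = Counter()
    for article in articles:
        for field in all_fields:
            # Treat absent keys, None, and empty containers/strings as missing.
            if article.get(field) in (None, "", [], {}):
                missing[field] += 1
    return dict(missing)


# Hypothetical sample; real field names come from the attached raw document.
articles = [
    {"title": "a", "author": "x"},
    {"title": "b", "author": None},
    {"title": "c"},
]
print(inconsistent_fields(articles))
```

Fields that show up in the result are candidates for normalization or default values in the pipeline before the data reaches the endpoint.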