ranahaani / GNews

A happy and lightweight Python package that provides an API to search for articles on Google News and returns a JSON response.
https://pypi.org/project/gnews/
MIT License
712 stars 106 forks

Add Support for Storing Feeds in MongoDB Using PyMongo #102

Open rexdivakar opened 2 months ago

rexdivakar commented 2 months ago

This PR introduces the functionality to store Google News feeds into a MongoDB database using PyMongo. The key enhancements and changes are detailed below:

  1. MongoDB Integration:

    • Introduced MongoDB client initialization using pymongo.MongoClient, with the connection string and database name loaded from environment variables (DB_STRING and DB_NAME).
    • Added error handling for MongoDB connection errors.
  2. Upsert Capability:

    • Added a new parameter, upsert, to the GNews class; when set to True, it upserts fetched news articles into a specified MongoDB collection.
    • Implemented the upsert_news method which inserts new articles or skips duplicates based on the article title.
  3. Date Range Support:

    • Enhanced the _ceid method to support date range queries with start_date and end_date.
    • Added properties and setters for start_date and end_date with appropriate validation and warning messages.
  4. Query Methods:

    • Modified get_news, get_top_news, get_news_by_topic, get_news_by_location, and get_news_by_site methods to support upserting articles into MongoDB when the upsert flag is set and collection_name is provided.
  5. Utility and Helper Methods:

    • Added _clean method to clean HTML content from descriptions.
    • Modified _process method to filter out excluded websites and prepare article data for MongoDB insertion.
  6. Logging and Warnings:

    • Implemented logging for various events such as MongoDB connection errors, invalid topics or locations, and skipping of duplicate articles.
    • Added warnings for date range issues to guide users on how to properly set start_date and end_date.
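
To make items 1 and 2 concrete, here is a minimal sketch of connecting from the environment and skipping duplicates by title. It is not the code from this PR: the function name connect_from_env, the 5-second timeout, the ping check, and the use of $setOnInsert are assumptions made for illustration, and the PR may implement upsert_news differently. It also assumes every article dict has a "title" key and that DB_STRING and DB_NAME are set.

```python
import logging
import os

logger = logging.getLogger(__name__)


def connect_from_env():
    """Return a pymongo Database built from DB_STRING and DB_NAME, or None on failure.

    Hypothetical helper, not part of GNews.
    """
    # Imported here so upsert_news below can be used with any object that
    # exposes update_one, even where pymongo is not installed.
    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure

    client = MongoClient(os.environ["DB_STRING"], serverSelectionTimeoutMS=5000)
    try:
        # MongoClient connects lazily, so force a round trip to surface errors now.
        client.admin.command("ping")
    except ConnectionFailure as exc:
        logger.error("MongoDB connection failed: %s", exc)
        return None
    return client[os.environ["DB_NAME"]]


def upsert_news(collection, articles):
    """Insert articles whose title is not stored yet; return how many were inserted."""
    inserted = 0
    for article in articles:
        result = collection.update_one(
            {"title": article["title"]},  # match on title, as the PR describes
            {"$setOnInsert": article},    # fields are written only when nothing matched
            upsert=True,
        )
        if result.upserted_id is not None:
            inserted += 1
        else:
            logger.info("Skipping duplicate article: %s", article["title"])
    return inserted
```

Matching on the title alone means two different articles with the same headline are treated as one; a unique index on the title field would enforce the same rule on the database side.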

How to Test:

  1. Environment Setup:

    • Ensure DB_STRING and DB_NAME environment variables are set for MongoDB connection.
    • Install the required dependencies, including pymongo, feedparser, beautifulsoup4, and python-dotenv (imported as dotenv).
  2. Initialization:

    • Instantiate the GNews class with upsert=True.
    gnews = GNews(upsert=True)
  3. Fetching and Upserting News:

    • Use methods like get_news, get_top_news, get_news_by_topic, get_news_by_location, or get_news_by_site with a collection_name parameter to fetch and upsert news articles.
    articles = gnews.get_news("OpenAI", collection_name="news_articles")
  4. Database Verification:

    • Verify the MongoDB collection specified by collection_name to ensure news articles are correctly inserted and duplicates are skipped.
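
The verification in step 4 can be partly automated. The sketch below is an illustration, not part of this PR: find_duplicate_titles is a hypothetical helper that takes any iterable of documents with a "title" key and reports titles that occur more than once.

```python
from collections import Counter


def find_duplicate_titles(documents):
    """Return {title: count} for every title that appears more than once."""
    counts = Counter(doc["title"] for doc in documents)
    return {title: n for title, n in counts.items() if n > 1}
```

Against a live database, passing it the cursor from db["news_articles"].find({}, {"title": 1}), where db is a pymongo Database for DB_NAME, should return an empty dict if duplicates are being skipped as described.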

Additional Notes:

By merging this PR, we enable the GNews class to store fetched news feeds into a MongoDB database, thus providing a persistent and scalable solution for managing news data.

rexdivakar commented 1 month ago

Hi @ranahaani, can you please look at my code and approve the changes?