Plan the rest of work and issues

rimidalvk commented 1 year ago

Proposed issues for scheduling and planning:

Issue #2: Develop Logger Module

Description: Develop logger.py to handle all logging needs of the application, including cycle start/end times, next cycle plan, IP address, computer name, etc. The logs should be structured as outlined in this Google Sheet: Link.
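For illustration, one cycle log record could be assembled roughly as below. This is only a sketch using the standard `logging` and `socket` modules; the field names, the logger name `smm_parser`, and the helper `build_cycle_record` are assumptions, since the real column layout is defined in the linked sheet.

```python
import logging
import socket
from datetime import datetime, timedelta, timezone


def get_host_info():
    """Return (computer name, IP address); fall back if the lookup fails."""
    name = socket.gethostname()
    try:
        ip = socket.gethostbyname(name)
    except OSError:
        ip = "unknown"
    return name, ip


def build_cycle_record(start, end, next_start):
    """Assemble one log row as a dict (hypothetical field names)."""
    name, ip = get_host_info()
    return {
        "cycle_start": start.isoformat(),
        "cycle_end": end.isoformat(),
        "next_cycle": next_start.isoformat(),
        "computer_name": name,
        "ip_address": ip,
    }


logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("smm_parser")  # hypothetical logger name

# Example values only: a 5-minute cycle with the next one planned an hour later.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
end = start + timedelta(minutes=5)
record = build_cycle_record(start, end, end + timedelta(hours=1))
logger.info("cycle finished: %s", record)
```

Returning a plain dict keeps the record easy to write both to a log file and to a sheet row later.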

Issue #3: Develop Basic Scraper Module

Description: Develop a basic scraper module that will be used as a template for the social media scrapers. The scraper should be able to take a URL and return data based on the provided configuration. Make sure to implement measures that mimic human browsing behavior to prevent being blocked by the social media platforms.
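A possible shape for the template, as a rough sketch: the class name `BaseScraper`, its methods, the delay defaults, and the user-agent string are all hypothetical. The only "human-like" measure shown is a randomized pause between requests; anything beyond that would depend on the platform.

```python
import random
import time
import urllib.request


class BaseScraper:
    """Template for the platform scrapers (hypothetical class and method names)."""

    USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"  # placeholder value

    def __init__(self, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._sleep = sleep  # injectable so tests do not actually wait

    def wait(self):
        """Pause for a random interval so requests are not evenly spaced."""
        delay = random.uniform(self.min_delay, self.max_delay)
        self._sleep(delay)
        return delay

    def fetch(self, url):
        """Download the page body after a randomized pause."""
        self.wait()
        request = urllib.request.Request(url, headers={"User-Agent": self.USER_AGENT})
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")

    def parse(self, html, config):
        """Platform-specific extraction; subclasses override this."""
        raise NotImplementedError

    def scrape(self, url, config):
        """Take a URL and return data according to the provided configuration."""
        return self.parse(self.fetch(url), config)
```

The LinkedIn, Reddit, and Medium scrapers would then subclass this and override `parse` (and `fetch` where a plain HTTP request is not enough).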

Issue #4: Develop LinkedIn Scraper

Description: Using the basic scraper module as a template, develop the linkedin_scraper.py module to handle scraping from LinkedIn.

Issue #5: Develop Reddit Scraper

Description: Using the basic scraper module as a template, develop the reddit_scraper.py module to handle scraping from Reddit.

Issue #6: Develop Medium Scraper

Description: Using the basic scraper module as a template, develop the medium_scraper.py module to handle scraping from Medium.

Issue #7: Develop Google Sheets Integration

Description: Develop google_sheets.py module that will handle all interactions with Google Sheets, including data storage and retrieval.
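As a sketch of what this module might expose, assuming the `gspread` library with a service-account key file (the issue does not name a library, so this is a guess). The function names and the `columns` parameter are hypothetical.

```python
def open_worksheet(credentials_path, spreadsheet_key, tab_name):
    """Open one tab with gspread (assumed library; needs a service-account key file)."""
    import gspread  # imported here so the helpers below work without it installed

    client = gspread.service_account(filename=credentials_path)
    return client.open_by_key(spreadsheet_key).worksheet(tab_name)


def read_rows(worksheet):
    """Return all data rows as dicts keyed by the header row."""
    return worksheet.get_all_records()


def write_results(worksheet, results, columns):
    """Append one row per result dict, ordering values by `columns`."""
    rows = [[result.get(col, "") for col in columns] for result in results]
    if rows:
        worksheet.append_rows(rows)
    return len(rows)
```

Keeping the worksheet object as a parameter means the read/write helpers can be exercised with a stand-in object, without credentials.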

Issue #8: Develop Configuration File Parser with Google Sheets Integration

Description: Update config.py to fetch and update configuration settings from Google Sheets at the beginning of each cycle.
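One way the fetched rows could be turned into settings is sketched below. It assumes the config tab has `setting` and `value` columns and invents three setting names for illustration; the real config table may look quite different.

```python
from dataclasses import dataclass


@dataclass
class CycleConfig:
    # All setting names and defaults here are hypothetical.
    cycle_interval_minutes: int = 60
    max_post_age_days: int = 30
    enabled_platforms: tuple = ("linkedin", "reddit", "medium")


def parse_config(rows):
    """Build a CycleConfig from rows shaped like {"setting": ..., "value": ...}."""
    raw = {str(r["setting"]).strip(): str(r["value"]).strip() for r in rows}
    config = CycleConfig()
    if "cycle_interval_minutes" in raw:
        config.cycle_interval_minutes = int(raw["cycle_interval_minutes"])
    if "max_post_age_days" in raw:
        config.max_post_age_days = int(raw["max_post_age_days"])
    if "enabled_platforms" in raw:
        # Comma-separated list in a single cell, e.g. "Reddit, Medium".
        config.enabled_platforms = tuple(
            p.strip().lower() for p in raw["enabled_platforms"].split(",") if p.strip()
        )
    return config
```

Calling `parse_config` at the start of every cycle would pick up edits made in the sheet since the previous run; settings missing from the sheet keep their defaults.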

Issue #9: Link Scrapers to Google Sheets Integration

Description: Connect the social media scrapers to the Google Sheets integration to ensure data flows correctly from the scraper to Google Sheets.

Issue #10: Implement Cycle Management in main.py

Description: Implement the cycle management described in the project workflow within main.py. This includes running scrapers based on the configuration file, planning the next cycle, and logging.
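A single cycle could look roughly like this. The function `run_cycle`, its parameters, and the returned keys are hypothetical, and planning the next cycle from the finish time is an assumption; the project workflow may define it differently.

```python
from datetime import datetime, timedelta, timezone


def utc_now():
    return datetime.now(timezone.utc)


def run_cycle(scrapers, enabled, interval_minutes, clock=utc_now):
    """Run each enabled scraper once, collect errors, and plan the next cycle."""
    started = clock()
    results, errors = {}, {}
    for name in enabled:
        scraper = scrapers.get(name)
        if scraper is None:
            errors[name] = "no scraper registered"
            continue
        try:
            results[name] = scraper()
        except Exception as exc:  # one failing platform should not stop the cycle
            errors[name] = str(exc)
    finished = clock()
    return {
        "started": started,
        "finished": finished,
        # Assumption: the next cycle is scheduled relative to the finish time.
        "next_cycle": finished + timedelta(minutes=interval_minutes),
        "results": results,
        "errors": errors,
    }
```

main.py would then call `run_cycle` in a loop, log the returned summary, and sleep until `next_cycle`.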

Issue #11: Implement Post Link Fetching and Filtering

Description: Implement the functionality to fetch all links to posts and comments from Google Sheets and filter them according to the rules defined in the configuration to determine which links will need to be scraped. This should be integrated with the cycle management in main.py.
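The filtering step might be sketched as follows, using a single made-up rule (skip posts older than `max_age_days`). The row keys `link` and `published`, the ISO 8601 timestamp format, and the rule itself are assumptions; the actual rules live in the configuration.

```python
from datetime import datetime, timedelta, timezone


def filter_links(rows, max_age_days, now):
    """Keep links whose publication time is within max_age_days of now."""
    cutoff = now - timedelta(days=max_age_days)
    selected = []
    for row in rows:
        # Assumes timezone-aware ISO 8601 timestamps, e.g. 2024-01-10T08:00:00+00:00.
        published = datetime.fromisoformat(row["published"])
        if published >= cutoff:
            selected.append(row["link"])
    return selected


rows = [
    {"link": "https://example.com/new", "published": "2024-01-10T08:00:00+00:00"},
    {"link": "https://example.com/old", "published": "2023-06-01T08:00:00+00:00"},
]
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
to_scrape = filter_links(rows, max_age_days=30, now=now)
```

Passing `now` in explicitly keeps the function deterministic, which makes the rules easier to test against sample sheet data.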

Mazon7 commented 1 year ago

Project Update and Task Decomposition:

Connecting to Google Sheets. - Done
Reading data (links and publication time) from the main tab. - Done
Reading data from the Config table and developing the decision-making algorithm. - In Progress. Period: 1 day
Module for data filtering based on the main rules (from the config). Period: 1 day
Module for scraper error logging (reading, writing, managing data). Period: 2 days
Module for writing the scraper run results (start time, end time, errors, etc.). Period: 0.5 days
Module for writing the scraper's result data to Google Sheets. Period: 0.5 days
Debugging/refactoring the interaction between all modules. Period: 1 day

Updates will be provided as tasks are completed.

Mazon7 commented 12 months ago

Update:

Connecting to Google Sheets. - Done
Reading Data (links and Publication time) from the main tab. - Done
Reading data from the Config Table and developing decision-making algorithm - Done
Module for data filtration based on the main rules (from the config) - Done (testing)

Module for writing the scraper's result data to Google Sheets. - In Progress

rimidalvk / SMM_Stats_to_GS_parser