neomatrix369 / learning-path-index

A repo with data files, assets and code supporting and powering the Learning Path Index Project
MIT License

Error handling to scrape_journey and option to add url #107

Open TobeTek opened 1 week ago

TobeTek commented 1 week ago

Summary by Sourcery

Enhance the Google Cloud Skills Boost scraper by adding error handling for missing elements and letting users supply a custom URL. Missing journey details and links now fall back to default values instead of raising, and the CSV writing step handles both write errors and empty data. Also clean up formatting in the LPI GPT script.

sourcery-ai[bot] commented 1 week ago

Reviewer's Guide by Sourcery

This pull request implements error handling in the scrape_journey.py file and adds an option to input a custom URL. It also includes minor formatting changes in the lpiGPT.py file.
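
As a rough illustration of the custom URL option, the sketch below shows one way a script could accept a URL from the user and fall back to a default when nothing is entered. The names `resolve_journey_url` and `DEFAULT_JOURNEY_URL`, the placeholder default, and the validation step are all assumptions for illustration and are not taken from the PR.

```python
from urllib.parse import urlparse

# Hypothetical placeholder; the real script defines its own GCSB_JOURNEY_URL.
DEFAULT_JOURNEY_URL = "https://example.com/journeys/placeholder"


def resolve_journey_url(raw, default=DEFAULT_JOURNEY_URL):
    """Return the user-supplied URL, or `default` when the input is blank.

    Raises ValueError when the result is not an absolute http(s) URL.
    """
    candidate = raw.strip() or default
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {candidate!r}")
    return candidate
```

In a main block this could be called as `resolve_journey_url(input("Enter GCSB_JOURNEY_URL: "))`, so a malformed URL is rejected before any request is sent.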

Sequence diagram for the extract_ml_learning_path function with error handling

sequenceDiagram
    actor User
    participant Script
    participant GCSBServer as GCSB Server

    User->>Script: Input GCSB_JOURNEY_URL
    Script->>GCSBServer: GET GCSB_JOURNEY_URL
    GCSBServer-->>Script: HTML Content

    alt Journey Details Available
        Script->>Script: Extract journey details
    else Journey Details Missing
        Script->>Script: Use default "No details available"
    end

    alt Journey Link Available
        Script->>Script: Construct full URL
    else Journey Link Missing
        Script->>Script: Use default "No link available"
    end

    Script->>Script: Append data to list

    alt Data List Not Empty
        Script->>Script: Write data to CSV
        Script->>User: Notify success
    else Data List Empty
        Script->>User: Notify "No data to write!"
    end

    alt Error Occurred
        Script->>User: Notify error
    end
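
The two fallback branches in the diagram can be sketched as follows. This is a minimal, stdlib-only illustration and not the PR's code: `first_or_default`, `build_row`, and `BASE_URL` are hypothetical names, and the inputs are assumed to be lists of matches returned by whatever HTML parser the scraper uses.

```python
from urllib.parse import urljoin

# Assumed base URL for relative links; the fallback strings match the diagram.
BASE_URL = "https://www.cloudskillsboost.google"
NO_DETAILS = "No details available"
NO_LINK = "No link available"


def first_or_default(matches, default):
    """Return the first match, or `default` when the list is empty."""
    try:
        return matches[0]
    except IndexError:
        return default


def build_row(title, detail_matches, href_matches):
    """Build one output row, substituting fallbacks for missing fields."""
    details = first_or_default(detail_matches, NO_DETAILS)
    href = first_or_default(href_matches, None)
    # Only construct a full URL when a link was actually found.
    link = urljoin(BASE_URL, href) if href else NO_LINK
    return {"title": title, "details": details.strip(), "link": link}
```

Catching `IndexError` in one helper keeps the per-field fallback logic in a single place, so a missing element yields a placeholder row rather than aborting the whole scrape.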

File-Level Changes

Change: Improved error handling and flexibility in the extract_ml_learning_path function
Details:
  • Added a parameter to accept a custom GCSB_JOURNEY_URL
  • Implemented try-except blocks to handle potential IndexErrors when extracting journey details and links
  • Added fallback values for missing data (e.g., 'No details available', 'No link available')
  • Created a main block to allow user input for the GCSB_JOURNEY_URL
Files: app/course-scraper/src/scrapers/google_cloud_skill_boost/scrape_journey.py

Change: Enhanced data writing process with error handling
Details:
  • Added a check to ensure data is not empty before writing to CSV
  • Implemented a try-except block for writing data to the CSV file
  • Added informative print statements for successful writes and errors
Files: app/course-scraper/src/scrapers/google_cloud_skill_boost/scrape_journey.py

Change: Minor formatting changes in lpiGPT.py
Details:
  • Removed trailing whitespace
  • Adjusted line breaks for consistency
Files: app/llm-poc-variant-01/lpiGPT.py
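
The data writing changes listed above (empty check, try-except around the write, and status messages) might look roughly like the sketch below. The function name `write_rows`, the dict-per-row layout, and the message wording other than "No data to write!" are assumptions for illustration, not the PR's actual code.

```python
import csv


def write_rows(rows, csv_path):
    """Write a list of dicts to `csv_path`; return True on success."""
    # Guard against an empty scrape so no header-only file is created.
    if not rows:
        print("No data to write!")
        return False
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    except OSError as error:
        # Covers missing directories, permission problems, and similar I/O failures.
        print(f"Error writing to {csv_path}: {error}")
        return False
    print(f"Wrote {len(rows)} rows to {csv_path}")
    return True
```

Returning a boolean in addition to printing lets a caller decide whether to exit with a non-zero status when the write fails.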
