neomatrix369 / learning-path-index

A repo with data files, assets and code supporting and powering the Learning Path Index Project
MIT License

Error handling to scrape_journey and option to add url #78

Open asvcode opened 3 weeks ago

asvcode commented 3 weeks ago

This pull request contains two updates:

1) Running `scrape_journey` out of the box raises an IndexError in the details and link sections. This update adds error handling so the extraction no longer crashes:

(lpi) C:\PillView\learning-path-index\app\course-scraper\src>python -m scrapers.google_cloud_skill_boost.scrape_journey
Traceback (most recent call last):
  File "C:\Users\avird\anaconda3\envs\lpi\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\avird\anaconda3\envs\lpi\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\PillView\learning-path-index\app\course-scraper\src\scrapers\google_cloud_skill_boost\scrape_journey.py", line 46, in <module>
    ml_learning_path = extract_ml_learning_path()
  File "C:\PillView\learning-path-index\app\course-scraper\src\scrapers\google_cloud_skill_boost\scrape_journey.py", line 30, in extract_ml_learning_path
    "details": journey.xpath(pages.GCSBLearningJourneyPage.journey_details)[
IndexError: list index out of range
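The crash happens because `journey.xpath(...)` can return an empty list, and indexing `[0]` on it raises IndexError. A minimal sketch of the kind of guard this PR adds (the helper name `first_or_default` is hypothetical; the actual fix in `scrape_journey.py` may be structured differently):

```python
def first_or_default(results, default="N/A"):
    """Return the first XPath match, or a default when the page yields none.

    Hypothetical helper illustrating the error-handling approach: instead of
    results[0] (which raises IndexError on an empty list), fall back to a
    default value so the scraper keeps going when a section is missing.
    """
    return results[0] if results else default
```

Used at the failing line, `first_or_default(journey.xpath(pages.GCSBLearningJourneyPage.journey_details))` would yield `"N/A"` instead of crashing when the details section is absent.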

2) Adds the option to enter the training URL when running the command, i.e. running `python -m scrapers.scrape_journey` now prompts: `Please enter the GCSB Journey URL:`
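A sketch of what such a prompt might look like (the function name and the `reader` parameter are illustrative, not the PR's actual code; only the prompt text comes from the PR):

```python
def prompt_for_journey_url(reader=input):
    """Ask the user for the GCSB Journey URL and strip stray whitespace.

    `reader` defaults to the built-in input(); it is injectable here purely
    so the prompt can be exercised without a terminal.
    """
    return reader("Please enter the GCSB Journey URL: ").strip()
```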

Summary by Sourcery

Implement error handling in the scrape_journey function to prevent IndexError and add functionality for user input of the GCSB Journey URL.

Bug Fixes:

- Handle the IndexError raised when journey details and links cannot be extracted from the page.

Enhancements:

- Prompt the user to enter the GCSB Journey URL when running the script.

sourcery-ai[bot] commented 3 weeks ago

Reviewer's Guide by Sourcery

This pull request implements error handling for the scrape_journey function and adds an option for users to input the URL when running the script. The changes focus on improving the robustness of the data extraction process and enhancing user interaction.

File-Level Changes

Implemented error handling for data extraction
  • Added try-except blocks to handle potential IndexErrors when extracting journey details and links
  • Provided default values for cases where data is missing or cannot be extracted
  • Improved robustness of title and description extraction with fallback values
app/course-scraper/src/scrapers/google_cloud_skill_boost/scrape_journey.py
Added user input for GCSB Journey URL
  • Implemented a prompt for users to enter the GCSB Journey URL when running the script
  • Modified the main execution block to use the user-provided URL
app/course-scraper/src/scrapers/google_cloud_skill_boost/scrape_journey.py
Improved CSV writing process
  • Added a check to ensure data is not empty before writing to CSV
  • Implemented error handling for the CSV writing process
  • Added feedback messages for successful writing and error cases
app/course-scraper/src/scrapers/google_cloud_skill_boost/scrape_journey.py
Refactored function signature and execution flow
  • Modified extract_ml_learning_path function to accept GCSB_JOURNEY_URL as a parameter
  • Moved the main execution logic into an `if __name__ == "__main__":` block
  • Removed global variable usage for ml_learning_path
app/course-scraper/src/scrapers/google_cloud_skill_boost/scrape_journey.py
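The refactor described above can be sketched as follows (function bodies and the `prompt` parameter are placeholders; only the signature change, the prompt text, and the removal of the global are taken from the guide):

```python
def extract_ml_learning_path(journey_url: str) -> list[dict]:
    """Hypothetical refactored signature: the URL is a parameter, not a global.

    Placeholder body; the real scraper fetches and parses the page at
    journey_url via XPath.
    """
    return [{"url": journey_url}]

def main(prompt=input):
    # In the actual script this runs under `if __name__ == "__main__":`;
    # it prompts for the URL and passes it in, replacing the old
    # module-level ml_learning_path global.
    journey_url = prompt("Please enter the GCSB Journey URL: ")
    return extract_ml_learning_path(journey_url)
```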
