marswh12312313 / GeneSumCrawler

Python-based web scraper for extracting gene summaries from GeneCards.
MIT License

Failed to load summaries for GCDH #1

Open · reni8911 opened this issue 2 weeks ago

reni8911 commented 2 weeks ago

Error message:

Failed to load summaries for GCDH: Message:
Stacktrace:
  RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
  WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:180:5
  NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:392:5
  element.find/</<@chrome://remote/content/marionette/element.sys.mjs:133:16

marswh12312313 commented 2 weeks ago

Please give me more information.


However, here are some possible reasons and solutions for this issue:

  1. Website Structure Changes:

    • Cause: The GeneCards website might have updated its page structure, and the element with id="summaries" may have been renamed or removed.
    • Solution: Manually inspect the GCDH gene page on GeneCards to check whether the summaries section still exists or whether its identifier has changed, then update the Selenium selectors in the code accordingly.
  2. Loading Delays:

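    • Cause: The page might take longer than expected to render the summaries section, so the lookup runs before the element exists. (The NoSuchElementError above is what an immediate lookup raises; an explicit wait that expires raises a TimeoutException instead.)
    • Solution: Wait explicitly for the section with WebDriverWait and give it more time, for example 20 seconds instead of 10: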
      WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, "summaries")))
  3. Pop-ups or Consent Forms:

    • Cause: A cookie consent pop-up or privacy notice might be blocking access to the main content.
    • Solution: Modify the code to detect and interact with any pop-ups before attempting to locate the summaries section. Here's an example:
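      from selenium.common.exceptions import TimeoutException
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      from selenium.webdriver.support.ui import WebDriverWait

      # Dismiss the OneTrust cookie banner if it appears; otherwise carry on.
      try:
          consent_button = WebDriverWait(driver, 10).until(
              EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))
          )
          consent_button.click()
      except TimeoutException:
          print("Consent button not found or not clickable.")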
  4. Anti-Bot Measures:

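    • Cause: GeneCards may have implemented anti-scraping measures that block automated access.
    • Solution: Run the browser without headless mode to see whether a CAPTCHA or access-denied page appears instead of the gene card, and add a pause between successive gene requests to lower the request rate.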
  5. Outdated WebDriver or Browser Version:

    • Cause: An outdated WebDriver or browser version might cause compatibility issues.
    • Solution: Update Selenium, GeckoDriver, and Firefox (the stack trace above comes from Firefox's Marionette driver) to their latest versions.