tcgoetz / GarminDB

Download and parse data from Garmin Connect or a Garmin watch, FitBit CSV, and MS Health CSV files into SQLite serverless databases, and analyze the data with Jupyter notebooks.
GNU General Public License v2.0

Initial Scrape Issue #10

Closed jhavens12 closed 6 years ago

jhavens12 commented 6 years ago

When running

make GC_DATE=02/01/2018 GC_DAYS=25 GC_USER=USER GC_PASSWORD=PW scrape_monitoring

it begins to scrape but fails with this error:

No handlers could be found for logger "scrape_garmin.py"
Traceback (most recent call last):
  File "scrape_garmin.py", line 330, in <module>
    main(sys.argv[1:])
  File "scrape_garmin.py", line 307, in main
    scrape.get_monitoring(date, days)
  File "scrape_garmin.py", line 147, in get_monitoring
    self.browse_daily_page(profile_name, day_date)
  File "scrape_garmin.py", line 135, in browse_daily_page
    page_container = self.wait_for_pagecontainer(self.browser, 10)
  File "scrape_garmin.py", line 103, in wait_for_pagecontainer
    return self.wait_for_xpath(driver, time_s, "//div[@id='pageContainer']")
  File "scrape_garmin.py", line 99, in wait_for_xpath
    return WebDriverWait(driver, time_s).until(EC.presence_of_element_located((By.XPATH, xpath)))
  File "/Library/Python/2.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

tcgoetz commented 6 years ago

Does this happen always or only sometimes? It's waiting for the page to load. Do you see the page load in the browser? It should wait up to 10 seconds.
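The wait described here is a Selenium explicit wait: `WebDriverWait(driver, time_s).until(...)`, visible in the traceback, re-checks a condition until it succeeds or the timeout expires, and then raises `TimeoutException`. As a rough illustration only, here is a stdlib-only sketch of that polling loop. `wait_until` and `WaitTimeout` are hypothetical names, not part of Selenium or GarminDB; the clock and sleep functions are injectable just so the loop can be exercised without a browser, and the 0.5 s poll interval matches Selenium's default.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout_s, poll_s=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns a truthy value or `timeout_s` elapses.

    Roughly mirrors WebDriverWait.until: the condition is re-checked every
    `poll_s` seconds, and an exception is raised if it never succeeds.
    """
    deadline = clock() + timeout_s
    while True:
        result = condition()
        if result:
            return result  # e.g. the located page element
        if clock() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout_s}s")
        sleep(poll_s)  # wait before checking again
```

A page that takes longer than `timeout_s` to produce the element raises `WaitTimeout`, which is the analogue of the `TimeoutException` in the report above.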

tcgoetz commented 6 years ago

Here are the possibilities I see; please comment:

  1. You have a slow internet connection and these pages always load slowly. The page load time is always near the limit and occasionally exceeds it.
  2. You have slow DNS and the first time you load one of these pages it is slow. The timeout needs to be longer for the first page.
  3. Pages load quickly for the most part except for random, infrequent slow page loads.
  4. Pages load quickly for a while, then slow down until you stop loading them for a while. (Garmin is throttling: the scraper requests too many pages too fast and needs to scrape more slowly.)

Do any of these sound like what you're seeing?
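Possibilities 2 and 4 point to two mitigations: a longer timeout for the first page, and a pause between page loads. A minimal sketch, assuming a hypothetical `load_page(day, timeout_s)` callback that loads one day's page and raises if it does not appear in time; the function name and default values are illustrative, not what GarminDB actually uses.

```python
import time


def scrape_days(load_page, days, first_timeout_s=30, timeout_s=10, delay_s=2.0, sleep=time.sleep):
    """Load one page per day, pacing requests to reduce the chance of throttling.

    - The first page gets a longer timeout to absorb a slow first DNS
      lookup (possibility 2).
    - A fixed delay between pages slows the scrape down in case the
      server throttles rapid requests (possibility 4).
    """
    results = []
    for index, day in enumerate(days):
        if index > 0:
            sleep(delay_s)  # pause between pages, not before the first one
        timeout = first_timeout_s if index == 0 else timeout_s
        results.append(load_page(day, timeout))
    return results
```

Injecting `sleep` keeps the pacing logic testable; a real scraper would just use the default `time.sleep`.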

jhavens12 commented 6 years ago

It looks like it's working now. Is there a way to add a print statement that says something along those lines when it fails?

tcgoetz commented 6 years ago

Committed the above change to lengthen the timeouts and log a message if it fails again. Let me know if it helps.
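One way to produce the requested message is to catch the timeout, log a hint about the likely causes, and re-raise. The `No handlers could be found for logger` line in the original report is what Python 2's `logging` prints when a record is emitted and no handler is configured; calling `logging.basicConfig()` installs one. This is a sketch, not the actual commit: `load_with_logging` and `load_page` are hypothetical, and the sketch catches the built-in `TimeoutError`, whereas real Selenium code would catch `selenium.common.exceptions.TimeoutException`.

```python
import logging

logger = logging.getLogger("scrape_garmin")  # hypothetical logger name


def load_with_logging(load_page, day, timeout_s):
    """Call the hypothetical `load_page` and log a hint if it times out."""
    try:
        return load_page(day, timeout_s)
    except TimeoutError:
        logger.error(
            "Page for %s did not load within %s s; the connection may be slow "
            "or Garmin may be throttling requests. Try a longer timeout or a "
            "slower scrape.", day, timeout_s)
        raise  # keep the original failure visible to the caller


if __name__ == "__main__":
    # Installing a handler avoids Python 2's "No handlers could be found" warning.
    logging.basicConfig(level=logging.INFO)
```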

tcgoetz commented 6 years ago

Closing as old.