Closed dmhacker closed 5 years ago
I'm going to pull this and test whether it works. Ideally we need some kind of testing for this stuff to verify that it works (even if you already know it works, tests let others verify it too).
sdschedule-backend | Error encountered by thread 1. Gracefully exiting ...
sdschedule-backend | Traceback (most recent call last):
sdschedule-backend |   File "/app/scraper/scraper.py", line 118, in iter_departments_by_thread_handle_errors
sdschedule-backend |     self.iter_departments_by_thread(thread_id, num_threads)
sdschedule-backend |   File "/app/scraper/scraper.py", line 137, in iter_departments_by_thread
sdschedule-backend |     browser = webdriver.Chrome(chrome_options=options, executable_path=DRIVER_PATH)
sdschedule-backend |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
sdschedule-backend |     desired_capabilities=desired_capabilities)
sdschedule-backend |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
sdschedule-backend |     self.start_session(capabilities, browser_profile)
sdschedule-backend |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
sdschedule-backend |     response = self.execute(Command.NEW_SESSION, parameters)
sdschedule-backend |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
sdschedule-backend |     self.error_handler.check_response(response)
sdschedule-backend |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
sdschedule-backend |     raise exception_class(message, screen, stacktrace)
sdschedule-backend | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
sdschedule-backend |   (Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 4.15.0-29-generic x86_64)
sdschedule-backend |
sdschedule-backend | [T0] Saving ECE (#11) to /cache/course_pages/ECE/11.html ...
sdschedule-backend | Thread 6 is exiting gracefully ...
sdschedule-backend | Thread 3 is exiting gracefully ...
sdschedule-backend | Thread 4 is exiting gracefully ...
sdschedule-backend | Thread 5 is exiting gracefully ...
sdschedule-backend | Thread 7 is exiting gracefully ...
sdschedule-backend | Thread 0 is exiting gracefully ...
sdschedule-backend | Thread 2 is exiting gracefully ...
sdschedule-backend | The scraper has crashed. Please retry.
It handled the error correctly. I increased the latency on Chrome to simulate a bad connection, but the crash seems unrelated to latency: the Chrome driver failed to start.
Regardless, looks good. I'll accept. You can fix the Chrome driver not starting with a wrapper method if you want, but I don't think it is a huge concern.
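For anyone picking up the wrapper idea, here is a minimal sketch of what such a method could look like. It is not code from this PR: `start_with_retries`, its parameters, and the retry policy are all hypothetical, and it assumes that simply trying again is a reasonable response to "Chrome failed to start".

```python
import time


def start_with_retries(factory, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call factory() until it succeeds or the attempts run out.

    Hypothetical helper: `factory` is any zero-argument callable that
    returns a started browser object.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except retry_on:
            if attempt == attempts:
                # Out of retries: re-raise so the existing per-thread
                # error handling still sees the failure.
                raise
            # Brief pause before the next attempt.
            time.sleep(delay)
```

In the scraper, the `webdriver.Chrome(chrome_options=options, executable_path=DRIVER_PATH)` call from the traceback could be passed in as a lambda, with `retry_on=(WebDriverException,)` so that only driver start-up failures are retried. Because the last failure is re-raised, the graceful-exit path shown in the log would still run if Chrome never starts.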
See #33 for a description of the error(s).
There are two main fixes encapsulated in these commits: