joyzoursky / docker-python-chromedriver

Dockerfile for running Python Selenium in headless Chrome (Python 2.7 / 3.6 / 3.7 / 3.8 / Alpine based Python / Chromedriver / Selenium / Xvfb included in different versions)
https://hub.docker.com/r/joyzoursky/python-chromedriver/
MIT License
636 stars 196 forks

Is there any way to run python script without `--headless` argument? #12

Closed: sazima closed this issue 5 years ago

sazima commented 5 years ago

It seems that `options.add_argument('--headless')` is required.

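from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_argument('--no-sandbox')
# Headless mode is deliberately disabled here; this is what triggers the error below.
# options.add_argument('--headless')
options.add_argument('--disable-gpu')
browser = webdriver.Chrome(options=options)
browser.get(url)  # url is defined elsewhere in the task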
celery_1  | 
celery_1  | [2019-06-07 14:55:42,992: INFO/MainProcess] Received task: spider.get_config_from_url[f3479a81-e4a5-4c5b-b296-830d7f9d0864]  
celery_1  | [2019-06-07 14:55:45,174: ERROR/ForkPoolWorker-1] Task spider.get_config_from_url[f3479a81-e4a5-4c5b-b296-830d7f9d0864] raised unexpected: WebDriverException("unknown error: Chrome failed to start: exited abnormally\n  (unknown error: DevToolsActivePort file doesn't exist)\n  (The process started from chrome location /usr/lib/chromium/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\n  (Driver info: chromedriver=2.38 (f91d32489882be7df38da3422a19713bfd113fa5),platform=Linux 4.14.91-bbrplus x86_64)", None, None)
celery_1  | Traceback (most recent call last):
celery_1  |   File "/usr/local/lib/python3.7/site-packages/celery/app/trace.py", line 385, in trace_task
celery_1  |     R = retval = fun(*args, **kwargs)
celery_1  |   File "/usr/local/lib/python3.7/site-packages/celery/app/trace.py", line 648, in __protected_call__
celery_1  |     return self.run(*args, **kwargs)
celery_1  |   File "/api/spider.py", line 32, in get_config_from_url
celery_1  |     browser = webdriver.Chrome(options=options)
celery_1  |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
celery_1  |     desired_capabilities=desired_capabilities)
celery_1  |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
celery_1  |     self.start_session(capabilities, browser_profile)
celery_1  |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
celery_1  |     response = self.execute(Command.NEW_SESSION, parameters)
celery_1  |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
celery_1  |     self.error_handler.check_response(response)
celery_1  |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
celery_1  |     raise exception_class(message, screen, stacktrace)
celery_1  | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
celery_1  |   (unknown error: DevToolsActivePort file doesn't exist)
celery_1  |   (The process started from chrome location /usr/lib/chromium/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
celery_1  |   (Driver info: chromedriver=2.38 (f91d32489882be7df38da3422a19713bfd113fa5),platform=Linux 4.14.91-bbrplus x86_64)
celery_1  | 
celery_1  | [2019-06-07 14:56:52,964: INFO/MainProcess] Received task: spider.get_config_from_url[02c3befd-ed7e-41e3-8c0f-fbb48fdd0bf4]  
celery_1  | [2019-06-07 14:56:55,159: ERROR/ForkPoolWorker-1] Task spider.get_config_from_url[02c3befd-ed7e-41e3-8c0f-fbb48fdd0bf4] raised unexpected: WebDriverException("unknown error: Chrome failed to start: exited abnormally\n  (unknown error: DevToolsActivePort file doesn't exist)\n  (The process started from chrome location /usr/lib/chromium/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\n  (Driver info: chromedriver=2.38 (f91d32489882be7df38da3422a19713bfd113fa5),platform=Linux 4.14.91-bbrplus x86_64)", None, None)
celery_1  | Traceback (most recent call last):

This is an unusual requirement, but I ran into it. While developing on my desktop system, I noticed that whenever I used the `--headless` argument the site detected it, so I could not get the real response.
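A likely cause of the error above is that, without `--headless`, Chrome needs an X display and a plain container does not provide one. The repository description mentions image variants that include Xvfb, so one possible route is to start a virtual display before launching Chrome. The following is only a sketch under assumptions: `has_display`, `chrome_arguments` and `open_page` are hypothetical helper names, and `open_page` assumes that the `pyvirtualdisplay` package and the Xvfb binary are installed in the image.

```python
import os


def has_display(environ=None):
    """Return True if an X display is advertised through the DISPLAY variable."""
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


def chrome_arguments(environ=None):
    """Chrome switches from the issue, with --headless added only as a fallback."""
    args = ["--no-sandbox", "--disable-gpu"]
    if not has_display(environ):
        # Assumption: with no display available, headed Chrome cannot start,
        # so fall back to headless rather than crash.
        args.append("--headless")
    return args


def open_page(url):
    """Hypothetical helper: load url in headed Chrome under a virtual display."""
    # Imported lazily so the two helpers above work without these packages.
    from pyvirtualdisplay import Display  # assumes pyvirtualdisplay + Xvfb are installed
    from selenium import webdriver

    # Display starts Xvfb and exports DISPLAY for the duration of the block.
    with Display(visible=False, size=(1280, 800)):
        options = webdriver.ChromeOptions()
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
        for arg in chrome_arguments():
            options.add_argument(arg)
        browser = webdriver.Chrome(options=options)
        try:
            browser.get(url)
            return browser.page_source
        finally:
            browser.quit()
```

Inside the `with` block `DISPLAY` is set, so `chrome_arguments()` returns the switches without `--headless` and Chrome runs as a regular headed browser drawing to the virtual screen.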

sazima commented 5 years ago

I guess that hiding the browser window changes the request headers or some JavaScript values, so it is unavoidable.
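One header that is known to differ is `User-Agent`: headless Chrome of that era reports a `HeadlessChrome/...` product token by default. Whether the site in question checks this token is not known, and sites may also inspect other signals such as `navigator.webdriver`, so the following is only a sketch of checking and overriding that one value. The function names and the sample user agent string are illustrative assumptions; `--user-agent` is an existing Chrome switch.

```python
HEADLESS_TOKEN = "HeadlessChrome"


def looks_headless(user_agent):
    """Return True if the user agent carries the default headless Chrome token."""
    return HEADLESS_TOKEN in user_agent


def headed_user_agent(user_agent):
    """Rewrite the headless token to the one a regular Chrome window reports."""
    return user_agent.replace(HEADLESS_TOKEN, "Chrome")


def user_agent_argument(user_agent):
    """Build the Chrome switch that overrides the User-Agent header."""
    return "--user-agent=" + headed_user_agent(user_agent)


# Sample value for illustration only, not taken from the issue.
sample = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/66.0.3359.117 Safari/537.36"
)
print(looks_headless(sample))
print(user_agent_argument(sample))
```

With Selenium, the current value can be read from a running session with `browser.execute_script("return navigator.userAgent")`, and the result of `user_agent_argument(...)` can be passed to `options.add_argument(...)` before the next browser is started.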