Using the gist we wrote for https://github.com/selenodium/selenodium-grid/issues/30, the browser now gets instantiated successfully and does what it's supposed to, but I get an unusual error when trying to retrieve the page's source via driver.page_source:
======================================================================
ERROR: test_search_in_python_org (__main__.PythonOrgSearch)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test.py", line 20, in test_search_in_python_org
assert "No results found." not in driver.page_source
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 171, in execute
response = self.command_executor.execute(driver_command, params)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
return self._request(command_info[0], url, body=data)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 425, in _request
data = resp.read()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 512, in read
s = self._safe_read(self.length)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 664, in _safe_read
raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(43872 bytes read, 17225 more expected)
----------------------------------------------------------------------
Ran 1 test in 5.941s
FAILED (errors=1)
It seems Python's http client is expecting more data for the page source than it actually receives from the webdriver. There are some blog posts about similar issues (albeit caused by Python's urllib2 rather than http.client) which suggest that the data does get transmitted, but the connection is closed prematurely, which raises the exception and leaves you with an incomplete result:
http://bobrochel.blogspot.com/2010/11/bad-servers-chunked-encoding-and.html
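That failure mode can be reproduced without Selenium at all. The sketch below (my own illustration, not code from the gist) fakes a server that advertises a Content-Length larger than the body it actually sends, then closes the connection, which is exactly what the traceback above reports:

```python
import http.client
import socket

# Simulate a server that advertises 10 body bytes but sends only 3
# and then closes the connection -- the situation behind the traceback.
client_sock, server_sock = socket.socketpair()
server_sock.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nabc")
server_sock.close()

resp = http.client.HTTPResponse(client_sock)
resp.begin()  # parse status line and headers
err = None
try:
    resp.read()
except http.client.IncompleteRead as exc:
    err = exc  # exc.partial holds what arrived; exc.expected the shortfall
    print("IncompleteRead: %d bytes read, %d more expected"
          % (len(err.partial), err.expected))
```

Note that the exception still carries the bytes that did arrive in its partial attribute, which matches the blog posts' claim that the data is transmitted but the read blows up anyway.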
This is somewhat confirmed by the Selenium node's log:
21:23:32.422 INFO - Executing: [new session: Capabilities [{browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]])
21:23:32.428 INFO - Creating a new session for Capabilities [{browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]
21:23:34.046 INFO - Done: [new session: Capabilities [{browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]]
21:23:34.073 INFO - Executing: [get: http://www.google.com])
21:23:35.487 INFO - Done: [get: http://www.google.com]
21:23:35.501 INFO - Executing: [get page source])
21:23:35.764 INFO - Done: [get page source]
21:23:35.787 INFO - Executing: [delete session: 0e6b2904-e295-4a41-a6d6-5e015aa61818])
21:23:35.854 INFO - Done: [delete session: 0e6b2904-e295-4a41-a6d6-5e015aa61818]
The get page source step does indeed complete successfully on the node. Still, the workaround in that post targets websites with poorly implemented server code, which probably isn't what's happening here. I wrote a separate gist that isolates this issue:
https://gist.github.com/TheFifthFreedom/65a87b01f7a6b10624f8
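For what it's worth, the patch from the blog post above (written against Python 2's httplib) can be ported to Python 3's http.client as a monkey-patch. This is only a stopgap sketch of mine: it papers over the truncation so the client keeps the partial body, it does not fix whatever is cutting the response short between the grid and the client:

```python
import http.client
import socket

# Stopgap: make HTTPResponse.read() return the partial body instead of
# raising IncompleteRead. Ported from the Python 2 httplib patch in the
# blog post above; it hides the truncation, it does not cure it.
_orig_read = http.client.HTTPResponse.read

def _tolerant_read(self, amt=None):
    try:
        return _orig_read(self, amt)
    except http.client.IncompleteRead as exc:
        return exc.partial  # whatever bytes actually arrived

http.client.HTTPResponse.read = _tolerant_read

# Quick check against a deliberately truncated response:
client_sock, server_sock = socket.socketpair()
server_sock.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nabc")
server_sock.close()
resp = http.client.HTTPResponse(client_sock)
resp.begin()
body = resp.read()  # now returns the partial body instead of raising
```

Because selenium's remote_connection reads responses through http.client, applying this patch before creating the Remote driver should let page_source come back (truncated) rather than crash, which at least makes the problem observable.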