http.client.IncompleteRead

TheFifthFreedom commented 9 years ago

Using the gist we wrote for https://github.com/selenodium/selenodium-grid/issues/30, the browser now starts successfully and does what it's supposed to, but I get an unusual error when retrieving the page source via driver.page_source:

======================================================================
ERROR: test_search_in_python_org (__main__.PythonOrgSearch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 20, in test_search_in_python_org
    assert "No results found." not in driver.page_source
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 171, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(command_info[0], url, body=data)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 425, in _request
    data = resp.read()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 512, in read
    s = self._safe_read(self.length)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 664, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(43872 bytes read, 17225 more expected)

----------------------------------------------------------------------
Ran 1 test in 5.941s

FAILED (errors=1)
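The two numbers in the exception add up to the body length the server declared. This is a minimal standard-library sketch: it rebuilds the exception from the traceback's numbers (the zero-filled payload is a stand-in, not the real page source) and reads back the `partial` and `expected` attributes that `http.client.IncompleteRead` carries. It assumes, as the single `resp.read()` in the traceback suggests, that nothing had been read from the body before the failure.

```python
import http.client

# Rebuild the exception from the traceback: 43872 bytes arrived and
# 17225 were still expected when the connection reached EOF.
exc = http.client.IncompleteRead(b"\0" * 43872, 17225)

bytes_read = len(exc.partial)   # what http.client actually received
bytes_missing = exc.expected    # what the Content-Length still promised
declared_length = bytes_read + bytes_missing

print(repr(exc))        # IncompleteRead(43872 bytes read, 17225 more expected)
print(declared_length)  # 61097
```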

It seems Python's HTTP client expects more of the page source than the webdriver actually delivers: the response declared 61097 bytes (43872 read + 17225 still expected), but the connection ended early. Some blog posts describe similar issues (albeit raised by Python's urllib2 rather than http/client.py) and suggest that the data does get transmitted, but the connection is closed at the wrong moment, which raises an exception and still leaves you with incomplete results, e.g. http://bobrochel.blogspot.com/2010/11/bad-servers-chunked-encoding-and.html The Selenium node's log partly confirms this:

21:23:32.422 INFO - Executing: [new session: Capabilities [{browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]])
21:23:32.428 INFO - Creating a new session for Capabilities [{browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]
21:23:34.046 INFO - Done: [new session: Capabilities [{browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]]
21:23:34.073 INFO - Executing: [get: http://www.google.com])
21:23:35.487 INFO - Done: [get: http://www.google.com]
21:23:35.501 INFO - Executing: [get page source])
21:23:35.764 INFO - Done: [get page source]
21:23:35.787 INFO - Executing: [delete session: 0e6b2904-e295-4a41-a6d6-5e015aa61818])
21:23:35.854 INFO - Done: [delete session: 0e6b2904-e295-4a41-a6d6-5e015aa61818]

The log shows the get page source command completing successfully. Still, the workaround in that post targets websites with poorly implemented servers, which probably isn't the case here. I wrote a separate Gist that isolates this issue: https://gist.github.com/TheFifthFreedom/65a87b01f7a6b10624f8
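The failure mode described above (a server that advertises a Content-Length and then closes the connection before sending that many bytes) can be reproduced without Selenium. This is a minimal standard-library sketch, not selenodium-grid's code: `ShortBodyHandler`, `fetch_with_truncated_body`, and the 100/40 byte counts are hypothetical. Catching `IncompleteRead` and keeping `exc.partial` recovers only the bytes that arrived, so it hides the error rather than fixing the server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PROMISED = 100  # hypothetical Content-Length the server advertises
SENT = 40       # bytes it actually writes before closing


class ShortBodyHandler(BaseHTTPRequestHandler):
    """Hypothetical misbehaving server: declares more bytes than it sends."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(PROMISED))
        self.end_headers()
        self.wfile.write(b"x" * SENT)
        # protocol_version defaults to HTTP/1.0, so the server closes the
        # connection after this handler returns, leaving the body short.

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_with_truncated_body():
    """Return (body, error) from one GET against the misbehaving server."""
    server = HTTPServer(("127.0.0.1", 0), ShortBodyHandler)
    server.timeout = 5  # don't block forever if no request arrives
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        try:
            return resp.read(), None
        except http.client.IncompleteRead as exc:
            # Same failure as in the traceback; exc.partial holds what arrived.
            return exc.partial, exc
    finally:
        conn.close()
        thread.join()
        server.server_close()


if __name__ == "__main__":
    body, error = fetch_with_truncated_body()
    print(repr(error))  # IncompleteRead(40 bytes read, 60 more expected)
    print(len(body))    # 40
```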

arikon commented 9 years ago

@TheFifthFreedom Thanks, will look into it

arikon commented 9 years ago

Fixed in https://github.com/selenodium/selenodium-grid/pull/40

arikon commented 9 years ago

@TheFifthFreedom Let me know if it worked for you

TheFifthFreedom commented 9 years ago

It does! Thanks very much!

selenodium / selenodium-grid

http.client.IncompleteRead #38