kaliiiiiiiiii / Selenium-Driverless

undetected Selenium without usage of chromedriver
https://kaliiiiiiiiii.github.io/Selenium-Driverless/

Error with driver.page_source on specific url #175

Closed vannman closed 2 months ago

vannman commented 4 months ago

This code:

from selenium_driverless.sync import webdriver

options = webdriver.ChromeOptions()

with webdriver.Chrome(options=options) as driver:
    driver.get('https://www.nytimes.com/section/world', wait_load=True, timeout=60)
    html = driver.page_source

results in the following error:

/Users/ian/PycharmProjects/INA/venv/bin/python /Users/ian/PycharmProjects/INA/scrapetest4.py 
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 510, in wait_for
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/cdp_socket/socket.py", line 77, in exec
    res = await asyncio.wait_for(self._responses[_id], timeout=timeout)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 509, in wait_for
    async with timeouts.timeout(timeout):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/timeouts.py", line 111, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ian/PycharmProjects/INA/scrapetest4.py", line 7, in <module>
    html = driver.page_source
           ^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/sync/webdriver.py", line 39, in __getattribute__
    return self._loop.run_until_complete(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 664, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/webdriver.py", line 774, in page_source
    return await target.page_source
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/types/target.py", line 531, in page_source
    return await elem.source
           ^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/types/webelement.py", line 326, in source
    res = await self.__target__.execute_cdp_cmd("DOM.getOuterHTML", args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/types/target.py", line 923, in execute_cdp_cmd
    result = await self.socket.exec(method=cmd, params=cmd_args, timeout=timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/cdp_socket/socket.py", line 93, in exec
    raise SocketExcitedError("socket coroutine excited without exception")
cdp_socket.exceptions.SocketExcitedError: socket coroutine excited without exception

Process finished with exit code 1

It works with other sections on the same site.

kaliiiiiiiiii commented 4 months ago

Probably due to a URL/page load happening at the same time. I'll see if I can reproduce it and try to make it race-condition safe.

vannman commented 4 months ago

Not sure if this is useful, but here is another log from an error I am getting:

2024-02-27 17:35:54,275 - ERROR - default_exception_handler - Task exception was never retrieved
future: <Task finished name='Task-699' coro=<Target.execute_cdp_cmd() done, defined at /Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/types/target.py:902> exception=ConnectionClosedError(None, Close(code=<CloseCode.MESSAGE_TOO_BIG: 1009>, reason=''), None)>
Traceback (most recent call last):
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 963, in transfer_data
    message = await self.read_message()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1033, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1108, in read_data_frame
    frame = await self.read_frame(max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1165, in read_frame
    frame = await Frame.read(
            ^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/framing.py", line 107, in read
    new_frame = extension.decode(new_frame, max_size=max_size)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/extensions/permessage_deflate.py", line 133, in decode
    raise exceptions.PayloadTooBig(f"over size limit (? > {max_size} bytes)")
websockets.exceptions.PayloadTooBig: over size limit (? > 1048576 bytes)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/types/target.py", line 923, in execute_cdp_cmd
    result = await self.socket.exec(method=cmd, params=cmd_args, timeout=timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/cdp_socket/socket.py", line 73, in exec
    _id = await self.send(method=method, params=params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/cdp_socket/socket.py", line 67, in send
    await self._ws.send(json.dumps(_dict))
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 635, in send
    await self.ensure_open()
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1009 (message too big); no close frame received
2024-02-27 17:35:54,278 - ERROR - default_exception_handler - Task exception was never retrieved
future: <Task finished name='Task-704' coro=<Target.execute_cdp_cmd() done, defined at /Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/types/target.py:902> exception=ConnectionClosedError(None, Close(code=<CloseCode.MESSAGE_TOO_BIG: 1009>, reason=''), None)>
Traceback (most recent call last):
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 963, in transfer_data
    message = await self.read_message()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1033, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1108, in read_data_frame
    frame = await self.read_frame(max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1165, in read_frame
    frame = await Frame.read(
            ^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/framing.py", line 107, in read
    new_frame = extension.decode(new_frame, max_size=max_size)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/extensions/permessage_deflate.py", line 133, in decode
    raise exceptions.PayloadTooBig(f"over size limit (? > {max_size} bytes)")
websockets.exceptions.PayloadTooBig: over size limit (? > 1048576 bytes)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/selenium_driverless/types/target.py", line 923, in execute_cdp_cmd
    result = await self.socket.exec(method=cmd, params=cmd_args, timeout=timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/cdp_socket/socket.py", line 73, in exec
    _id = await self.send(method=method, params=params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/cdp_socket/socket.py", line 67, in send
    await self._ws.send(json.dumps(_dict))
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 635, in send
    await self.ensure_open()
  File "/Users/ian/PycharmProjects/INA/venv/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1009 (message too big); no close frame received

vannman commented 4 months ago

I suspect the second error is caused by driver.page_source returning a payload larger than 1 MiB (1048576 bytes), which exceeds the websocket message size limit shown in the traceback.
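A rough way to sanity-check that suspicion is to compare the size of a saved copy of the page against the limit from the traceback. This is only a sketch: `exceeds_ws_limit` is a hypothetical helper, the sample pages are synthetic, and the real websocket message also carries a JSON wrapper around the HTML, so the true size would be somewhat larger than what is measured here.

```python
# Limit shown in the traceback above: 1048576 bytes (2**20, i.e. 1 MiB).
WS_LIMIT_BYTES = 2 ** 20


def exceeds_ws_limit(payload: str, limit: int = WS_LIMIT_BYTES) -> bool:
    """Return True if the UTF-8 encoded payload is larger than `limit` bytes."""
    return len(payload.encode("utf-8")) > limit


# Synthetic stand-ins for a small page and a large page (not real site data).
small_page = "<html>" + "a" * 1_000 + "</html>"
large_page = "<html>" + "a" * 2_000_000 + "</html>"

print(exceeds_ws_limit(small_page))  # False
print(exceeds_ws_limit(large_page))  # True
```

If a saved copy of the failing page is over the limit while the working sections are under it, that would be consistent with the PayloadTooBig error above.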

vannman commented 4 months ago

Changing in webdriver.py:

self._max_ws_size = max_ws_size

to:

self._max_ws_size = 99999999999999

solved the problem. No more errors.

kaliiiiiiiiii commented 4 months ago

> Changing in webdriver.py:
>
> self._max_ws_size = max_ws_size
>
> to:
>
> self._max_ws_size = 99999999999999
>
> solved the problem. No more errors.

You can just pass that as an argument; see the documentation.

vannman commented 4 months ago

To be honest, I'm finding it a bit difficult to understand the documentation. Do I set it when initiating the driver somehow? Could you give an example?

kaliiiiiiiiii commented 4 months ago

> To be honest I'm finding it a bit difficult to understand the documentation. Do I set it when initiating the driver somehow? Could you give an example?

https://kaliiiiiiiiii.github.io/Selenium-Driverless/classes/Chrome/#webdriver-chrome

by default:

webdriver.Chrome(options=None, timeout=30, debug=False, max_ws_size=2**27)
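For anyone landing here with the same question, a minimal sketch of passing the limit when constructing the driver, based on the signature above and the snippet from the original report. The value 2**28 is an arbitrary illustrative choice, and `fetch_page_source` is a hypothetical helper name; whether a given installed version honors `max_ws_size` should be checked against its documentation.

```python
# Illustrative limit: 2**28 bytes (256 MiB), twice the default of 2**27
# shown in the signature above. Pick whatever fits the pages being scraped.
MAX_WS_SIZE = 2 ** 28


def fetch_page_source(url: str, max_ws_size: int = MAX_WS_SIZE) -> str:
    """Open `url` and return its HTML, using a larger websocket size limit."""
    # Imported here so this module can be loaded without the package installed.
    from selenium_driverless.sync import webdriver

    options = webdriver.ChromeOptions()
    # max_ws_size is passed as a keyword argument instead of patching webdriver.py.
    with webdriver.Chrome(options=options, max_ws_size=max_ws_size) as driver:
        driver.get(url, wait_load=True, timeout=60)
        return driver.page_source


# Example call (requires Chrome and selenium_driverless to be installed):
# html = fetch_page_source('https://www.nytimes.com/section/world')
```

This keeps the installed package untouched, so the setting survives upgrades, unlike editing self._max_ws_size in webdriver.py directly.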