iv-org / youtube-trusted-session-generator

This is a generator for getting a session that passes all the checks on YouTube's side
GNU Affero General Public License v3.0

Browser randomly fails to start in docker container #3

Open MMaster opened 1 month ago

MMaster commented 1 month ago

When running the Docker container, the browser quite often fails to start and throws the following exception:

[INFO] launching the python script
[INFO] launching browser.
Traceback (most recent call last):
  File "/usr/app/src/index.py", line 46, in <module>
    loop().run_until_complete(main())
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/app/src/index.py", line 10, in main
    browser = await start(headless=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/nodriver/core/util.py", line 74, in start
    return await Browser.create(config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/nodriver/core/browser.py", line 87, in create
    await instance.start()
  File "/usr/local/lib/python3.12/site-packages/nodriver/core/browser.py", line 343, in start
    raise Exception(
Exception:
                ---------------------
                Failed to connect to browser
                ---------------------
                One of the causes could be when you are running as root.
                In that case you need to pass no_sandbox=True

Exception ignored in atexit callback: <function deconstruct_browser at 0x7fbf3f225580>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/nodriver/core/util.py", line 124, in deconstruct_browser
    _.stop()
  File "/usr/local/lib/python3.12/site-packages/nodriver/core/browser.py", line 545, in stop
    asyncio.get_event_loop().create_task(self.connection.aclose())
                                         ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'aclose'

Catching the exception and trying again after 3 seconds until it succeeds seems to fix this issue.

unixfox commented 1 month ago

How much RAM does your system have?

Could you try to add --shm-size=2G in the docker run command?

MMaster commented 1 month ago

Currently the VM has 8GB RAM and half of it is free. I sure can increase the shared memory size for that, but I'm not really sure how that should help. I tried it and even the very first run failed.

Out of 4 runs, the browser starts successfully once and fails to start 3 times (it behaved the same way with the headless browser, when X was not part of the Docker image).

I've fixed the issue by wrapping the start call in a try/except and retrying up to 5 times before giving up. It works, so memory doesn't seem to be the issue here.
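The retry workaround described here could look roughly like the following. This is a minimal sketch, not the code MMaster actually used: `start_with_retries` and its parameters are hypothetical names, and the 3 second delay and 5 attempt limit are taken from the comments above. The start function is passed in so the helper does not depend on nodriver itself.

```python
import asyncio


async def start_with_retries(start_fn, max_attempts=5, delay=3.0):
    """Call an async browser-start function, retrying when it raises.

    start_fn is any zero-argument callable returning an awaitable
    (hypothetical helper, not part of nodriver).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await start_fn()
        except Exception as exc:  # the traceback above shows a bare Exception
            last_error = exc
            print(f"[WARN] browser start failed ({attempt}/{max_attempts}): {exc}")
            if attempt < max_attempts:
                # Wait before the next attempt; no sleep after the last one.
                await asyncio.sleep(delay)
    raise RuntimeError(
        f"browser failed to start after {max_attempts} attempts"
    ) from last_error
```

Assuming the script imports `start` from nodriver as in the traceback, the call in `main()` would become something like `browser = await start_with_retries(lambda: start(headless=False))`.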

unixfox commented 1 month ago

Ok. That's strange because I can't replicate the issue. I launched the script 5 times in a row and never had the issue.

Are you sure you are running the latest version of the script?

unixfox commented 1 month ago

Related: https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/1949 and https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/1896

MMaster commented 1 month ago

Yeah I am on latest version.

The related issues that you linked are actually 2 separate issues. The first one happens even if the browser starts successfully and returns the tokens. I suspect it is because sys.exit is called instead of properly stopping the async event loop. The second one may be related, but the solutions there don't really apply: I don't have any zombie processes, the Docker container stops after the run with no leftovers, and corrupted user data can't be the cause since it's an immutable Docker image.
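To illustrate the suspicion about sys.exit and the event loop, here is a minimal sketch of an entry point that lets the loop shut down before the process exits. It is an assumption that this pattern applies to the script, and it is not verified to remove the atexit error. `main` here is a placeholder, and `run` is a hypothetical helper name.

```python
import asyncio


async def main():
    # Placeholder for the real work (start the browser, collect the tokens).
    # The coroutine returns an exit code instead of calling sys.exit() itself.
    await asyncio.sleep(0)
    return 0


def run():
    # asyncio.run() closes the event loop before it returns, so the caller
    # only gets the exit code once the loop has been shut down.
    return asyncio.run(main())
```

The script's entry point would then call `sys.exit(run())`, so that the exit happens outside the event loop.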

Anyway, I noticed even on Stack Overflow that some people hit this issue randomly with nodriver, with no reliable solution.

unixfox commented 1 month ago

Ok I was able to reproduce the issue on a VM with just 2 cores and 1GB of RAM.

MMaster commented 1 month ago

> Ok I was able to reproduce the issue on a VM with just 2 cores and 1GB of RAM.

FYI: Yesterday it stopped happening completely on the original VM. But it happened on a dedicated machine with 128 GB RAM and 10 cores / 20 threads. It's completely random for me.

unixfox commented 1 month ago

Ok, I have narrowed down the issue. nodriver doesn't wait long enough before it gives up trying to connect to the Chromium instance: https://github.com/ultrafunkamsterdam/nodriver/blob/main/nodriver/core/browser.py#L340-L346

I have pushed a dirty patch in the Dockerfile that makes it wait longer: https://github.com/iv-org/youtube-trusted-session-generator/commit/0551c92227287aaa32d3a559635dde8f7ea9b5a1#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R22. I validated that it works fine on my VM with 2 vCPUs and 1GB of RAM.
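The general idea of waiting longer can also be expressed as polling against a deadline rather than a fixed sleep. The sketch below is a generic illustration and is neither nodriver's code nor the Dockerfile patch: `wait_for_port` is a hypothetical helper, and it assumes the browser exposes a TCP endpoint whose host and port are known to the caller.

```python
import asyncio
import time


async def wait_for_port(host, port, timeout=10.0, interval=0.25):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            _reader, writer = await asyncio.open_connection(host, port)
        except OSError:
            # Nothing is listening yet (e.g. connection refused); retry
            # until the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            await asyncio.sleep(interval)
        else:
            # The endpoint accepts connections; close the probe socket.
            writer.close()
            await writer.wait_closed()
            return True
```

Polling returns as soon as the endpoint is reachable, so a generous timeout costs nothing on fast machines and still leaves room for slow ones.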

But I'm waiting for an official implementation, for which I have created a PR:

markus583 commented 3 days ago

Hi, even with the dirty patch, and even after increasing the sleep time further, I still get the same error. VM resources are not the issue; the VM has much more than 2 vCPUs and 1GB of RAM.

Has anyone encountered this issue in other settings, or found a fix for it? Thanks!