Closed xconverge closed 2 years ago
I'm not seeing any high CPU usage and my container has been running for over 6 months. I get the TimeoutError but I don't see it hurting my setup.
I am not sure how to best pin it down, the nightly restart of the container mitigates it plenty for me. I can move the container to another server in a bit and try that for a few days/weeks
I am running a fork with updated dependencies and a few more puppeteer flags (mostly --no-zygote) I will report back or open a PR if my problems go away.
My initial report is not fully accurate, the CPU seems to go back to normal after a successful update, and my interval is 4 hrs, so if whatever the issue is occurs, the cpu is pinned for that 4 hrs, and then there is another chance for it to resolve itself or continue to be pinned for another 4 hours.
I saw the same thing on 2 different hosts/environments/os. I am guessing you wouldnt notice this if your interval is pretty low and if you have a bunch of cpu overhead as well
New theory....based on the logs, I seem to periodically be getting stuck at this page.goto and the waitUntil never resolves I will look into adding a timeout mechanism or something maybe, but there are a lot of page loads in here that wait with no timeout, so maybe others get hungup periodically as well:
private async authenticate() {
logger.debug(`Loading (${LOGIN_URL})`);
const page = await this.getPage();
const response = await page.goto(LOGIN_URL, { waitUntil: ['networkidle0', 'load', 'domcontentloaded'] });
https://pptr.dev/api/puppeteer.page.goto
The default timeout
for goto
is 30 seconds.
https://pptr.dev/api/puppeteer.page.goto
The default
timeout
forgoto
is 30 seconds.
Thanks, I see, that doesnt really explain the log I was looking at then, I will run things for another day or so (I have debug logging enabled now) so that should be a bit more helpful
Ok its definitely not timing out in 30 seconds...from the same page you sent:
The default value can be changed by using the [Page.setDefaultNavigationTimeout()](https://pptr.dev/api/puppeteer.page.setdefaultnavigationtimeout) or [Page.setDefaultTimeout()](https://pptr.dev/api/puppeteer.page.setdefaulttimeout) methods.
and in getPage() we are calling:
await this.#page.setDefaultNavigationTimeout(this.#pageTimeout);
I will try defining pageTimeout in my config.yml to play with it a bit (I know it is trying to also set it to 30 by default but that doesnt seem to be working)
If it's not defined in the config is defaults here.
If it's not defined in the config is defaults here.
Yes I saw that and was what I meant by "(I know it is trying to also set it to 30 by default but that doesnt seem to be working)" but I definitely have a log message that was like this for over 5 minutes, I restarted the container at 16:49:38
2022-07-19 16:40:27 [info] : Xfinity Update every 240 mins
2022-07-19 16:40:27 [info] : Http server will be started
2022-07-19 16:40:27 [info] : Will publish to MQTT (Home Assistant) on new data
2022-07-19 16:40:27 [info] : http server started
2022-07-19 16:40:28 [verbose] : Fetching Data
2022-07-19 16:40:28 [debug] : Loading (https://customer.xfinity.com)
2022-07-19 16:40:28 [info] : Connected to MQTT
2022-07-19 16:49:38 [info] : Xfinity Update every 240 mins
2022-07-19 16:49:38 [info] : Http server will be started
2022-07-19 16:49:38 [info] : Will publish to MQTT (Home Assistant) on new data
2022-07-19 16:49:38 [info] : http server started
2022-07-19 16:49:38 [verbose] : Fetching Data
2022-07-19 16:49:38 [debug] : Loading (https://customer.xfinity.com)
2022-07-19 16:49:38 [info] : Connected to MQTT
2022-07-19 16:49:42 [debug] : Login redirects: undefined
2022-07-19 16:49:42 [debug] : Logging in
You could try dropping in a logger after each line and using the screenshot function to see if the page looks like it is supposed.
If you're running this in a docker you'll want to switch these comments.
What do you think about this:
needs a "this.#page === undefined ||" or some changed logic. The request seems to be getting stuck here
That's what the ?
is doing.
if(this.#page !== false || this.#page.isClosed() !== false) {
sorry for rubber ducking hard, I will get some more time on it and report back...I was missing quite a few log statements so with what I have so far it looks like potentially hanging up on browser.newPage but I will wait until I am sure before looking into it more
Ok so sometimes it just fails to get a new page
and then when the interval is up, it tries again but fails again, so it never works again until the container is restarted
If I restart the container 10 times, it does this maybe 1 out of the 10 times
I added dumpio: true to get some more debug from puppeteer but it gives me nothing helpful. I also added --disable-gpu because there were quite a few gpu errors happening
working:
2022-07-20 11:48:16 [info] : Xfinity Update every 240 mins
2022-07-20 11:48:16 [info] : Http server will be started
2022-07-20 11:48:16 [info] : Will publish to MQTT (Home Assistant) on new data
2022-07-20 11:48:16 [info] : http server started
2022-07-20 11:48:16 [verbose] : Fetching Data
2022-07-20 11:48:16 [debug] : Loading (https://customer.xfinity.com)
2022-07-20 11:48:16 [debug] : Page:
2022-07-20 11:48:16 [debug] : undefined
2022-07-20 11:48:16 [debug] : Getting browser...
2022-07-20 11:48:17 [info] : Connected to MQTT
[0720/114817.340246:ERROR:browser_main_loop.cc(540)] Failed to open an X11 connection.
[0720/114817.343741:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0720/114817.349331:WARNING:audio_manager_linux.cc(69)] Falling back to ALSA for audio output. PulseAudio is not available or could not be initialized.
DevTools listening on ws://127.0.0.1:40309/devtools/browser/ea76ae77-cb77-4f37-9d17-18445f98584e
2022-07-20 11:48:17 [debug] : Got browser!
2022-07-20 11:48:17 [debug] : Getting new page...
2022-07-20 11:48:17 [debug] : Got new page!
2022-07-20 11:48:17 [debug] : Setting cookies, user agent, and viewport...
2022-07-20 11:48:17 [debug] : Cookies, user agent, and viewport set!
2022-07-20 11:48:17 [debug] : Calling setDefaultNavigationTimeout with a value of:
2022-07-20 11:48:17 [debug] : 30000
2022-07-20 11:48:17 [debug] : Got Page
[0720/114817.984491:INFO:CONSOLE(1)] "A parser-blocking, cross site (i.e. different eTLD+1) script, https://static.cimcontent.net/data-layer/?appId=resi_myaccount, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.", source: https://customer.xfinity.com/ (1)
[0720/114817.992400:INFO:CONSOLE(1)] "A parser-blocking, cross site (i.e. different eTLD+1) script, https://static.cimcontent.net/data-layer/?appId=resi_myaccount, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.", source: https://customer.xfinity.com/ (1)
[0720/114818.825904:INFO:CONSOLE(0)] "Uncaught (in promise) NetworkError: A network error occurred.", source: https://customer.xfinity.com/ (0)
2022-07-20 11:48:21 [debug] : Login redirects: 0
2022-07-20 11:48:21 [debug] : Loading Usage https://customer.xfinity.com/apis/csp/account/me/services/internet/usage?filter=internet
2022-07-20 11:48:21 [debug] : Page:
2022-07-20 11:48:21 [debug] : [object Object]
2022-07-20 11:48:21 [debug] : Using existing page:
2022-07-20 11:48:21 [debug] : [object Object]
2022-07-20 11:48:24 [debug] : Saving cookies for next fetch
2022-07-20 11:48:24 [debug] : Page:
2022-07-20 11:48:24 [debug] : [object Object]
2022-07-20 11:48:24 [debug] : Using existing page:
2022-07-20 11:48:24 [debug] : [object Object]
2022-07-20 11:48:24 [verbose] : Data retrieved
2022-07-20 11:48:24 [info] : Usage updated: 482 GB. Next update at 3:48:16 PM
not working:
2022-07-20 11:49:46 [info] : Xfinity Update every 240 mins
2022-07-20 11:49:46 [info] : Http server will be started
2022-07-20 11:49:46 [info] : Will publish to MQTT (Home Assistant) on new data
2022-07-20 11:49:46 [info] : http server started
2022-07-20 11:49:46 [verbose] : Fetching Data
2022-07-20 11:49:46 [debug] : Loading (https://customer.xfinity.com)
2022-07-20 11:49:46 [debug] : Page:
2022-07-20 11:49:46 [debug] : undefined
2022-07-20 11:49:46 [debug] : Getting browser...
2022-07-20 11:49:46 [info] : Connected to MQTT
[0720/114947.268915:ERROR:browser_main_loop.cc(540)] Failed to open an X11 connection.
[0720/114947.276692:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0720/114947.287006:WARNING:audio_manager_linux.cc(69)] Falling back to ALSA for audio output. PulseAudio is not available or could not be initialized.
DevTools listening on ws://127.0.0.1:34947/devtools/browser/27007cf9-5efe-48b2-869e-b908ee0945e3
2022-07-20 11:49:47 [debug] : Got browser!
2022-07-20 11:49:47 [debug] : Getting new page...
also not working when I let it run for the interval to see if it fixed itself (prior to adding some of the debug flags)
022-07-19 18:40:59 [info] : Xfinity Update every 240 mins
2022-07-19 18:40:59 [info] : Http server will be started
2022-07-19 18:40:59 [info] : Will publish to MQTT (Home Assistant) on new data
2022-07-19 18:40:59 [info] : http server started
2022-07-19 18:40:59 [verbose] : Fetching Data
2022-07-19 18:40:59 [debug] : Loading (https://customer.xfinity.com)
2022-07-19 18:40:59 [debug] : Page:
2022-07-19 18:40:59 [debug] : undefined
2022-07-19 18:40:59 [debug] : Getting browser...
2022-07-19 18:40:59 [info] : Connected to MQTT
2022-07-19 18:41:00 [debug] : Got browser!
2022-07-19 18:41:00 [debug] : Getting new page...
2022-07-19 22:40:59 [verbose] : Fetching Data
2022-07-19 22:40:59 [debug] : Loading (https://customer.xfinity.com)
2022-07-19 22:40:59 [debug] : Page:
2022-07-19 22:40:59 [debug] : undefined
2022-07-19 22:40:59 [debug] : Getting browser...
2022-07-19 22:40:59 [debug] : Got browser!
2022-07-19 22:40:59 [debug] : Getting new page...
I changed a lot at once after some reading on newPage issues and will report back
I am fine managing/using my own fork forever if no one else has this problem
Looking promising so far, I will keep an eye on it over the next few days
I am considering this solved for me, let me know if I should open an MR or if it is too large of a change without solid engineering behind it, I definitely jumped around a bit...
I have no objections to making the changes if it works across the builds.
What type of hardware are you running this on? If I recall correctly I couldn't get the arm7 to build successfully with chromium when using alpine.
I'm interested in learning which steps actually fixed your issue.
I really only see 3 major changes that I think would change anything:
I will get some days of runtime on this, I am happy managing my own fork for myself (and also thankful for you making this available in the first place!)
I am currently running it on a pi4 x64 (armv8) but was running it on a synology NAS running dsm7 before (both systems exhibited the same issue periodically)
I suspect "pipe: true" as the launch parameter is the main fix, but (2) is a strong contender (it is chromium 102.x I believe).
(1) didnt really change anything other than potential compatibility with chromium versions.
The other startup flags (no-zygote, single process, and disable-gpu) all make sense and removed a few weird behaviors/errors I saw when I had dumpio: true but did not change the behavior.
Looking at puppeteer issues related to "newpage hangs", "newpage stalls" pointed me to a LOT of different related looking issues.
Just following up here, I have had 0 problems since I switched to alpine/my fork, but I dont think it is worth redoing all of the builds etc for different platforms if no one else has this issue, 0 additional thumbs ups since I opened it close to 6 months ago...I will continue to run what works for me, thanks a lot for the image!
It probably would be nice to move all the builds to alpine. If I find some time I might see if I can get it working.
Every few days, something happens that leaves chromium instances with very high CPU usage
I also realized that all of the chromium instances are leftover and still running too!?
I setup a cronjob to restart this container nightly due to this issue
I think there "might" be a relation to #54, I typically see this error in the log prior to restarting the container
Here is a small start to debugging when found in this state:
Is there any more info I can get to supplement this info for this issue next time it happens? Does anyone else experience this?