jpillora / webproc

Wrap any program in a simple web-based user-interface
731 stars 99 forks source link

BUG:Abnormal high CPU usage #23

Open xczh opened 3 years ago

xczh commented 3 years ago

When I used it with dnsmasq for a while, I observed abnormal high CPU usage, as shown in the screenshot.

image

I use strace to try to track PID 13544, which shows that the program is blocking the waitid system call, and there is no other information. Through the top command, I learned that CPU time is basically spent in us state, not sys state.

This situation has been going on for 24 hours, and it doesn't seem to be a problem of 'dnsmasq'. What happened to 'webproc'?

xczh commented 3 years ago

@jpillora I need help, thanks :)

jpillora commented 3 years ago

Could be related to log processing. Is dnsmasq logging heavily?

On Thu, 18 Feb 2021 at 7:54 pm Chris Winnie notifications@github.com wrote:

When I used it with dnsmasq for a while, I observed abnormal high CPU usage, as shown in the screenshot.

[image: image] https://user-images.githubusercontent.com/12578552/108330891-1f47bf80-7209-11eb-85d9-ed938653c8c0.png

I use strace to try to track PID 13544, which shows that the program is blocking the waitid system call, and there is no other information. Through the top command, I learned that CPU time is basically spent in us state, not sys state.

This situation has been going on for 24 hours, and it doesn't seem to be a problem of 'dnsmasq'. What happened to 'webproc'?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X44X6IRCTR7ZRYS27LLS7TIVXANCNFSM4XZ4QW7A .

xczh commented 3 years ago

No, the log frequency is about 1 line/s.

And this phenomenon is only recently. Before, I used v0.3.0 version to run stably for nearly a year, and did not find this problem. I upgraded to version v0.4.0 the day before yesterday, and this problem occurred two days after running.

xczh commented 3 years ago

In order to eliminate the problem of dnsmasq, I rerun the program. I used mpstat 1 to print out the CPU utilization per second, and found a very interesting phenomenon.

# webproc -c /etc/dnsmasq.conf -p 80 --user test --pass 123456 -- mpstat 1

image

After the program runs for 2 minutes, the CPU utilization rate is still as high as 100%.

These are mpstat output:

⁣09:22:39     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:22:40     all    0.00    0.00    0.00    0.00    0.25    0.25    0.00    0.00   99.50
⁣
⁣09:22:40     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:22:41     all    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
⁣
⁣09:22:41     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:22:42     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

......

⁣09:24:24     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:25     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
⁣
⁣09:24:25     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:26     all    0.00    0.00    0.00    0.00    0.00    0.25    0.00    0.00   99.75
⁣
⁣09:24:26     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:27     all    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
⁣
⁣09:24:27     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:28     all    4.52    0.00    0.50    0.00    0.25    0.00    0.00    0.00   94.72
⁣
⁣09:24:28     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:29     all   16.08    0.00    1.51    0.00    0.50    0.50    0.00    0.00   81.41
⁣
⁣09:24:29     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:30     all    0.25    0.00    0.00    0.00    0.25    0.00    0.00    0.00   99.50
⁣
⁣09:24:30     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:31     all   18.30    0.00    3.51    0.00    0.50    0.00    0.00    0.00   77.69
⁣
⁣09:24:31     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:32     all   17.97    0.00    0.51    0.00    0.51    0.00    0.00    0.00   81.01
⁣
⁣09:24:32     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:33     all   17.88    0.00    0.76    0.00    0.76    0.25    0.00    0.00   80.35
⁣
⁣09:24:33     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:34     all   18.18    0.00    0.76    0.00    0.51    0.00    0.00    0.00   80.56
⁣
⁣09:24:34     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:35     all   19.10    0.00    0.50    0.00    0.75    0.00    0.00    0.00   79.65
⁣
⁣09:24:35     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:36     all   19.24    0.00    0.76    0.00    0.51    0.00    0.00    0.00   79.49
⁣
⁣09:24:36     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
⁣09:24:37     all   18.94    0.00    0.76    0.00    0.76    0.00    0.00    0.00   79.55

Since then, CPU usage has remained at this level.

@jpillora

jpillora commented 3 years ago

Are you streaming logs on the web frontend? Does this happen when webproc has no clients?

On 18 Feb 2021 at 8:33:41 pm, Chris Winnie notifications@github.com wrote:

In order to eliminate the problem of dnsmasq, I rerun the program. I used mpstat 1 to print out the CPU utilization per second, and found a very interesting phenomenon.

webproc -c /etc/dnsmasq.conf -p 80 --user test --pass 123456 -- mpstat 1

[image: image] https://user-images.githubusercontent.com/12578552/108336018-903da600-720e-11eb-8b4a-9cf3559f67d7.png

After the program runs for 2 minutes, the CPU utilization rate is still as high as 100%.

These are mpstat output:

⁣09:22:39 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:22:40 all 0.00 0.00 0.00 0.00 0.25 0.25 0.00 0.00 99.50

⁣09:22:40 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:22:41 all 0.00 0.00 0.25 0.00 0.00 0.00 0.00 0.00 99.75

⁣09:22:41 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:22:42 all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00

......

⁣09:24:24 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:25 all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00

⁣09:24:25 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:26 all 0.00 0.00 0.00 0.00 0.00 0.25 0.00 0.00 99.75

⁣09:24:26 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:27 all 0.00 0.00 0.25 0.00 0.00 0.00 0.00 0.00 99.75

⁣09:24:27 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:28 all 4.52 0.00 0.50 0.00 0.25 0.00 0.00 0.00 94.72

⁣09:24:28 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:29 all 16.08 0.00 1.51 0.00 0.50 0.50 0.00 0.00 81.41

⁣09:24:29 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:30 all 0.25 0.00 0.00 0.00 0.25 0.00 0.00 0.00 99.50

⁣09:24:30 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:31 all 18.30 0.00 3.51 0.00 0.50 0.00 0.00 0.00 77.69

⁣09:24:31 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:32 all 17.97 0.00 0.51 0.00 0.51 0.00 0.00 0.00 81.01

⁣09:24:32 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:33 all 17.88 0.00 0.76 0.00 0.76 0.25 0.00 0.00 80.35

⁣09:24:33 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:34 all 18.18 0.00 0.76 0.00 0.51 0.00 0.00 0.00 80.56

⁣09:24:34 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:35 all 19.10 0.00 0.50 0.00 0.75 0.00 0.00 0.00 79.65

⁣09:24:35 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:36 all 19.24 0.00 0.76 0.00 0.51 0.00 0.00 0.00 79.49

⁣09:24:36 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

⁣09:24:37 all 18.94 0.00 0.76 0.00 0.76 0.00 0.00 0.00 79.55

Since then, CPU usage has remained at this level.

@jpillora https://github.com/jpillora

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23#issuecomment-781211749, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X44KYUNS3N7PPXE4MQDS7TNHLANCNFSM4XZ4QW7A .

xczh commented 3 years ago

I just executed this command:

webproc -c /etc/dnsmasq.conf -p 80 --user test --pass 123456 -- mpstat 1

and then open the browser to access: http://localhost.

mpstat 1 prints CPU usage every second.

jpillora commented 3 years ago

If you don't open the browser, does CPU still go high?

On 18 Feb 2021 at 8:50:00 pm, Chris Winnie notifications@github.com wrote:

I just executed this command:

webproc - C / etc/ dnsmasq.conf -P 80 -- user test -- pass 123456 -- mpstat 1

and then open the browser to access: http://localhost.

mpstat 1 prints CPU usage every second.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23#issuecomment-781221513, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X4ZYNWEMBIBZBIJB43LS7TPERANCNFSM4XZ4QW7A .

xczh commented 3 years ago

Exactly. I find that if I don't open the browser, the CPU usage is always less than 0.5%. Once you open your browser, it's 100% in a flash. But why is that?

Will JavaScript scripts affect the CPU utilization of the server?

jpillora commented 3 years ago

This means it’s a bug with the data synchronisation library (velox). The server is looping somewhere... most likely will need a new release which includes debugging API endpoints to investigate further

What os/arch are you running?

On Thu, 18 Feb 2021 at 9:02 pm Chris Winnie notifications@github.com wrote:

Exactly. I find that if I don't open the browser, the CPU usage is always less than 0.5%. Once you open your browser, it's 100% in a flash. But why is that?

Will JavaScript scripts affect the CPU utilization of the server?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23#issuecomment-781229006, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X47TW6GC3WQTWRGON5LS7TQS7ANCNFSM4XZ4QW7A .

xczh commented 3 years ago

Fedora 33, amd64

jpillora commented 3 years ago

As a work around, could you try setting max lines to something low, maybe 100?

On Thu, 18 Feb 2021 at 9:23 pm Chris Winnie notifications@github.com wrote:

Fedora 33, amd64

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23#issuecomment-781241401, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X4ZHS5I2X7DP5RYKLZTS7TTC7ANCNFSM4XZ4QW7A .

xczh commented 3 years ago

In addition, I suggest adding the net/http/pprof HTTP endpoint to facilitate troubleshooting.

xczh commented 3 years ago

It seems that this problem does not exist in v0.3.0. Can I switch back to the old version? What are the disadvantages?

jpillora commented 3 years ago

Shouldn’t be too different, been a long time, I’d need to look at the diff

On Thu, 18 Feb 2021 at 9:26 pm Chris Winnie notifications@github.com wrote:

It seems that this problem does not exist in v0.3.0. Can I switch back to the old version? What are the disadvantages?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23#issuecomment-781243182, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X44EXAONRTRZ425P4LLS7TTOVANCNFSM4XZ4QW7A .

xczh commented 3 years ago

image

I was wrong. Just now, I spent more than an hour retesting the v0.3.0 version and found that the leak still exists. CPU usage is increasing at a very slow rate, but it does continue to increase. This can lead to problems that are hard to detect and I've ignored in the past year.

In the version of v0.4.0, the CPU consumption increases rapidly when using the browser to access the page, which leads to the problem being found quickly.

This problem looks SERIOUS and needs to be fixed. Before the problem is solved, I think I can only disable webproc.

Or are there any effective measures?

jpillora commented 3 years ago

As a work around, could you try setting max lines to something low, maybe 100?

On 18 Feb 2021 at 10:53:01 pm, Chris Winnie notifications@github.com wrote:

[image: image] https://user-images.githubusercontent.com/12578552/108353049-a35a7100-7222-11eb-9ffe-2057e212336c.png

I was wrong. Just now, I spent more than an hour retesting the v0.3.0 version and found that the leak still exists. CPU usage is increasing at a very slow rate, but it does continue to increase. This can lead to problems that are hard to detect and I've ignored in the past year.

In the version of v0.4.0, the CPU consumption increases rapidly when using the browser to access the page, which leads to the problem being found quickly.

This problem looks SERIOUS and needs to be fixed. Before the problem is solved, I think I can only disable webproc.

Or are there any effective measures?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23#issuecomment-781290242, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X4ZIXHC4ULXN7HXD6G3S7T5R3ANCNFSM4XZ4QW7A .

Schwobaland commented 3 years ago

I have the same problem. Setting the max-lines to 100 didn't help...

jpillora commented 3 years ago

OS arch?

On Tue, 9 Mar 2021 at 10:29 pm Schwobaland notifications@github.com wrote:

I have the same problem. Setting the max-lines to 100 didn't help...

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/jpillora/webproc/issues/23#issuecomment-793750685, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE2X4YAGA7XX6BOR6YTCALTCYBCXANCNFSM4XZ4QW7A .

Schwobaland commented 3 years ago

Arm64. What helped, was disabling the logs for the webui - with the downside of missing logs in the webui. -l proxy

By the way - thanks for creating webproc!!

rinex20 commented 2 years ago

I had the same issue like above. @jpillora ubuntu x86_64.

rinex20 commented 2 years ago

I had the same issue like above. @jpillora ubuntu x86_64.

finally, downgrade to 0.3.3 to fix this high cpu usage.

ThorstenKunz commented 2 years ago

Same issue with 0.4.0. I added "-l proxy" and that solved it. Running on Linux raspberrypi 5.4.0-1066-raspi #76-Ubuntu SMP PREEMPT Mon Jun 27 11:02:52 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux