Open pocha opened 11 years ago
Hello,
I have had a quick look at this and couldn't reproduce it using the benchmark (are you able to reproduce it by just using the multiple-clients.js benchmark?).
However, I have found a way to cause 100% CPU usage with malicious intent.
It is possible to cause an infinite loop on the server by using the command:
$ while true
> do
> echo 'a'
> done
I will look into this further later to see if any application logic is a different cause, but it may be worth looking at your mongo logs to see if this or any variation of this command appears regularly.
Thanks,
Tom Gallacher
@tomgco thanks for checking. Which node & npm versions did you test it with? Do you mind posting the output of npm version from inside the nodejs app?
As for recreating the issue, @dereXeus reported that if you run the benchmark test, the CPU usage does not come back to 0% once it is done. It stayed at around 50%, and I have a hunch it is the same issue that shows up as 100% CPU usage in production. I am in the process of reproducing it myself on my PC.
I just checked and I still have v0.10.9 running on my production server. If everything works fine on my PC with v0.10.12 (including the benchmark test), I will update the version on my production server too. I will post my findings in this thread as well.
@pocha I tested it on node 0.10.17 on Mac OS X. I do have an Ubuntu machine, which I will try to test it on later.
npm version
{ http_parser: '1.0',
node: '0.10.17',
v8: '3.14.5.9',
ares: '1.9.0-DEV',
uv: '0.10.14',
zlib: '1.2.3',
modules: '11',
openssl: '1.0.1e',
npm: '1.3.8',
'terminal-codelearn': '0.0.3' }
and this is the output from npm ls
Hi,
I had the same problem (https://github.com/joyent/node/issues/5108#issuecomment-20861508). I was running node-proxy in production using forever with node v0.8.7, and it ran for weeks on end without any issue. When I upgraded to node v0.9.x, the CPU spiked to 100% a couple of times a day and the proxy would become unresponsive. The only solution was a restart. I have reverted to node v0.8.7 and it seems to be working fine again.
@arunkjn I tried running my app on node v0.8.x but unfortunately the tests failed. It's probably an issue with some of the dependencies.
For now, I have the app running on the server with node v0.10.17.
As for replicating the issue, I could not do it on my own PC with either the single-client or the multiple-client benchmarking script. When I pointed the single-client script at the server, though, the server did reach 100% CPU. I need to run forever restartall to bring the CPU back down to a few percent.
This is pretty strange now. My server is Ubuntu 12.04 AMD 64 bit.
Some more updates.
An SO thread suggested running the app with node-tick. The output of node-tick-processor while the CPU was at 100% is at http://pastebin.com/uAM3ncpd .
https://groups.google.com/forum/#!topic/nodejs/_U0MmS6rUl4 says that a lot of ticks in libc probably means the system is spending a lot of time in epoll_wait().
Can you suggest anything off the top of your head for adding a timeout, so that the system waits in epoll_wait() for a while and simply moves on once the timeout is hit?
Unfortunately, I am finding it pretty hard to recreate the situation on my PC. But it does get created on the server.
There is an update on this. I wanted to check whether you have any insight.
When the CPU was at 100% on the server, I ran pstree -p:
ubuntu@Ubuntu-1204-precise-64-minimal ~ $ pstree -p 32762
node(32762)─┬─su(582)───bash(583)
            ├─su(704)───bash(705)───rails-codelearn(747)───ruby(758)─┬─{ruby}(760)
            │                                                        └─{ruby}(762)
            ├─{node}(32763)
            ├─{node}(32764)
            ├─{node}(32765)
            ├─{node}(32766)
            └─{node}(32767)
When I killed su(582) with kill -9 582, the CPU usage came back to normal.
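If the stray su/bash child turns out to be a shell left behind by a disconnected session (the thread does not confirm this), one possible mitigation is to track the child spawned for each session and kill it when the session ends. The following is only a minimal sketch under that assumption; `SessionShells` and the session ids are hypothetical names, not part of terminal-codelearn.

```javascript
// Hypothetical sketch: remember the shell spawned for each client session
// and kill it when the session ends. Not the application's actual code.
function SessionShells() {
  this.children = new Map(); // sessionId -> child-process-like object
}

// Register the child for a session. `child` only needs a kill() method,
// which objects returned by child_process.spawn() provide.
SessionShells.prototype.add = function (sessionId, child) {
  this.children.set(sessionId, child);
};

// Kill and forget the child for a session.
// Returns true if a child was registered, false otherwise.
SessionShells.prototype.close = function (sessionId) {
  const child = this.children.get(sessionId);
  if (!child) return false;
  child.kill('SIGKILL'); // same signal as the manual `kill -9`
  this.children.delete(sessionId);
  return true;
};
```

In a server, `close()` would be called from whatever disconnect event the transport exposes, so that a dropped connection cannot leave its shell running.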
Leaving the server running under forever makes it take 100% CPU. There is no conclusive evidence of when it happens. With a daily load of 200 users who each spend around 7-10 minutes, this happens a couple of times every day. For now, I have a cron script that checks whether the process has high CPU usage and restarts the server using forever. It fixes the problem temporarily.
@dereXeus did some research by running the server under strace and found that the Node server is not able to close file descriptors. The situation is partially reproducible if you run the benchmark tests as instructed in the README: once all the connections are done, the CPU does not go back to 0%. So it looks like this issue is similar to what I am seeing on the production server.
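A cheap way to watch for this from inside the process, without strace, is to count the entries under /proc/self/fd. The sketch below assumes a Linux host (as on the Ubuntu server); `countOpenFds` is a hypothetical helper, not something the app already has.

```javascript
const fs = require('fs');

// Count this process's open file descriptors by listing /proc/self/fd.
// Linux-only: returns null where the directory cannot be read
// (for example on Mac OS X). The count includes the descriptor that
// readdirSync itself opens for the directory while listing it.
function countOpenFds(dir) {
  try {
    return fs.readdirSync(dir || '/proc/self/fd').length;
  } catch (err) {
    return null;
  }
}
```

Logging this value periodically while the benchmark runs would show whether the count keeps growing after clients disconnect, which would be consistent with descriptors not being closed.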
While Googling, I came across https://github.com/joyent/node/issues/5504, and it seemed that upgrading the nodejs version might fix the issue. I upgraded nodejs on production to v0.10.12 (it is an Ubuntu server) but the problem remains.
I also tried upgrading node to the latest version, but 'npm install' failed for that version.