pocha / terminal-codelearn

Super fast multi user pseudo bash Terminal in Node.js & SockJS
http://pocha.github.io/terminal-codelearn

The server starts taking 100% #7

Open pocha opened 11 years ago

pocha commented 11 years ago

Leaving the server running under forever makes it take 100% CPU. There is no conclusive evidence of when it happens. With a daily load of 200 users who each spend around 7-10 minutes, this happens a couple of times every day. For now, I have a cron script that checks whether the process has high CPU usage and restarts the server using forever. This fixes the problem temporarily.
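For anyone who wants to replace the cron check with something in Node.js itself, here is a minimal sketch of how such a watchdog might measure CPU usage. It assumes Linux (it reads /proc/<pid>/stat) and a clock tick rate of 100 per second. The function names and the "three consecutive samples at or above 90%" restart rule are hypothetical, not what the cron script in this thread actually does. It measures usage between two samples instead of relying on ps %cpu, because on Linux ps reports CPU time averaged over the whole lifetime of the process, which reacts slowly to a process that has only just started spinning.

```javascript
// Sketch of the sampling half of a CPU watchdog (Linux-only, reads /proc).
const fs = require('fs');

// ASSUMPTION: 100 clock ticks per second, the usual Linux value of
// sysconf(_SC_CLK_TCK). Node.js has no built-in call to query it.
const CLOCK_TICKS_PER_SEC = 100;

// Total CPU time (utime + stime, fields 14 and 15) from a /proc/<pid>/stat line.
// Field 2 (the command name) is in parentheses and may contain spaces, so
// parsing starts after the last ')'.
function parseCpuTicks(statLine) {
  const rest = statLine.slice(statLine.lastIndexOf(')') + 1).trim().split(/\s+/);
  // rest[0] is field 3 (state), so field N sits at index N - 3.
  return Number(rest[11]) + Number(rest[12]);
}

function readCpuTicks(pid) {
  return parseCpuTicks(fs.readFileSync(`/proc/${pid}/stat`, 'utf8'));
}

// CPU usage between two samples taken elapsedMs apart (100 = one full core).
function cpuPercent(prevTicks, curTicks, elapsedMs, ticksPerSec = CLOCK_TICKS_PER_SEC) {
  const cpuSeconds = (curTicks - prevTicks) / ticksPerSec;
  return (cpuSeconds / (elapsedMs / 1000)) * 100;
}

// Hypothetical policy: restart only after `needed` consecutive samples at or
// above `threshold`, so a short burst of real traffic does not trigger it.
function shouldRestart(samples, threshold = 90, needed = 3) {
  if (samples.length < needed) return false;
  return samples.slice(-needed).every((p) => p >= threshold);
}
```

To use it, one could sample readCpuTicks(pid) every few seconds, push each cpuPercent(...) result into an array, and run forever restartall when shouldRestart returns true. Requiring several consecutive high samples is a design choice to avoid restarting during a legitimate busy moment.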

@dereXeus did some research and, by running the server under strace, figured out that the Node server is not able to close file descriptors. The situation is partially reproducible if you run the benchmark tests as instructed in the README: once all the connections are done, the CPU does not go back to 0%. So this issue looks similar to what I am seeing on the production server.
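One way to check the file descriptor theory without strace is to watch the number of open descriptors of the server process before and after a benchmark run. The sketch below is a hypothetical diagnostic, not part of terminal-codelearn: it assumes Linux, where /proc/<pid>/fd has one entry per open descriptor. If the count keeps growing after all benchmark clients have disconnected, that would be consistent with the leak.

```javascript
// Diagnostic sketch (Linux-only): count open file descriptors via /proc.
const fs = require('fs');

// Returns the number of open descriptors, or null if /proc is unavailable
// (e.g. on Mac OS X) or the process does not exist. For 'self', the count
// includes the descriptor Node opens while listing the directory.
function countOpenFds(pid = 'self') {
  try {
    return fs.readdirSync(`/proc/${pid}/fd`).length;
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: log the count periodically so it can be compared
// before and after a benchmark run. Returns the timer so callers can stop it.
function logFdCount(pid = 'self', intervalMs = 5000) {
  return setInterval(() => {
    console.log(new Date().toISOString(), 'open fds:', countOpenFds(pid));
  }, intervalMs);
}
```

Calling logFdCount() once at server startup and then running the benchmark from the README would show whether the count returns to its starting level after the clients finish.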

While Googling, I came across https://github.com/joyent/node/issues/5504 and it seems upgrading the Node.js version might fix the issue. I upgraded Node.js on production to v0.10.12 (it is an Ubuntu server), but the problem remains.

I also tried upgrading to the latest Node version, but 'npm install' failed with that version.

tomgco commented 11 years ago

Hello,

I have had a quick look at this and couldn't reproduce it using the benchmark. Are you able to reproduce it just by using the multiple-clients.js benchmark?

However, I have found a way to cause 100% CPU usage with malicious intent.

It is possible to cause an infinite loop on the server by using the command:

$ while true
> do
> echo 'a'
> done

I will look into this further to see whether some application logic is a different cause, but it may be worth checking your mongo logs to see whether this command, or any variation of it, appears regularly.
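A loop like the one above writes output as fast as bash can produce it, and relaying every chunk over the socket keeps the Node process busy as well. One possible mitigation, sketched here under assumptions, is a per-session output budget: if a session produces more than a set number of bytes within a time window, its shell is killed. The OutputBudget class, the guardSession helper and the 64 KB per second default are all hypothetical; they assume each session's shell is a child_process object whose stdout the server reads.

```javascript
// Hypothetical per-session output budget using a fixed time window.
class OutputBudget {
  constructor(maxBytes, windowMs) {
    this.maxBytes = maxBytes;     // bytes allowed per window
    this.windowMs = windowMs;     // window length in milliseconds
    this.windowStart = -Infinity; // forces a fresh window on the first chunk
    this.bytes = 0;
  }

  // Record a chunk of output; returns true once this window's budget is exceeded.
  record(byteCount, nowMs = Date.now()) {
    if (nowMs - this.windowStart >= this.windowMs) {
      this.windowStart = nowMs; // start a new window
      this.bytes = 0;
    }
    this.bytes += byteCount;
    return this.bytes > this.maxBytes;
  }
}

// Hypothetical wiring: stop relaying and kill the session's child process
// when it floods output.
function guardSession(child, budget = new OutputBudget(64 * 1024, 1000)) {
  const onData = (chunk) => {
    if (budget.record(chunk.length)) {
      child.stdout.removeListener('data', onData);
      child.kill('SIGKILL');
    }
  };
  child.stdout.on('data', onData);
}
```

A fixed window is the simplest choice; a sliding window or token bucket would be smoother, but either way the limit needs to be generous enough for legitimate commands that print a lot, such as a long ls or a Rails log.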

Thanks,

Tom Gallacher

pocha commented 11 years ago

@tomgco thanks for checking. Which node and npm versions did you test it with? Would you mind posting the output of npm version run from inside the Node.js app?

As for recreating the issue, @dereXeus reported that once the benchmark test finishes, the CPU usage does not come back to 0%; it stays at around 50%. I have a hunch it is the same issue that shows up as 100% CPU usage on production. I am in the process of reproducing it myself on my PC.

I just checked, and I still have v0.10.9 running on my production server. If everything works fine on my PC with v0.10.12 (including the benchmark test), I will update the version on my production server too, and I will update this thread with my findings.

tomgco commented 11 years ago

@pocha I tested it on node 0.10.17 on Mac OS X. I do have an Ubuntu machine, which I will try to test it on later.

npm version

{ http_parser: '1.0',
  node: '0.10.17',
  v8: '3.14.5.9',
  ares: '1.9.0-DEV',
  uv: '0.10.14',
  zlib: '1.2.3',
  modules: '11',
  openssl: '1.0.1e',
  npm: '1.3.8',
  'terminal-codelearn': '0.0.3' }

And this is the output of npm ls:

https://gist.github.com/tomgco/6317063

arunkjn commented 11 years ago

Hi,

I had the same problem (https://github.com/joyent/node/issues/5108#issuecomment-20861508). I was running node-proxy in production using forever with node v0.8.7, and it ran for weeks at a time without any issue. When I upgraded to node v0.9.x, the CPU spiked to 100% a couple of times a day and the proxy would become unresponsive; the only solution was a restart. I have reverted to node v0.8.7 and it seems to be working fine again.

pocha commented 11 years ago

@arunkjn I tried running my app on node v0.8.x, but unfortunately the tests failed. It is probably an issue with some of the dependencies.

For now, I have the app running on the server with node v0.10.17.

As for replicating the issue, I could not do it on my own PC with either the single-client or the multiple-client benchmark script. However, when I pointed the single-client script at the server, the server did reach 100% CPU. I had to run forever restartall to bring the CPU back down to a few percent.

This is pretty strange now. My server is Ubuntu 12.04 AMD 64 bit.

pocha commented 11 years ago

Some more updates.

A Stack Overflow thread suggested profiling with node-tick. The output of node-tick-processor while the CPU was at 100% is at http://pastebin.com/uAM3ncpd.

https://groups.google.com/forum/#!topic/nodejs/_U0MmS6rUl4 says that a large number of ticks in libc probably means the system is spending a lot of time in epoll_wait().

Can you suggest something off the top of your head, such as a timeout, so that the system waits in epoll_wait() for some time and simply moves on if the timeout is hit?

Unfortunately, I am finding it pretty hard to recreate the situation on my PC, but it does happen on the server.

pocha commented 11 years ago

There is an update on this. I wanted to check whether you have any insight.

When the CPU was at 100% on the server, I used pstree -p to list the processes. The output looked like this:

ubuntu@Ubuntu-1204-precise-64-minimal ~ $ pstree -p  32762
node(32762)─┬─su(582)───bash(583)
            ├─su(704)───bash(705)───rails-codelearn(747)───ruby(758)─┬─{ruby}(760)
            │                                                        └─{ruby}(762)
            ├─{node}(32763)
            ├─{node}(32764)
            ├─{node}(32765)
            ├─{node}(32766)
            └─{node}(32767)

When I killed su(582) with kill -9 582, the CPU usage came back to normal.
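Since killing the leftover su process brought the CPU back to normal, one possible safeguard is to make sure every process a session started is killed when the session ends, for example from the SockJS connection's close handler. The sketch below assumes the app launches each user's su/bash with child_process.spawn; spawnSession and killSessionTree are hypothetical names, and the command passed in stands in for however the app actually starts the shell. With detached: true, the child becomes the leader of a new process group (not on Windows), so sending a signal to the negative pid reaches su, bash and anything bash started that stayed in the same group. A process that moves itself into a different process group would escape this.

```javascript
// Sketch: run each session's shell in its own process group so the whole
// tree can be killed at once. Not for Windows.
const { spawn } = require('child_process');

// Hypothetical: `command` and `args` stand in for however the app launches
// `su ... bash`. detached: true makes the child a new process group leader.
function spawnSession(command, args) {
  return spawn(command, args, { detached: true, stdio: 'pipe' });
}

// Send `signal` to every process in the child's group (negative pid).
// Returns false if the group no longer exists.
function killSessionTree(child, signal = 'SIGKILL') {
  try {
    process.kill(-child.pid, signal);
    return true;
  } catch (err) {
    if (err.code === 'ESRCH') return false; // group already gone
    throw err;
  }
}
```

In the app this might be wired as conn.on('close', () => killSessionTree(child)), so that a user who closes the browser tab mid-command does not leave su and bash running.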