keymetrics / pm2-server-monit

Monitor server CPU / Memory / Process / Zombie Process / Disk size / Security Packages / Network Input / Network Output
https://keymetrics.io/
MIT License

CPU spikes to 100% with installation of pm2-server-monit #35

Closed by conatus 7 years ago

conatus commented 7 years ago

Hi guys,

Thanks for your wonderful work on this project.

A few days ago we had an outage caused by servers spiking to 100% CPU usage and becoming inoperable. The sequence is as follows:

  1. Start two processes with pm2: a web process and an application process.
  2. Add pm2-server-monit alongside a Keymetrics key: pm2 install pm2-server-monit && pm2 link XXX YYY $HOSTNAME
  3. CPU spikes to 100%.
  4. Remove pm2-server-monit: pm2 uninstall pm2-server-monit
  5. CPU returns to usual.

The application does not spike CPU when running without pm2-server-monit; indeed, it barely exceeds 10% CPU usage even for intensive operations. These are AWS m4.large instances.

I am wondering if we can walk through this problem together. The monitoring Keymetrics gives me at the moment (we are a paying customer) is useful and my own mitigation for now has been to remove the installation of pm2-server-monit from our deploy process.

alavit-d commented 7 years ago

Hi,

In https://github.com/pm2-hive/pm2-server-monit/commit/070d80f3885684cd5a8a71b3c2c648eba6ab65ee we changed the CPU retrieval and added a new configuration value called small_interval. Could you try v2.2.0 of the module with the default configuration? If you still experience issues, try with the small_interval value set to 10.
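For readers unfamiliar with how a polling interval affects this kind of monitoring, here is a minimal sketch of interval-based CPU sampling using Node's built-in os module. It is not the module's actual code: the function names (snapshot, cpuPercent, startSampling) are hypothetical, and treating the interval as a number of seconds is an assumption made only for illustration.

```javascript
const os = require('os');

// Sum idle and total CPU time across all cores from an os.cpus() snapshot.
function snapshot(cpus = os.cpus()) {
  let idle = 0;
  let total = 0;
  for (const cpu of cpus) {
    for (const t of Object.values(cpu.times)) total += t;
    idle += cpu.times.idle;
  }
  return { idle, total };
}

// Percentage of non-idle CPU time between two snapshots.
function cpuPercent(prev, next) {
  const total = next.total - prev.total;
  if (total <= 0) return 0;
  const idle = next.idle - prev.idle;
  return 100 * (1 - idle / total);
}

// Poll every `intervalSeconds` (hypothetical stand-in for a setting such as
// small_interval; the real unit and semantics are not confirmed here).
function startSampling(intervalSeconds, onSample) {
  let prev = snapshot();
  const timer = setInterval(() => {
    const next = snapshot();
    onSample(cpuPercent(prev, next));
    prev = next;
  }, intervalSeconds * 1000);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}
```

In a sketch like this, a larger interval means fewer snapshots per minute, so the monitoring itself costs less CPU. Whether that applies to the module depends on how it actually collects its metrics.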

eran10 commented 7 years ago

Just had the same issue with the small_interval value set to 10, so I had to delete it.

karlbecker commented 6 years ago

Hello @alavit-d and anyone listening, I'm also getting this problem, and increasing small_interval to 10 or higher is not reducing the CPU usage. Any ideas?

karlbecker commented 6 years ago

Found a stack trace that I referenced in the other Issue, too:

TypeError: Invalid data, chunk must be a string or buffer, not number
    at TLSSocket.Socket.write (net.js:714:11)
    at ClientRequest._flushOutput (_http_outgoing.js:842:18)
    at ClientRequest._flush (_http_outgoing.js:819:16)
    at ClientRequest._deferToConnect (_http_client.js:277:47)
    at callSocketMethod (_http_client.js:690:7)
    at ClientRequest.onSocket (_http_client.js:695:7)
    at Object.onceWrapper (events.js:315:30)
    at emitOne (events.js:121:20)
    at ClientRequest.emit (events.js:211:7)
    at tickOnSocket (_http_client.js:657:7)

Thanks for any help you can provide!
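The trace above says that a number reached a socket write where Node expects a string or a Buffer. Where that number comes from is not known from the trace alone. As an illustration only, the sketch below shows one defensive pattern: normalizing a value before it is written to a stream. The helper name toChunk is hypothetical and is not part of pm2 or pm2-server-monit.

```javascript
const { PassThrough } = require('stream');

// Convert an arbitrary value into something a byte stream accepts
// (a string or a Buffer). Objects are serialized as JSON; other
// non-string values (numbers, booleans, null, undefined) go through String().
function toChunk(value) {
  if (typeof value === 'string' || Buffer.isBuffer(value)) return value;
  if (value !== null && typeof value === 'object') return JSON.stringify(value);
  return String(value);
}

// Usage: a PassThrough stream stands in for the HTTP request socket.
const sink = new PassThrough();
sink.write(toChunk(42));          // writes the string "42" instead of a number
sink.write(toChunk({ cpu: 100 })); // writes a JSON string
sink.end();
```

Normalizing at the point of the write keeps the error from surfacing deep inside the HTTP client, although the proper fix would be wherever the number is produced.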

karlbecker commented 6 years ago

Heads up that we are still experiencing this issue, as described here: https://github.com/keymetrics/keymetrics-support/issues/193#issuecomment-395784021