oveits / ProvisioningEngine

Ruby on Rails based ProvisioningEngine Frontend for provisioning of legacy systems via Apache Camel Backend (SOAP/XML+SPML+File import)
3 stars 6 forks source link

High CPU for ruby process: rake jobs:work #41

Closed oveits closed 8 years ago

oveits commented 8 years ago

On the pre-production system, we see: image

and:

$ ps -ef | grep ruby
provisi+  6295     1  0 Feb14 ?        00:05:56 /home/provisioningengine/.rbenv/versions/2.2.4/bin/ruby bin/rails s -p 3000
provisi+  6359  6335 92 Feb14 ?        10-20:11:02 ruby /home/provisioningengine/.rbenv/versions/2.2.4/bin/rake jobs:work
provisi+ 14447     1  0 Feb12 ?        00:27:21 /home/provisioningengine/.rbenv/versions/2.2.4/bin/ruby bin/rails s -p 3001
provisi+ 23823 23654  0 10:34 pts/0    00:00:00 grep --color=auto ruby

It seems like the rake jobs:work process is consuming 100% CPU, although it does not seem to do anything:

provisioningengine@ProvisioningEngine:~/ProvisioningEnginev0.5.20_testfolder$ tail -f /var/log/WebPortal.log
Started GET "/dev/assets/active_admin/ext/jquery-ui.js?body=1" for 127.0.0.1 at 2016-02-21 21:48:49 +0100

Started GET "/dev/assets/active_admin/application.js?body=1" for 127.0.0.1 at 2016-02-21 21:48:49 +0100

Started GET "/dev/assets/active_admin.js?body=1" for 127.0.0.1 at 2016-02-21 21:48:49 +0100

Started GET "/dev/assets/active_admin/print.css?body=1" for 127.0.0.1 at 2016-02-21 21:48:49 +0100
^C
provisioningengine@ProvisioningEngine:~/ProvisioningEnginev0.5.20_testfolder$

The log file is silent for several days, although the script

# file: startDelayedJobs.sh
...
./status | grep "Delayed Jobs: running" || ~/.rbenv/shims/rake jobs:work >> /var/log/WebPortal.log 2>&1 &

is redirecting all rake jobs:work output to the file. Also there are no current jobs in the queue: image

oveits commented 8 years ago

The rake process cannot easily be stopped/killed:

provisioningengine@ProvisioningEngine:~/ProvisioningEnginev0.5.20_testfolder$ ./stopDelayedJobs.sh
myDelayedJobsPid=6359
kill signal sent for Delayed Jobs (PID=6359)
provisioningengine@ProvisioningEngine:~/ProvisioningEnginev0.5.20_testfolder$ ./stopDelayedJobs.sh
myDelayedJobsPid=6359

It works with kill -9, though:

provisioningengine@ProvisioningEngine:~/ProvisioningEnginev0.5.20_testfolder$ kill -9 6359
provisioningengine@ProvisioningEngine:~/ProvisioningEnginev0.5.20_testfolder$ kill -9 6359
-bash: kill: (6359) - No such process
oveits commented 8 years ago

After starting the rake jobs:work job again, everything seems to work again:

provisioningengine@ProvisioningEngine:~/ProvisioningEnginev0.5.20_testfolder$ ./startDelayedJobs.sh
Delayed Jobs started: PID=24406

and CPU is normal again: image

However, it still takes a long time until the job is processed: a synchronizeSynchronously is now hanging in Pending for an hour: image

Not good. :-(

oveits commented 8 years ago

The problem seems to be known. See https://github.com/collectiveidea/delayed_job/issues/823, which points to https://github.com/collectiveidea/delayed_job/issues/776.

The workaround is supposed to be

config.cache_classes = true

Let us try that out.

oveits commented 8 years ago
$ grep -r cache_classes config/environments/
config/environments/test.rb:  config.cache_classes = true
config/environments/development.rb:  config.cache_classes = false
config/environments/production.rb:  config.cache_classes = true

TODO:

1) change config.cache_classes = true in config/environments/development.rb and restart the server 2) consider to re-try to run the server in production mode. I had done so, but had encountered some css stylesheet issues. But it is a good idea to run the productive machine (I always used to call it 'pre-production machine', since it is not a product yet) in production mode.

oveits commented 8 years ago

TODO 1) done:

Login to CSL9_ProvisioningEngine_Ruby.

Before changes:

$ ps -p 30019 -o %cpu,%mem,cmd
%CPU %MEM CMD
81.1 69.8 ruby /home/provisioningengine/.rbenv/versions/2.2.4/bin/rake jobs:work

Then:

$ ./start
Web Portal started: PID=17425
Delayed Jobs started: PID=17489

Finally checking CPU+MEM consumption:

$ ps -p 17489 -o %cpu,%mem,cmd
%CPU %MEM CMD
10.2  2.7 ruby /home/provisioningengine/.rbenv/versions/2.2.4/bin/rake jobs:work

and some minutes later:

$ ps -p 17489 -o %cpu,%mem,cmd
%CPU %MEM CMD
 0.7  2.8 ruby /home/provisioningengine/.rbenv/versions/2.2.4/bin/rake jobs:work

Now I need to re-check CPU+Memory after some hours/days...