Closed by joshuapaling 9 years ago.
Does Capistrano have permission to stop the workers? How are you restarting the workers on deployment? What's the Capistrano output when running the command that restarts the workers?
Hi, thanks for the quick response.
Yes, Capistrano does have permission.
I've only got one worker (default) and one queue (default) running at a time. I initially tried restarting with:
execute "#{release_path}/app/Console/cake CakeResque.CakeResque restart"
But now I'm stopping and starting them separately instead:
execute "#{release_path}/app/Console/cake CakeResque.CakeResque stop"
execute "#{release_path}/app/Console/cake CakeResque.CakeResque start"
I've been trying to see how tied the problem is to Capistrano. I did the following:
./cake CakeResque.CakeResque stats
Workers Stats
Workers count : 1
REGULAR WORKERS
* sabre740.anchor.net.au:20918:default
- Started on : Tue Oct 14 14:18:43 EST 2014
- Processed Jobs : 0
- Failed Jobs : 0
bundle exec cap staging deploy
It works, and adds a new release. (I temporarily disabled all starting/stopping of workers during the Capistrano deploy.)
Running the stats again after the deploy:
Workers Stats
Workers count : 1
REGULAR WORKERS
* sabre740.anchor.net.au:20918:default
- Started on : Tue Oct 14 14:18:43 EST 2014
- Processed Jobs : 0
- Failed Jobs : 0
I also cannot stop the worker:
staging@sabre740:~/public_html$ ./current/app/Console/cake CakeResque.CakeResque stop
Stopping workers
There is no workers to stop ...
Yet it is still running:
staging@sabre740:~/public_html$ ps ux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
staging 8685 0.0 0.0 73312 1664 ? S 13:09 0:00 sshd: staging@pts/1
staging 8686 0.0 0.1 20696 3612 pts/1 Ss 13:09 0:00 -bash
staging 16106 0.0 0.7 161236 15348 ? S 13:49 0:00 php ./bin/resque
staging 16485 0.0 0.0 73312 1660 ? S 11:07 0:00 sshd: staging@pts/6
staging 16486 0.0 0.1 20704 3628 pts/6 Ss 11:07 0:00 -bash
staging 19400 0.0 0.7 161236 15348 ? S 14:01 0:00 php ./bin/resque
staging 20695 0.0 0.0 19224 1112 pts/1 S+ 14:16 0:00 redis-cli
staging 20917 0.0 0.0 10780 1416 pts/6 S 14:18 0:00 bash -c cd '/home/staging/public_html/releases/20141014030140/app/Vendor/kamisama/php-resque-ex'; VERBOSE=true QUEUE='default' PIDFILE='/home/stag
staging 20918 0.0 0.7 161236 15168 pts/6 S 14:18 0:00 php ./bin/resque
staging 22832 0.0 0.0 16836 1268 pts/6 R+ 14:32 0:00 ps ux
I've been battling with this for a few days now. I'm tempted to just kill every process with 'resque' in its name on each deploy, though I know that's a terrible solution.
Here's a bunch of info about one of my workers:
cd '/home/staging/public_html/releases/20141014024927/app/Vendor/kamisama/php-resque-ex'; VERBOSE=true QUEUE='default'
PIDFILE='/home/staging/public_html/releases/20141014024927/app/Plugin/CakeResque/tmp/14132549791315'
APP_INCLUDE='/home/staging/public_html/releases/20141014024927/app/Plugin/CakeResque/Lib/CakeResqueBootstrap.php'
RESQUE_PHP='/home/staging/public_html/releases/20141014024927/app/Vendor/kamisama/php-resque-ex/lib/Resque.php' INTERVAL=5 REDIS_BACKEND='localhost:6379' REDIS_DATABASE=1 REDIS_NAMESPACE='resque' REDIS_PASSWORD=''
CAKE='/home/staging/public_html/releases/20141014024927/lib/Cake/'
APP='/home/staging/public_html/releases/20141014024927/app/' COUNT=1 LOGHANDLER='RotatingFile'
LOGHANDLERTARGET='/home/staging/public_html/releases/20141014024927/app/tmp/logs/resque.log' php './bin/resque'
>> '/home/staging/public_html/releases/20141014024927/app/tmp/logs/resque-worker-error.log' 2>&1
Is it normal to have so many references to the specific release dir (in this case /releases/20141014024927)? I suspect that's part of the issue.
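To see which release a worker is pinned to, the release timestamp can be pulled out of its launch command. This is a minimal Ruby sketch, assuming the Capistrano layout /releases/<14-digit timestamp>/ seen above; release_ids and RELEASE_PATTERN are hypothetical names, not part of CakeResque or Capistrano.

```ruby
# Assumption: Capistrano release dirs are named with a 14-digit timestamp.
RELEASE_PATTERN = %r{/releases/(\d{14})/}

# Hypothetical helper: every distinct release id referenced by a command line.
def release_ids(command)
  command.scan(RELEASE_PATTERN).flatten.uniq
end

# Trimmed copy of the worker launch command shown above.
worker_command = <<~CMD
  cd '/home/staging/public_html/releases/20141014024927/app/Vendor/kamisama/php-resque-ex'; VERBOSE=true QUEUE='default'
  PIDFILE='/home/staging/public_html/releases/20141014024927/app/Plugin/CakeResque/tmp/14132549791315'
  APP_INCLUDE='/home/staging/public_html/releases/20141014024927/app/Plugin/CakeResque/Lib/CakeResqueBootstrap.php'
  php './bin/resque'
CMD

puts release_ids(worker_command).inspect          # distinct releases referenced
puts worker_command.scan(RELEASE_PATTERN).size    # how many paths point into a release
```

Every path the worker is given (working dir, PIDFILE, APP_INCLUDE, and in the full command CAKE, APP and the log files) is absolute and points inside one release directory, so a worker keeps running that release's code until it is stopped and restarted.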
PS: I SSH'd in, used ps ux to find the path of a Resque worker that's definitely running, and then ran the stats command from that worker's full path, but it still gave no results:
staging@sabre740:~/public_html$ /home/staging/public_html/releases/20141014030140/app/Console/cake CakeResque.CakeResque stats
Resque Statistics
---------------------------------------------------------------
Jobs Stats
Processed Jobs : 0
Failed Jobs : 0
Queues Stats
Queues count : 0
Workers Stats
Workers count : 0
It turns out that when I clear Cake's default cache, it also clears a bunch of Resque-related keys. I'm using Redis for Cake's caching, so I'm going to switch back to the file cache, and I think that should resolve the issue. Thanks for your help.
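This explains why stats and stop report zero workers while ps ux still lists one: the worker list lives in Redis, the process lives in the OS, and deleting the Redis keys only removes the former. Below is a small in-memory Ruby simulation, assuming php-resque keeps its worker registry in a Redis set named resque:workers (from REDIS_NAMESPACE='resque' in the worker command). The Hash stands in for Redis; this is not CakeResque's actual code.

```ruby
require "set"

# In-memory stand-in for one Redis database (key => value).
# Assumption: worker registry key names follow php-resque's "resque:" namespace.
redis = {
  "resque:workers" => Set["sabre740.anchor.net.au:20918:default"],
  "resque:worker:sabre740.anchor.net.au:20918:default:started" => "Tue Oct 14 14:18:43 EST 2014",
  "some_cache_key" => "cached value" # hypothetical unprefixed cache entry
}

# The OS process table is separate from Redis.
running_pids = [20918]

# What stats/stop effectively rely on: the size of the registry set.
def workers_count(redis)
  redis.fetch("resque:workers", Set.new).size
end

puts "before clear: #{workers_count(redis)} worker(s), #{running_pids.size} process(es)"
redis.clear # models an unprefixed cache clear wiping every key
puts "after clear:  #{workers_count(redis)} worker(s), #{running_pids.size} process(es)"
```

After the clear the registry is empty, so "There is no workers to stop" is the expected answer, even though process 20918 is still alive.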
UPDATE: There was no need to switch away from Redis for caching. I was missing a 'prefix' option in my default cache config, which means that when you clear that cache, Redis clears ALL keys: it deletes every key matching the prefix, and when the prefix is blank, that matches every key.
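A minimal Ruby sketch of the prefix behaviour described in the update, assuming the cache engine clears by deleting every key that matches the prefix followed by * (KEYS then DEL). The Hash stands in for a Redis database, and clear_cache, sample_store and the myapp_ keys are hypothetical, not CakePHP's RedisEngine.

```ruby
# Hypothetical prefix-scoped clear: delete every key starting with the prefix.
def clear_cache(store, prefix)
  store.delete_if { |key, _| key.start_with?(prefix) }
end

# Fresh copy of a store shared by the Cake cache and Resque.
def sample_store
  {
    "myapp_users" => "...",    # hypothetical cache keys
    "myapp_posts" => "...",
    "resque:workers" => "...", # Resque's worker registry
    "resque:queues" => "..."
  }
end

with_prefix = sample_store
clear_cache(with_prefix, "myapp_")
puts with_prefix.keys.inspect    # only the Resque keys survive

without_prefix = sample_store
clear_cache(without_prefix, "")
puts without_prefix.keys.inspect # every key starts with "", so all are deleted
```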
Really, if you're using Redis for both the Cake cache and Resque, you should probably use a separate Redis database for each.
You can change Cake-Resque's redis database and prefix in the plugin config file.
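Separate databases help because each numbered Redis database (selected with SELECT) is its own key space, so a prefix-based clear run against one cannot see keys in another. A hypothetical Ruby model, assuming the Cake cache uses database 0 while Resque uses database 1 (the worker command above passes REDIS_DATABASE=1):

```ruby
# Hypothetical model of one Redis server: database index => key space.
server = Hash.new { |dbs, index| dbs[index] = {} }

CACHE_DB  = 0 # assumption: Cake cache configured for database 0
RESQUE_DB = 1 # matches REDIS_DATABASE=1 in the worker command

server[CACHE_DB]["users_list"] = "..." # hypothetical cache entry
server[RESQUE_DB]["resque:workers"] = "sabre740.anchor.net.au:20918:default"

# Worst case: an empty-prefix clear, but scoped to the cache database only.
server[CACHE_DB].delete_if { |key, _| key.start_with?("") }

puts server[CACHE_DB].size            # cache database emptied
puts server[RESQUE_DB].keys.inspect   # Resque registry untouched
```

This only guards against per-database deletes like the prefix clear; a FLUSHALL would still empty every database on the server.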
Clearing the whole database when there is no prefix does seem dangerous. Maybe you could open a ticket on the CakePHP repo and ask for a check to prevent this.
Done already - https://github.com/cakephp/cakephp/issues/4876
Thanks again for the response, and for this plugin.
I'm using Capistrano v3 for deployment, along with Cake-Resque.
Resque is not correctly finding existing workers. For example, in my terminal output ps ux shows that a Resque worker is running, but the stats indicate none exist.
This means that each time I deploy a new release, Resque won't stop the old workers, but it will start a new one. So I end up with an increasing number of workers hanging around, each associated with a different one of Capistrano's release paths. This causes various issues, sometimes including a white screen of death in my app: when an old worker runs from an old release, Cake still caches file paths from that release, so Cake ends up running with some files from the current release and some from previous releases.
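Workers left behind by earlier deploys can be spotted by comparing the release in each launch command with the current one. A hypothetical Ruby sketch: input is (pid, full command line) pairs such as ps -eo pid,args would print, PID 11111 is made up, and 20141014030140 is treated as the current release purely for illustration (on a server it could come from File.readlink on the current symlink).

```ruby
# Assumption: Capistrano release dirs are named with a 14-digit timestamp.
RELEASE_PATTERN = %r{/releases/(\d{14})/}

# Hypothetical helper: PIDs whose command points at a non-current release.
def stale_workers(processes, current_release)
  processes.select do |_pid, command|
    release = command[RELEASE_PATTERN, 1] # nil when no release path appears
    release && release != current_release
  end.map(&:first)
end

processes = [
  [20917, "bash -c cd '/home/staging/public_html/releases/20141014030140/app/Vendor/kamisama/php-resque-ex'; QUEUE='default' php './bin/resque'"],
  [11111, "bash -c cd '/home/staging/public_html/releases/20141014024927/app/Vendor/kamisama/php-resque-ex'; QUEUE='default' php './bin/resque'"], # hypothetical PID
  [22832, "ps ux"]
]

puts stale_workers(processes, "20141014030140").inspect
```

Killing these PIDs would only treat the symptom; in this thread the root cause turned out to be the unprefixed cache clear wiping Resque's registry.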