Closed: KalimeroJr closed this issue 8 years ago
This will typically happen if the worker process isn't running. Basically the web interface puts jobs in a rabbit queue, and the worker process consumes jobs from that queue.
If you're using the phamdb docker image, the worker process should launch automatically when the container starts. How did you install PhamDB?
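In the meantime, one quick way to check is from a shell on the docker host. This is only a sketch: the container name "phamdb" is a placeholder (substitute whatever docker ps or Kitematic shows for your container), and supervisorctl is only usable if the image ships it alongside supervisord.

# "phamdb" is a placeholder; substitute the container name from docker ps
docker exec phamdb supervisorctl status
docker exec phamdb ps -ef | grep celery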
That's what I thought.
I installed it through Docker Kitematic. It's the only container running. I can check for a certain process if you want.
Output that I can see through Kitematic:
2016-05-02 07:34:06,444 CRIT Supervisor running as root (no user in config file)
2016-05-02 07:34:06,445 WARN Included extra file "/etc/supervisor/conf.d/supervisor-app.conf" during parsing
Unlinking stale socket /var/run/supervisor.sock
2016-05-02 07:34:06,782 INFO RPC interface 'supervisor' initialized
2016-05-02 07:34:06,782 WARN cElementTree not installed, using slower XML parser for XML-RPC
2016-05-02 07:34:06,783 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-05-02 07:34:06,784 INFO supervisord started with pid 1
2016-05-02 07:34:07,786 INFO spawned: 'celery-app' with pid 5
2016-05-02 07:34:07,790 INFO spawned: 'nginx-app' with pid 6
2016-05-02 07:34:07,799 INFO spawned: 'rabbitmq' with pid 7
2016-05-02 07:34:07,801 INFO spawned: 'set-secret-key' with pid 8
2016-05-02 07:34:07,811 INFO spawned: 'app-uwsgi' with pid 9
2016-05-02 07:34:07,817 INFO spawned: 'start-mysql' with pid 10
2016-05-02 07:34:07,897 INFO exited: set-secret-key (exit status 0; not expected)
2016-05-02 07:34:08,166 CRIT reaped unknown pid 74)
2016-05-02 07:34:09,167 INFO success: celery-app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-05-02 07:34:09,168 INFO success: nginx-app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-05-02 07:34:09,168 INFO success: rabbitmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-05-02 07:34:09,170 INFO spawned: 'set-secret-key' with pid 325
2016-05-02 07:34:09,171 INFO success: app-uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-05-02 07:34:09,171 INFO success: start-mysql entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-05-02 07:34:09,201 INFO exited: set-secret-key (exit status 0; not expected)
2016-05-02 07:34:09,308 CRIT reaped unknown pid 367)
2016-05-02 07:34:09,851 CRIT reaped unknown pid 452)
2016-05-02 07:34:09,869 CRIT reaped unknown pid 411)
2016-05-02 07:34:09,876 CRIT reaped unknown pid 460)
2016-05-02 07:34:09,969 CRIT reaped unknown pid 482)
2016-05-02 07:34:09,979 CRIT reaped unknown pid 483)
2016-05-02 07:34:10,179 CRIT reaped unknown pid 500)
2016-05-02 07:34:10,180 CRIT reaped unknown pid 501)
2016-05-02 07:34:10,529 CRIT reaped unknown pid 515)
2016-05-02 07:34:11,534 INFO spawned: 'set-secret-key' with pid 546
2016-05-02 07:34:11,575 INFO exited: set-secret-key (exit status 0; not expected)
2016-05-02 07:34:13,136 CRIT reaped unknown pid 648)
2016-05-02 07:34:13,161 CRIT reaped unknown pid 549)
2016-05-02 07:34:14,445 INFO exited: start-mysql (exit status 0; expected)
2016-05-02 07:34:14,630 INFO spawned: 'set-secret-key' with pid 657
2016-05-02 07:34:14,630 CRIT reaped unknown pid 656)
2016-05-02 07:34:14,654 INFO exited: set-secret-key (exit status 0; not expected)
2016-05-02 07:34:14,691 INFO gave up: set-secret-key entered FATAL state, too many start retries too quickly
2016-05-02 07:34:14,692 CRIT reaped unknown pid 655)
2016-05-02 07:34:14,765 CRIT reaped unknown pid 668)
2016-05-02 07:34:14,769 CRIT reaped unknown pid 669)
2016-05-02 07:34:15,238 CRIT reaped unknown pid 702)
2016-05-02 07:34:16,149 CRIT reaped unknown pid 520)
2016-05-02 07:34:19,114 CRIT reaped unknown pid 519)
2016-05-02 07:34:19,133 INFO exited: rabbitmq (exit status 0; expected)
I also checked the DB, and the webphamerator db contains what looks to be a valid job in the job table:
mysql> select * from job;
+----+-------------+---------------+-------------+-----------------+---------------------+------------+---------+------+---------+-----------+
| id | database_id | database_name | status_code | status_message | modified | start_time | runtime | seen | task_id | type_code |
+----+-------------+---------------+-------------+-----------------+---------------------+------------+---------+------+---------+-----------+
| 1 | 1 | PA-phages | queued | Waiting to run. | 2016-04-27 11:33:52 | NULL | NULL | 1 | NULL | NULL |
+----+-------------+---------------+-------------+-----------------+---------------------+------------+---------+------+---------+-----------+
1 row in set (0.00 sec)
Worker seems to be up:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 12:01 ? 00:00:00 /usr/bin/python /usr/bin/supervisord -n
root 5 1 0 12:01 ? 00:00:00 /usr/bin/python /usr/local/bin/celery -A webphamerator.app.celery worker
root 6 1 0 12:01 ? 00:00:00 nginx: master process /usr/sbin/nginx
root 10 1 0 12:01 ? 00:00:00 /usr/local/bin/uwsgi --ini /home/docker/code/docker/uwsgi.ini
www-data 22 6 0 12:01 ? 00:00:00 nginx: worker process
www-data 23 6 0 12:01 ? 00:00:00 nginx: worker process
www-data 24 6 0 12:01 ? 00:00:00 nginx: worker process
www-data 25 6 0 12:01 ? 00:00:00 nginx: worker process
rabbitmq 65 1 0 12:01 ? 00:00:00 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
root 79 1 0 12:01 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe
root 423 10 0 12:01 ? 00:00:00 /usr/local/bin/uwsgi --ini /home/docker/code/docker/uwsgi.ini
root 424 10 0 12:01 ? 00:00:00 /usr/local/bin/uwsgi --ini /home/docker/code/docker/uwsgi.ini
root 431 5 0 12:01 ? 00:00:00 /usr/bin/python /usr/local/bin/celery -A webphamerator.app.celery worker
mysql 440 79 0 12:01 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/dockerdata/mysql --plugin-dir=/usr/lib/mysql/plugin --user=
root 441 79 0 12:01 ? 00:00:00 logger -t mysqld -p daemon.error
rabbitmq 448 1 0 12:01 ? 00:00:00 /bin/sh /usr/sbin/rabbitmq-server
rabbitmq 487 448 0 12:01 ? 00:00:00 /usr/lib/erlang/erts-5.9.1/bin/beam -W w -K true -A30 -P 1048576 -- -root /usr/lib/erlang -progname er
rabbitmq 706 487 0 12:01 ? 00:00:00 inet_gethost 4
rabbitmq 707 706 0 12:01 ? 00:00:00 inet_gethost 4
root 717 0 0 12:02 ? 00:00:00 sh
root 723 717 0 12:02 ? 00:00:00 mysql
root 757 0 0 12:04 ? 00:00:00 sh
root 763 757 0 12:05 ? 00:00:00 ps -ef
Thanks, this is helpful. I'll look into it tonight and see if I can reproduce the issue.
If I remember correctly, the job table is just used to store metadata. It's possible for it to get out of sync with the actual rabbit queue.
My guess is your rabbit queue is empty. I'd suggest manually deleting that row from the job table. I'm not sure what the database password is; you might want to check the Dockerfile to see how it was set up.
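Something along these lines, run from a shell inside the container, should clear it; the credentials here are only a guess, so check the Dockerfile for how MySQL was actually configured:

# credentials are a guess - check the Dockerfile for the real user/password
mysql -u root webphamerator -e "DELETE FROM job WHERE id = 1;"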
Regarding the rabbit queue, you are correct:
# rabbitmqctl list_queues
Listing queues ...
celery 0
...done.
I'll delete the entry from the job table and try again. It looks as if the web app succeeds in populating the job table, but then either it never sends the message through the queue, or the celery worker accepts the message (and so empties the queue) but fails while creating the database?
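One way to tell those two cases apart might be to look at the celery worker's output. I'm assuming supervisord writes its child logs to the default directory inside the container:

# log location is an assumption based on supervisord defaults
ls /var/log/supervisor/
tail -n 100 /var/log/supervisor/celery-app*.log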
OK, found the culprit. It's all because of a low memory alarm set by rabbitmq.
Noticed this in the rabbitmq log:
=INFO REPORT==== 2-May-2016::16:01:39 ===
Memory limit set to 398MB of 996MB total.
=INFO REPORT==== 2-May-2016::16:01:39 ===
vm_memory_high_watermark set. Memory used:595445168 allowed:417785446
=INFO REPORT==== 2-May-2016::16:01:39 ===
alarm_handler: {set,{{resource_limit,memory,rabbit@e6ca7f886fe7},[]}}
=INFO REPORT==== 2-May-2016::16:01:39 ===
Disk free limit set to 953MB
The memory alarm caused even the RabbitMQ "Hello World" example to fail.
I temporarily raised the threshold to 0.7 of total memory:
rabbitmqctl set_vm_memory_high_watermark 0.7
This freed the queue, and PhamDB now creates databases successfully.
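Note that the rabbitmqctl change only lasts until the broker restarts. To make it permanent it would have to go into the RabbitMQ config file; the path below is an assumption (the Debian default), and this overwrites any existing config file:

# persist the higher watermark across broker restarts
# (path is an assumption; Debian packages read /etc/rabbitmq/rabbitmq.config)
echo '[{rabbit, [{vm_memory_high_watermark, 0.7}]}].' > /etc/rabbitmq/rabbitmq.config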
I'm not sure why rabbitmq in this image ends up with such high system memory usage:
# rabbitmqctl status
Status of node rabbit@e6ca7f886fe7 ...
[{pid,4068},
{running_applications,[{rabbit,"RabbitMQ","2.8.4"},
{mnesia,"MNESIA CXC 138 12","4.7"},
{os_mon,"CPO CXC 138 46","2.2.9"},
{sasl,"SASL CXC 138 11","2.2.1"},
{stdlib,"ERTS CXC 138 10","1.18.1"},
{kernel,"ERTS CXC 138 10","2.15.1"}]},
{os,{unix,linux}},
{erlang_version,"Erlang R15B01 (erts-5.9.1) [source] [64-bit] [async-threads:30] [kernel-poll:true]\n"},
{memory,[{total,596007536},
{processes,10418222},
{processes_used,10418208},
{system,585589314},
{atom,504409},
{atom_used,473967},
{binary,103168},
{code,11874771},
{ets,754232}]},
{vm_memory_high_watermark,0.6999999998085141},
{vm_memory_limit,731124531},
{disk_free_limit,1000000000},
{disk_free,12889649152},
{file_descriptors,[{total_limit,1048476},
{total_used,7},
{sockets_limit,943626},
{sockets_used,4}]},
{processes,[{limit,1048576},{used,154}]},
{run_queue,0},
{uptime,512}]
...done.
Anyway - maybe adding a health check of the rabbitmq server could be beneficial, or changing its memory usage settings/thresholds...
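For instance, something as rough as the check below could run at container start-up or periodically. This is only a sketch: the log path is an assumption about where this image writes the rabbit log, and the alarm message is the one quoted from the log above.

# very rough sketch: report the broker as unhealthy if the memory alarm message
# shows up in the rabbit log (log path is an assumption for this image)
if grep -q 'vm_memory_high_watermark set' /var/log/rabbitmq/rabbit@*.log 2>/dev/null; then
    echo "rabbitmq memory alarm is active - queued jobs will not be consumed" >&2
    exit 1
fi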
Thanks for digging into this more deeply, I'm glad you figured it out!
I agree that we should try to prevent this by changing the rabbitmq config and also marking jobs as "failed" if the queue isn't working.
I don't have time to actively develop PhamDB anymore, but if you want to volunteer to submit a pull request, I'd be happy to walk you through the process. Either way, thanks for submitting such a detailed bug report. If anyone wants to tackle it, there should be enough information here to reproduce it.
It seems time is a resource we are both low on :)
I'll just leave this here and close the "issue" for now..
Hello, I understand that this issue has been closed. Unfortunately, I have encountered the same problem two years later. As I am quite new to PhamDB and rabbitmq, could you kindly tell me how I can view the rabbitmq log? I would like to know if I have a similar "low memory alarm set by rabbitmq".
Sorry for two more simple questions: 1: Where can I access this "mysql> select * from job"? 2: How can I temporarily raise the threshold to 0.7 of total memory with rabbitmqctl set_vm_memory_high_watermark 0.7?
Is this done from the VirtualBox terminal, Docker Toolbox, or Kitematic?
Many thanks for your time and kind response.
Awesome work on this!
Unfortunately, after going through the create-database step (built from uploaded, validated phage GenBank files), jobs wait in the queue forever. Is there a way to see what they are waiting for, or a way to run the jobs manually?