gsksivesh opened 5 years ago
Comment by rclough Monday Jun 16, 2014 at 19:41 GMT
When you say the UI, do you mean that when you visit the Dagobah page in a web browser, it doesn't load? Or that the page loads, but doesn't do anything?
Comment by nnfuzzy Friday Jun 27, 2014 at 09:36 GMT
Yes, the first one. But I don't get a 404 or anything else. The next time it happens, I'll take a screenshot of the page and of the process.
Comment by rclough Friday Jun 27, 2014 at 14:43 GMT
It may be useful to open the developer tools in whatever browser you have (I know Chrome, Firefox, and Safari have similar options) and look at the network tab. That way, when the page fails to load, you can see which network call is failing.
Comment by nnfuzzy Friday Jun 27, 2014 at 15:02 GMT
Yes, I'll do that and try to trigger the event, because sometimes it's fine for weeks. One idea: could it have something to do with reloading the job status (an open browser) while the server is under high load?
Comment by nnfuzzy Friday Jul 25, 2014 at 06:45 GMT
Yesterday I had this issue again. I used the network tab in Chrome, and the problem is that Flask isn't able to respond, so there is no request information. But it's not as if the web server is offline.
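The symptom described here (the port is still open, but Flask never answers) can be told apart from a fully dead server with a small probe. This is a sketch, not part of Dagobah; `probe` is a hypothetical helper, and the host, port, and path are whatever your instance uses (port 9876 in the start script later in this thread).

```python
import socket
import urllib.error
import urllib.request


def probe(host, port, path="/", timeout=5.0):
    """Classify a web UI as 'down', 'hung', or 'ok'.

    'down': nothing accepts TCP connections on host:port.
    'hung': the port accepts connections, but no HTTP response arrives in time.
    'ok':   an HTTP response came back (any status code counts).
    """
    # Step 1: is anything listening on the port at all?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return "down"

    # Step 2: does the application actually answer an HTTP request?
    url = "http://%s:%d%s" % (host, port, path)
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError:
        # A 4xx/5xx reply is still a reply, so the app is alive.
        return "ok"
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return "hung"
```

A result of "hung" matches what is reported here: the kernel still completes the TCP handshake, but the application never writes a response.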
Comment by thieman Friday Jul 25, 2014 at 12:19 GMT
The proper solution here is probably to serve the app through a legit webserver (probably gunicorn or something) rather than Flask's built-in dev server. The Flask request thread must be dying for some reason and never getting restarted.
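A process manager like gunicorn helps because its master re-forks worker processes that die. The same idea in miniature, for a thread, is sketched below. It is purely illustrative and not Dagobah's or gunicorn's code; `supervise` and its parameters are hypothetical.

```python
import threading
import time


def supervise(target, should_stop, check_interval=1.0):
    """Keep target() running in a daemon thread, replacing the thread whenever it dies.

    Returns how many times the thread had to be restarted before should_stop()
    returned True.
    """
    restarts = 0
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    while not should_stop():
        if not worker.is_alive():
            # The previous thread exited, e.g. because of an unhandled exception.
            worker = threading.Thread(target=target, daemon=True)
            worker.start()
            restarts += 1
        time.sleep(check_interval)
    return restarts
```

Without a loop like this (or an external supervisor), a request thread that dies simply stays dead, which fits the "page never loads, but nothing is logged" symptom.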
Comment by nnfuzzy Monday Jul 28, 2014 at 13:40 GMT
Good point. Perhaps running it under supervisord would also give us more log information...
Comment by hussainsultan Saturday Aug 02, 2014 at 04:26 GMT
I am having the same issue, and I am going to try running it with gunicorn and see. Thanks!
Comment by thieman Saturday Aug 02, 2014 at 12:02 GMT
Just make sure you only run 1 process if you run it behind something like gunicorn (which supports multiple app processes). Otherwise you'll also spin up multiple scheduler threads, and you don't want that.
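The single-process advice can be pinned down in a gunicorn configuration file (gunicorn accepts a Python file via `-c`). This is a sketch: the setting names are gunicorn's own, while the values (port 9876, module `dagobah_app:app`) are borrowed from the start script later in this thread and may differ in your setup.

```python
# gunicorn.conf.py: minimal sketch for serving Dagobah's Flask app under gunicorn.
# Run with:  gunicorn -c gunicorn.conf.py dagobah_app:app

bind = "0.0.0.0:9876"   # address/port taken from the start script in this thread
workers = 1             # one process => one scheduler thread (see the advice above)
worker_class = "sync"   # gunicorn's default worker type
timeout = 30            # seconds of silence before the master kills and restarts a worker
loglevel = "debug"      # more detail from gunicorn itself
errorlog = "-"          # "-" sends gunicorn's error log to stderr
accesslog = "-"         # "-" sends the access log to stdout
```

Note that when the master restarts a worker, the scheduler thread living inside that worker is restarted with it.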
Comment by zhenlongbai Tuesday Apr 21, 2015 at 06:47 GMT
I had the same issue, and I run it behind gunicorn, but it didn't help.
It ran fine for days, but today, when I added a job, dagobah_jobs didn't get an update for next_run. It doesn't happen every time I add a job.
Comment by thieman Tuesday Apr 21, 2015 at 12:22 GMT
@zhenlongbai Are you able to retrieve the logs from that point? We've added a bunch of logging since this issue was originally reported. Additionally, since you're running into so many issues, it would probably be helpful to set your logging level to debug in your config file.
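The exact key Dagobah's config file uses for the log level isn't shown in this thread, so it isn't reproduced here. As a complementary sketch, if you control the Python entry point (such as the `dagobah_app` module used in the start script below), the standard `logging` module can be switched to DEBUG directly. `enable_debug_logging` is a hypothetical helper; `logging.BASIC_FORMAT` is the `LEVEL:logger:message` format that the DEBUG log later in this thread uses.

```python
import logging


def enable_debug_logging(stream=None):
    """Attach a DEBUG-level handler to the root logger and return it.

    Library loggers such as 'paramiko.transport' inherit the root level unless
    they set their own, so their debug messages show up too.
    """
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    return handler
```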
Comment by zhenlongbai Wednesday Apr 22, 2015 at 02:21 GMT
OK, I have been using Dagobah at work, and it ran very well for days. The log had 89350 lines; I will change the logging level to debug to write a new log.
I have changed some code to make it work better for my jobs, for example UTC time and email.
Thanks for your help!
Comment by zhenlongbai Wednesday Apr 22, 2015 at 05:04 GMT
Today I had this issue again when I added a job.
When I click "start job from begin", it runs once but doesn't update next_run automatically.
My start script: nohup gunicorn -b 0.0.0.0:9876 -w 1 dagobah_app:app &
My log:
[2015-04-22 12:46:37 +0000] [16527] [INFO] Worker exiting (pid: 16527)
[2015-04-22 12:46:37 +0000] [16522] [INFO] Handling signal: term
[2015-04-22 12:46:37 +0000] [16522] [INFO] Shutting down: Master
[2015-04-22 12:46:39 +0000] [20901] [INFO] Starting gunicorn 19.3.0
[2015-04-22 12:46:39 +0000] [20901] [INFO] Listening at: http://0.0.0.0:9876 (20901)
[2015-04-22 12:46:39 +0000] [20901] [INFO] Using worker: sync
[2015-04-22 12:46:39 +0000] [20906] [INFO] Booting worker with pid: 20906
/usr/local/lib/python2.7/site-packages/Crypto/Util/number.py:57: PowmInsecureWarning: Not using mpz_powm_sec. You should rebuild using libgmp >= 5 to avoid timing attack vulnerability.
_warn("Not using mpz_powm_sec. You should rebuild using libgmp >= 5 to avoid timing attack vulnerability.", PowmInsecureWarning)
Logging output to /home/brdwork/logs/dagobah.log
Logger initialized at level DEBUG
Package pymongo has version 3.0 which is later than specified version 2.5. If you experience issues, try downgrading to version 2.5.
Starting app on 0.0.0.0:9876
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connected (version 2.0, client OpenSSH_4.3)
Authentication (publickey) successful!
Secsh channel 1 opened.
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/site-packages/dagobah/core/components.py", line 114, in run
job.start()
File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 387, in start
self.initialize_snapshot()
File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 672, in initialize_snapshot
raise DagobahError(reason)
DagobahError: no independent nodes detected
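Dagobah's own `initialize_snapshot` code isn't shown here, but "independent nodes" in a DAG are the tasks with no upstream dependency (in-degree zero), which is where a run has to start. The sketch below shows that check under that assumption; `independent_nodes` is a hypothetical function. One plausible, unconfirmed reading of the traceback is that the scheduler tried to start a job that had no tasks yet (for example, one that had just been added), since an empty graph has no starting node either.

```python
def independent_nodes(graph):
    """Return the nodes that do not depend on any other node.

    graph maps each node to the set of nodes downstream of it (node -> children).
    A node is independent when it appears in no other node's child set,
    i.e. its in-degree is zero.
    """
    has_parent = set()
    for children in graph.values():
        has_parent.update(children)
    roots = {node for node in graph if node not in has_parent}
    if not roots:
        # Same message as the traceback above. Both an empty graph (a job with
        # no tasks) and a graph where every node has a parent end up here.
        raise ValueError("no independent nodes detected")
    return roots
```

For example, `independent_nodes({"a": {"b"}, "b": {"c"}, "c": set()})` returns `{"a"}`, while `independent_nodes({})` raises.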
Comment by zhenlongbai Wednesday Apr 22, 2015 at 05:07 GMT
I can also see the gunicorn processes:
[brdwork@recbox04 shell_dagobah]$ ps aux | grep gunicorn
brdwork 20901 0.0 0.0 162228 12480 pts/3 S 12:46 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:9876 -w 1 dagobah_app:app
brdwork 20906 0.5 0.0 379216 29808 pts/3 Sl 12:46 0:06 /usr/local/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:9876 -w 1 dagobah_app:app
brdwork 22295 0.0 0.0 61228 784 pts/4 R+ 13:05 0:00 grep gunicorn
[brdwork@recbox04 shell_dagobah]$
Comment by zhenlongbai Wednesday Apr 22, 2015 at 06:31 GMT
This is my DEBUG log. I think 'DEBUG:paramiko.transport:EOF in transport thread' is the key information: when the transport thread doesn't reach EOF, dagobah_jobs doesn't get an update.
DEBUG:paramiko.transport:starting thread (client mode): 0x5ea7b10L
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_4.3)
DEBUG:paramiko.transport:kex algos:['diffie-hellman-group-exchange-sha1', 'diffie-hellman-group14-sha1', 'diffie-hellman-group1-sha1'] server key:['ssh-rsa', 'ssh-dss'] client encrypt:['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'arcfour256', 'arcfour128', 'aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc', 'aes192-cbc', 'aes256-cbc', 'arcfour', 'rijndael-cbc@lysator.liu.se'] server encrypt:['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'arcfour256', 'arcfour128', 'aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc', 'aes192-cbc', 'aes256-cbc', 'arcfour', 'rijndael-cbc@lysator.liu.se'] client mac:['hmac-md5', 'hmac-sha1', 'hmac-ripemd160', 'hmac-ripemd160@openssh.com', 'hmac-sha1-96', 'hmac-md5-96'] server mac:['hmac-md5', 'hmac-sha1', 'hmac-ripemd160', 'hmac-ripemd160@openssh.com', 'hmac-sha1-96', 'hmac-md5-96'] client compress:['none', 'zlib@openssh.com'] server compress:['none', 'zlib@openssh.com'] client lang:[''] server lang:[''] kex follows?False
DEBUG:paramiko.transport:Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
DEBUG:paramiko.transport:using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
DEBUG:paramiko.transport:Switch to new keys ...
DEBUG:paramiko.transport:Trying key a6f65c1f81dafe5b3fb0d897ccf342b2 from /home/brdwork/.ssh/id_rsa
DEBUG:paramiko.transport:userauth is OK
INFO:paramiko.transport:Authentication (publickey) successful!
DEBUG:paramiko.transport:[chan 1] Max packet in: 34816 bytes
DEBUG:paramiko.transport:[chan 1] Max packet out: 32768 bytes
INFO:paramiko.transport:Secsh channel 1 opened.
DEBUG:paramiko.transport:[chan 1] Sesch channel 1 request ok
DEBUG:paramiko.transport:[chan 1] Sesch channel 1 request ok
DEBUG:paramiko.transport:[chan 1] EOF received (1)
DEBUG:paramiko.transport:[chan 1] EOF sent (1)
DEBUG:paramiko.transport:EOF in transport thread
DEBUG:paramiko.transport:starting thread (client mode): 0x5ea7b90L
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_4.3)
DEBUG:paramiko.transport:kex algos:['diffie-hellman-group-exchange-sha1', 'diffie-hellman-group14-sha1', 'diffie-hellman-group1-sha1'] server key:['ssh-rsa', 'ssh-dss'] client encrypt:['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'arcfour256', 'arcfour128', 'aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc', 'aes192-cbc', 'aes256-cbc', 'arcfour', 'rijndael-cbc@lysator.liu.se'] server encrypt:['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'arcfour256', 'arcfour128', 'aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc', 'aes192-cbc', 'aes256-cbc', 'arcfour', 'rijndael-cbc@lysator.liu.se'] client mac:['hmac-md5', 'hmac-sha1', 'hmac-ripemd160', 'hmac-ripemd160@openssh.com', 'hmac-sha1-96', 'hmac-md5-96'] server mac:['hmac-md5', 'hmac-sha1', 'hmac-ripemd160', 'hmac-ripemd160@openssh.com', 'hmac-sha1-96', 'hmac-md5-96'] client compress:['none', 'zlib@openssh.com'] server compress:['none', 'zlib@openssh.com'] client lang:[''] server lang:[''] kex follows?False
DEBUG:paramiko.transport:Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
DEBUG:paramiko.transport:using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
DEBUG:paramiko.transport:Switch to new keys ...
DEBUG:paramiko.transport:Trying key a6f65c1f81dafe5b3fb0d897ccf342b2 from /home/brdwork/.ssh/id_rsa
DEBUG:paramiko.transport:userauth is OK
INFO:paramiko.transport:Authentication (publickey) successful!
DEBUG:paramiko.transport:[chan 1] Max packet in: 34816 bytes
DEBUG:paramiko.transport:[chan 1] Max packet out: 32768 bytes
INFO:paramiko.transport:Secsh channel 1 opened.
DEBUG:paramiko.transport:[chan 1] Sesch channel 1 request ok
DEBUG:paramiko.transport:[chan 1] Sesch channel 1 request ok
DEBUG:paramiko.transport:[chan 1] EOF received (1)
DEBUG:paramiko.transport:[chan 1] EOF sent (1)
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
DEBUG:paramiko.transport:Sending global request "keepalive@lag.net"
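In the log above, the first connection ends with "EOF in transport thread", while the second one receives and sends channel EOF, yet its transport keeps sending "keepalive@lag.net" requests. That suggests (unconfirmed) that the SSH transport was left open after the command finished. Dagobah's remote-execution code isn't shown here; the sketch below only illustrates the general pattern of always closing the client. `run_remote` is a hypothetical helper, and `client` is assumed to behave like `paramiko.SSHClient` (`connect`, `exec_command`, `close`).

```python
def run_remote(client, host, command, timeout=30.0):
    """Run one command over SSH and always tear the connection down afterwards.

    client is expected to behave like paramiko.SSHClient. Closing it shuts down
    the underlying transport, which ends the background transport thread and
    with it any keepalive requests.
    """
    try:
        client.connect(host, timeout=timeout)
        _stdin, stdout, stderr = client.exec_command(command, timeout=timeout)
        # Read output before waiting for the exit status so a full buffer
        # can't stall the remote command.
        out = stdout.read()
        err = stderr.read()
        status = stdout.channel.recv_exit_status()
        return status, out, err
    finally:
        client.close()
```

With paramiko installed, a client prepared with `client = paramiko.SSHClient()` and `client.load_system_host_keys()` could be passed in; the `finally` block closes it even when the command raises.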
Comment by BruceDone Thursday Dec 29, 2016 at 02:58 GMT
I will try using supervisord to see whether it breaks again.
Update 2016-12-30:
My solution is to run it in Docker and restart it every hour with cron. It currently works well, but we should still find the underlying reason why the UI breaks.
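Restarting every hour works, but a variation is to restart only after the UI has actually stopped answering for a while. A sketch, assuming `probe` is any health check you supply (for instance an HTTP request to the UI with a timeout) and `restart` is whatever restarts the service (for a Docker setup, perhaps a `subprocess` call to `docker restart <container>`); `watchdog` and its parameters are hypothetical.

```python
import time


def watchdog(probe, restart, failures_before_restart=3, interval=60.0,
             max_checks=None, sleep=time.sleep):
    """Restart the service only after several consecutive failed health checks.

    probe() returns True when the UI answers; restart() restarts the service.
    Runs forever unless max_checks is given; returns the number of restarts.
    """
    failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # healthy again, reset the streak
        else:
            failures += 1
            if failures >= failures_before_restart:
                restart()
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts
```

Requiring several consecutive failures avoids restarting on a single slow response during a load peak, and each restart marks a moment worth inspecting in the logs.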
Issue by nnfuzzy Friday Jun 13, 2014 at 08:50 GMT Originally opened as https://github.com/thieman/dagobah/issues/100
Hi,
Sometimes (actually more and more often) I can't reach the UI anymore. I suspect a load peak on the server broke the Flask UI. In the log I found only the last 200 responses:
INFO:werkzeug:... - - [13/Jun/2014 08:37:17] "GET /api/job?job_name=DMProcessing HTTP/1.1" 200 -
INFO:werkzeug:... - - [13/Jun/2014 08:37:19] "GET /api/job?job_name=DMProcessing HTTP/1.1" 200 -
INFO:werkzeug:... - - [13/Jun/2014 08:37:20] "GET /api/job?job_name=DMProcessing HTTP/1.1" 200 -
INFO:werkzeug:... - - [13/Jun/2014 08:37:22] "GET /api/job?job_name=DMProcessing HTTP/1.1" 200 -
INFO:werkzeug:... - - [13/Jun/2014 08:37:23] "GET /api/job?job_name=DMProcessing HTTP/1.1" 200 -
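The werkzeug lines above show the page polling `/api/job` every second or two until the last 200. To line up the moment the UI stopped answering with server load at that time, it helps to extract the last served request automatically. A sketch, assuming the werkzeug access-log format shown above; `last_request` and the regex are hypothetical.

```python
import re

# Matches werkzeug access-log entries such as
#   INFO:werkzeug:... - - [13/Jun/2014 08:37:23] "GET /api/job?job_name=DMProcessing HTTP/1.1" 200 -
WERKZEUG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def last_request(log_text):
    """Return (time, method, path, status) of the last access-log entry, or None."""
    matches = list(WERKZEUG_LINE.finditer(log_text))
    if not matches:
        return None
    m = matches[-1]
    return m.group("time"), m.group("method"), m.group("path"), int(m.group("status"))
```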
I use the MongoDB backend, and the Dagobah collections are in a separate database.
Many thanks for any hint,
Christian