pranavgupta1234 / rqmonitor

Flask based more dynamic and actionable frontend dashboard for monitoring Redis Queue 👩🏿‍💻 http://python-rq.org
Apache License 2.0
186 stars 24 forks source link

Exception displayed when navigating to the "Jobs" page #11

Closed waldner closed 3 years ago

waldner commented 3 years ago

Using the latest docker image, when navigating to the "Jobs" page I get (in the browser, not in the log):

                  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/rqmonitor/decorators.py", line 32, in _wrapper
    raise RQMonitorException(**kwargs).with_traceback(tb)
  File "/rqmonitor/decorators.py", line 22, in _wrapper
    inner_response = func(*args, **kwargs)
  File "/rqmonitor/decorators.py", line 11, in _wrapper
    _rendered_template = func(*args, **kwargs)
  File "/rqmonitor/bp.py", line 223, in list_jobs_api
    jobs = resolve_jobs(job_counts, start, length)
  File "/rqmonitor/utils.py", line 495, in resolve_jobs
    current_block_jobs = list_jobs_in_queue_registry(block[0], block[1], cursor,
  File "/rqmonitor/utils.py", line 280, in list_jobs_in_queue_registry
    return [Job.fetch(job_id) for job_id in job_ids]
  File "/rqmonitor/utils.py", line 280, in <listcomp>
    return [Job.fetch(job_id) for job_id in job_ids]
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 299, in fetch
    job.refresh()
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 505, in refresh
    raise NoSuchJobError('No such job: {0}'.format(self.key))
methusalan commented 3 years ago

I have a very similar error as well, popping up when i try to see the scheduled jobs. ( if i filter it out it loads fine)

 File "/opt/venv/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/venv/lib/python3.6/site-packages/rqmonitor/decorators.py", line 32, in _wrapper
    raise RQMonitorException(**kwargs).with_traceback(tb)
  File "/opt/venv/lib/python3.6/site-packages/rqmonitor/decorators.py", line 22, in _wrapper
    inner_response = func(*args, **kwargs)
  File "/opt/venv/lib/python3.6/site-packages/rqmonitor/decorators.py", line 11, in _wrapper
    _rendered_template = func(*args, **kwargs)
  File "/opt/venv/lib/python3.6/site-packages/rqmonitor/bp.py", line 223, in list_jobs_api
    jobs = resolve_jobs(job_counts, start, length)
  File "/opt/venv/lib/python3.6/site-packages/rqmonitor/utils.py", line 496, in resolve_jobs
    cursor+length-1)
  File "/opt/venv/lib/python3.6/site-packages/rqmonitor/utils.py", line 286, in list_jobs_in_queue_registry
    return [Job.fetch(job_id) for job_id in job_ids]
  File "/opt/venv/lib/python3.6/site-packages/rqmonitor/utils.py", line 286, in <listcomp>
    return [Job.fetch(job_id) for job_id in job_ids]
  File "/opt/venv/lib/python3.6/site-packages/rq/job.py", line 299, in fetch
    job.refresh()
  File "/opt/venv/lib/python3.6/site-packages/rq/job.py", line 518, in refresh
    raise NoSuchJobError('No such job: {0}'.format(self.key))
waldner commented 3 years ago

Another one, when going to the "jobs" page and there are active jobs:

                  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/rqmonitor/decorators.py", line 32, in _wrapper
    raise RQMonitorException(**kwargs).with_traceback(tb)
  File "/rqmonitor/decorators.py", line 22, in _wrapper
    inner_response = func(*args, **kwargs)
  File "/rqmonitor/decorators.py", line 11, in _wrapper
    _rendered_template = func(*args, **kwargs)
  File "/rqmonitor/bp.py", line 223, in list_jobs_api
    jobs = resolve_jobs(job_counts, start, length)
  File "/rqmonitor/utils.py", line 495, in resolve_jobs
    current_block_jobs = list_jobs_in_queue_registry(block[0], block[1], cursor,
  File "/rqmonitor/utils.py", line 289, in list_jobs_in_queue_registry
    return queue.get_jobs(start, end-start+1)
  File "/usr/local/lib/python3.8/site-packages/rq/queue.py", line 167, in get_jobs
    return compact([self.fetch_job(job_id) for job_id in job_ids])
  File "/usr/local/lib/python3.8/site-packages/rq/queue.py", line 167, in <listcomp>
    return compact([self.fetch_job(job_id) for job_id in job_ids])
  File "/usr/local/lib/python3.8/site-packages/rq/queue.py", line 147, in fetch_job
    job = self.job_class.fetch(job_id, connection=self.connection)
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 299, in fetch
    job.refresh()
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 506, in refresh
    self.restore(data)
  File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 474, in restore
    self.started_at = str_to_date(obj.get('started_at'))
  File "/usr/local/lib/python3.8/site-packages/rq/utils.py", line 256, in str_to_date
    return utcparse(as_text(date_str))
  File "/usr/local/lib/python3.8/site-packages/rq/utils.py", line 172, in utcparse
    return datetime.datetime.strptime(string, '%Y-%m-%dT%H:%M:%SZ')
  File "/usr/local/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
pranavgupta1234 commented 3 years ago

Hi @waldner Which version of rq are you using, I think the time parse error should not occur on rq>=1.4.0.I checked that I have updated version in setup.py but requirements.txt still contains 1.2.0, I will update it soon. Please let me know if its reproducable on >=1.4.0.

@waldner @methusalan rq seems to raise NoSuchJobError in case Job is not present with provided ID and it also does not guard against passing None and raises the same error. @methusalan thank you for sharing the issue context being around scheduled jobs. I will try to reproduce and debug more into it soon. @waldner in case you can share the context in which you received NoSuchJobError then it would be more helpful in debugging.

waldner commented 3 years ago

@pranavgupta1234 I'm running rq 1.5.2.

$ rq --version
rq, version 1.5.2
loicteixeira commented 3 years ago

I can confirm that the time parse issue is fixed when upgrading RQ to >1.4, however the pranavgupta1234/rqmonitor docker image ships with rq==1.2 so even if your project has an updated version of RQ, RQ Monitor does not.

pranavgupta1234 commented 3 years ago

Sorry guys, was bit occupied from some time. I have updated the requirements.txt and pushed new docker image as well. Image tagged with latest or v1.0.1 should get this fixed now. Feel free to reopen if issue still persists, closing for now.

waldner commented 3 years ago

No more exceptions now, however both the "workers" page and the "jobs" page display the "Loading..." box forever and nothing appears.

platipusica commented 3 years ago

Hi @waldner. this is not my experience with: rq, version 1.9.0

and Py38.

Cheers

waldner commented 3 years ago

I'm using the docker image pranavgupta1234/rqmonitor:latest. On the workers I also use RQ 1.9.0; python is version 3.9.

EDIT: even rebuilding the docker image with rq 1.9.0 and redis 3.5.0 (needed as a dependency requirement), the result is exactly the same.

waldner commented 3 years ago

I can see that while the "Loading..." box is shown, rqmonitor runs the following HTTP request:

GET /workers?redis_instance_index=0&_=1625774669388

And it remains pending. After a few seconds, not getting an answer, it requests

GET /static/nunjucks/error.html?_=1625774669389

which returns an error template (with unresolved variables, not shown to the user):

{{ errorThrown }}
{{ error_info.message }}

                {{ error_info.traceback }}

Submit Issue

The above two requests repeat forever (with updated timestamps), and the first of the two always remains hanging. This is for the "workers" page. NOTE: I have about 200 workers overall.

For the "Jobs" page, rqmonitor does the following request:

GET /jobs?draw=1&columns[0][data]=job_info&columns[0][name]=&columns[0][searchable]=true&columns[0][orderable]=true&columns[0][search][value]=&columns[0][search][regex]=false&columns[1][data]=action&columns[1][name]=&columns[1][searchable]=true&columns[1][orderable]=true&columns[1][search][value]=&columns[1][search][regex]=false&order[0][column]=0&order[0][dir]=asc&start=0&length=50&search[value]=&search[regex]=false&redis_instance_index=0&queues[]=default,normal,high&jobstatus[]=queued,finished,failed,started,deferred,scheduled&_=1625774669782

which DOES indeed get a JSON answer, but for some reason it's not displayed. One thing I noticed is that the reply takes some seconds to come back, so when it finally arrives rqmonitor has already sent other requests like the above one and they're still outstanding (don't know whether it matters, but just in case). Indeed if I restrict the set of values queried for (for example, only one queue, or only one job status) replies come earlier (before rqmonitor attempts to do another request) and jobs are finally displayed.

Hope the above helps.

platipusica commented 3 years ago

Huh, 200 workers? Maybe some commercial solution is needed for that ;) Good luck.

pranavgupta1234 commented 3 years ago

I don't think the amount of workers are going to be an issue. I am wondering what is causing issue for workers endpoint, @waldner is your redis on your localhost or some remote server ? redis-cli --latency -h <redis_host> -p 6379 should give you approx latency times for your redis server, wondering if there are some timeouts occurring for ajax requests.

waldner commented 3 years ago

@pranavgupta1234 yes my redis is remote and might take some time to answer (and the server itself is not super fast, so even redis commands may take some time, especially if much data has to be retrieved), but I don't think it should affect the outcome of rq-monitor. Here's the output of redis-cli --latency:

min: 20, max: 25, avg: 20.93 (533 samples)

In any case, IMHO rq-monitor should wait for a previous reply to an http request to arrive before doing another one.

EDIT: I've now rerun the workers query and I see that it takes 11 seconds to return the answer, so it's probably way too much for rqmonitor's timeouts.

If I can make a suggestion, it would be nice to have an option in the web UI to configure the polling frequency and another one to turn off polling. Thanks!

pranavgupta1234 commented 3 years ago

By running worker's query you mean like checking worker info ? because above latency numbers are normal for remote redis, what I am thinking is what is causing this 11s delay?

waldner commented 3 years ago

I mean the query that rqmonitor runs, eg GET /workers?redis_instance_index=0&_=1625774669388 As I said, the server is not very fast. In any case, I wouldn't expect rqmonitor to immediately fire 4 or 5 requests while it's waiting for the response to the first one. What I would expect is just that it takes 11 seconds (or however much) of "Loading..." before I see the first version of the workers page, and from there I should be able to tweak or disable the polling (or do nothing, provided that rqmonitor only polls again when it gets the reply to the previous request). another option could be to enable the user to configure the polling frequency via an environment variable or other means.