davecap opened this issue 11 years ago
I've never written a Heroku application before, so this will be hard for me to debug. Does explicitly closing the connection in your queued function fix it? I did a little bit of googling and it seems this problem is not specific to Django/RQ. There's a resque thread discussing the same problem: https://github.com/defunkt/resque/issues/367
Yeah, I saw that thread too... I'm surprised I haven't seen any traces of this problem with Django though. I was kind of wondering if I could implement a fix similar to the gem mentioned in that thread. What they're doing is adding the disconnect in some sort of "after_job" callback. Does such a thing exist in RQ?
No, there's no such thing as "after_job" in RQ. We could propose adding one, as well as the ability to use a different worker class, since implementing concurrent workers is already on RQ's roadmap.
However, we first need to ensure that the proposed solution does indeed solve the problem. Can you check whether calling close_connection in the function being queued fixes this?
Yep. I just tested it in one of my async functions and I don't see the EOF errors anymore.
I've been doing some more thinking about this; here are a few ideas:
1. Writing a wrapper function for async functions. This doesn't require any changes to RQ, but it could get tedious if every single enqueued Django function on Heroku needs to explicitly close its DB connection:

    from django.db import connection

    def func():
        pass

    def rq_func():
        func()
        connection.close()

    queue.enqueue(rq_func)
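The wrapper idea above could be made less tedious with a decorator. This is only a sketch — `close_after` is a hypothetical helper, not part of RQ or django-rq — and in a Django project the cleanup passed in would be `django.db.connection.close`:

```python
from functools import wraps

def close_after(cleanup):
    """Hypothetical decorator: run `cleanup` (e.g. django.db.connection.close)
    after every call to the wrapped task, even if the task raises."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                cleanup()
        return wrapper
    return decorator

# Usage sketch in a Django project (names assumed):
#   from django.db import connection
#
#   @close_after(connection.close)
#   def my_task(x):
#       ...
#
#   queue.enqueue(my_task, 42)
```

One caveat: for RQ to serialize the decorated task it must be importable at module level under its original name, which `functools.wraps` preserves.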
2. Implementing a job callback pattern. However, this idea was rejected in https://github.com/nvie/rq/issues/155, and I'm also not a big fan of the notion of having callbacks in RQ:

    queue.enqueue(f, callback=close_connection)
3. Adding a job_class argument to Worker to allow the use of custom job classes. In this case, django-rq can define its own Job class to deal with this issue. It would look something like:

    from rq.job import Job
    from django.db import connection as db_connection

    class ConnectionClosingJob(Job):
        def perform(self):
            result = super(ConnectionClosingJob, self).perform()
            db_connection.close()
            return result

    # Somewhere in Django-RQ ("connection" here is the Redis connection)
    worker = Worker(connection=connection, job_class=ConnectionClosingJob)
@nvie, what do you think?
I'm experiencing the same issue. I'm glad I checked, because I had no idea what the cause of the issue was. I think alternatives 2 or 3 would be good. I'd think of it less as a callback than as a hook; ideally I'd like to do other post-job housekeeping too.
Just to add, I'm not sure Alternative 1 really works when func() is a bound method. In my use case, I'm binding a method at run time and queuing that method directly. I'm refactoring to try to pass a wrapper function instead that will recover the method reference job-side and do the necessary cleanup, but it's really gumming up the code.
I don't understand the problem in the first place. Stripping django-rq and RQ away from the equation, what's the core of the problem we're looking at here? Isn't it that Django implicitly opens a connection, but never explicitly closes it until its Python process terminates (which never happens in an RQ worker situation)?
From what I understand it's a harmless postgres error message that occurs when no Terminate message is received. I wonder if enabling "autocommit" mode for the psycopg2 DB backend would fix it.
@nvie That's one manifestation of the problem, but to me, the overall problem is that I want to be able to do other post-task work. In my case, I'm manually tracking tasks in flight so that I can prevent duplicate tasks with the same arguments from running. In the current paradigm, I have to schedule a wrapper function that calls the actual task function and performs this cleanup. But since there are multiple actual task functions that might be called, and since I can't pass function references as arguments, my wrapper function has to do some serious gymnastics to recreate the task function reference worker-side. The whole thing would be much easier if there were hooks for post-task work. I really like Alternative 3, personally.
The problem I believe is that RQ forks its worker processes from the main process, and django shares its database connection from the parent process to the child process. http://stackoverflow.com/questions/8242837/django-multiprocessing-and-database-connections sums up the problem pretty well.
We have decided to implement the fix in RQ itself instead of django-rq. RQ will allow the user to use a custom Worker class, where we can then tell the worker to close the DB connection before forking.
There's an open issue tracking this feature on https://github.com/nvie/rq/issues/233 . Feel free to contribute/chime in on the discussion on the thread.
I came here because I was worried about the "could not receive data from client: Connection reset by peer" messages on my django/django-rq app running on Heroku.
I applied the tip described here to create "clean" connections when the worker is launched and avoid the "forked connections" syndrome that seems to be the root cause: https://github.com/nvie/rq/issues/233#issuecomment-25010141

But I still get the connection reset message on Heroku. Did I miss something?
@abulte mind showing me a snippet of your code that manages the Postgres connection? If I remember correctly, @fdr also wanted to put this up on Heroku's docs, so we should definitely work out a reliable way of doing this and document it properly.
Here's my snippet, directly inspired by the mentioned link:

    from rq import Worker
    from django.db import close_connection

    class AAWorker(Worker):
        def fork_and_perform_job(self, *args, **kwargs):
            close_connection()
            super(AAWorker, self).fork_and_perform_job(*args, **kwargs)
I should add that it did not solve the "connection reset" problem.
I would be very glad to have the heroku docs for doing that correctly ;-)
Thanks for your help.
@abulte THANK YOU dude, this just saved me so much time and effort.
EDIT: Sorry, scratch that response, this didn't work for me, I think I have a different bug :/ Still thank you though lol.
I went ahead and updated the Heroku docs like I promised I would. Somehow I missed the notification the first time around. Sorry for the long delay:
https://devcenter.heroku.com/articles/forked-pg-connections#django-rq
@fdr thanks!
@fdr hi. You seem to have updated the docs with the snippet I provided in https://github.com/ui/django-rq/issues/17#issuecomment-45710490. But this snippet does not solve the "connection reset" problem, as I said... at least in my experience.
Does it solve any other inconvenient symptom?
AFAIK no.
Oh. Pity, I guess I'll remove it, then. Thanks for clarifying that for me.
Thanks for trying ;-) Maybe @selwin has a proper way to do this?
I've never used Heroku before so no :(. Does closing db connection right after worker is started and after job completion help?
I'll reopen this thread since we don't have a proper solution for this issue.
Heroku is irrelevant here, other than my making the recommendation to open the bug, because I've seen people get mystified by file descriptor corruption caused by fork() a bunch of times...
So, any pg/fork based worker is affected unless one moves connection creation to post-fork.
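As a minimal illustration of the sharing @fdr describes — a sketch only, using a POSIX socketpair as a stand-in for the Postgres socket; nothing here is RQ code:

```python
import os
import socket
import struct

# Stand-in for a database connection opened in the parent process.
sock, _peer = socket.socketpair()
r, w = os.pipe()

pid = os.fork()
if pid == 0:
    # Child: the socket was inherited wholesale; report its fd and exit
    # without any clean protocol shutdown. The server-side analogue is
    # Postgres logging "Connection reset by peer".
    os.write(w, struct.pack("i", sock.fileno()))
    os._exit(0)

os.close(w)
child_fd = struct.unpack("i", os.read(r, 4))[0]
os.waitpid(pid, 0)
print(child_fd == sock.fileno())  # the child held the very same descriptor
```

Moving connection creation to after the fork, as suggested above, avoids this entirely because each child then owns its own socket.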
Ah ok, got it.
I used this with django-rq:
In every function enqueued to run by a worker, I added connection.close() at the end. Now the Postgres logs no longer show "LOG: could not receive data from client: Connection reset by peer".

More importantly, I think those errors were making the PG archive log grow larger, so fixing this mattered for keeping the PG backups from running away.
Any idea or update on this problem? I have the exact same issue and tried some of the hacks mentioned above, but nothing works on Heroku. I need to figure out how to open the connection post-fork. Any ideas? Otherwise I'll have to drop django-rq completely and go for Celery or something similar...
Or you could drop Django :)
FWIW I've been using this for years:
from django.db import connection
from rq.worker import Worker
class ConnectionClosingWorker(Worker):
    """
    RQ Worker that closes the database connection before forking.

    See also https://github.com/ui/django-rq/issues/17
    """

    def execute_job(self, job, queue):
        connection.close()
        super().execute_job(job, queue)
It's been too long since I wrote it to remember the details, but it solved this issue and hasn't caused problems in a largish rq deploy. By "largish", I mean "we're moving off Heroku because we hit their 300-dyno limit".
I'm using django-rq on Heroku with the Heroku Postgres DB setup. When my RQ worker jobs finish, I always see this in the logs:
Any idea if there's a way to close the database connection when a job ends? For example, I probably need to put this somewhere:

Perhaps it should just be in the worker function itself... but that seems like a hack. Thoughts?