rq / django-rq

A simple app that provides django integration for RQ (Redis Queue)
MIT License
1.83k stars · 286 forks

Closing the database connection #17

Open davecap opened 11 years ago

davecap commented 11 years ago

I'm using django-rq on Heroku with the Heroku Postgres DB setup. When my RQ worker jobs finish, I always see this in the logs:

LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection

Any idea if there's a way to close the database connection when a job ends? For example, I probably need to put this somewhere:

from django.db import connection
connection.close()

Perhaps it should just be in the worker function itself... but that seems like a hack... thoughts?

selwin commented 11 years ago

I've never written a Heroku application before so this will be hard for me to debug. Does explicitly closing the connection on your queued function fix it? I did a little bit of googling and it seems that this problem is not specific to Django/RQ. There's a resque thread discussing the same problem https://github.com/defunkt/resque/issues/367

davecap commented 11 years ago

Yea I saw that thread too... I'm surprised I haven't seen any traces of this problem with Django though. I was kind-of wondering if I could implement a similar fix to the gem mentioned in that thread. What they're doing is adding the disconnect on some sort of "after_job" callback. Does such a thing exist with RQ?

selwin commented 11 years ago

No, there's no such thing as "after_job" in RQ. We can propose adding one, as well as the ability to use a different worker class since this is already in RQ's roadmap (implementing concurrent workers).

However, we first need to ensure that the proposed solution does indeed solve the problem. Can you check whether running "close_connection" in the function being queued fixes this?

davecap commented 11 years ago

Yep. I just tested it in one of my async functions and I don't see the EOF errors anymore.

selwin commented 11 years ago

I've done some more thinking about this; here are a few ideas:

Alternative 1

Writing a wrapper function for async functions. This doesn't require any changes to RQ, but it could be tedious if every single enqueued Django function on Heroku needs to explicitly close the db connection.


from django.db import connection

def func():
    pass  # the actual task

def rq_func():
    func()
    connection.close()

queue.enqueue(rq_func)

Alternative 2

Implementing a job callback pattern. However, this idea was rejected in https://github.com/nvie/rq/issues/155, and I'm also not a big fan of the notion of having callbacks in RQ.

queue.enqueue(f, callback=close_connection)

Alternative 3

Add a job_class argument to Worker to allow the use of custom job classes. In this case, Django RQ can define its own Job class to deal with this issue. It would look something like:


from rq.job import Job
from django.db import connection

class ConnectionClosingJob(Job):

    def perform(self):
        result = super(ConnectionClosingJob, self).perform()
        connection.close()
        return result

# Somewhere in Django-RQ

worker = Worker(connection=connection, job_class=ConnectionClosingJob)

@nvie, what do you think?

acjay commented 11 years ago

I'm experiencing the same issue. I'm glad I checked, because I had no idea what the cause of the issue was. I think alternatives 2 or 3 would be good. I'd think of it less as a callback than as a hook; ideally I'd like to do other post-job housekeeping too.

acjay commented 11 years ago

Just to add, I'm not sure Alternative 1 really works when func() is a bound method. In my use case, I'm binding a method at run time and queuing that method directly. I'm refactoring to try to pass a wrapper function instead that will recover the method reference job-side and do the necessary cleanup, but it's really gumming up the code.
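
A framework-free sketch of the workaround described above (all names here are illustrative, not part of RQ's or django-rq's API): instead of enqueuing the bound method itself, enqueue a module-level function together with enough plain data to rebuild the method reference worker-side.

```python
# Sketch only: `Task` and `TASKS` stand in for whatever lookup your app
# actually uses (e.g. Model.objects.get(pk=pk) in Django).

class Task:
    def __init__(self, pk):
        self.pk = pk

    def run(self):
        return "ran task %d" % self.pk

# worker-side lookup used to recover the object from its id
TASKS = {1: Task(1)}

def call_method(pk, method_name):
    """Module-level wrapper that is safe to enqueue: recovers the
    object worker-side and calls the named method on it."""
    obj = TASKS[pk]
    return getattr(obj, method_name)()

# enqueue the wrapper plus plain data instead of the bound method:
# queue.enqueue(call_method, 1, "run")
```

The wrapper only has to carry picklable arguments, which is what makes it queue-friendly, at the cost of the "gymnastics" described above.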

nvie commented 11 years ago

I don't understand the problem in the first place. Stripping away django-rq and RQ from the equation, what's the core of the problem we're looking at here? Isn't it that Django implicitly opens a connection, but never explicitly closes it until its Python process terminates (which never happens in an RQ worker situation)?

davecap commented 11 years ago

From what I understand it's a harmless postgres error message that occurs when no Terminate message is received. I wonder if enabling "autocommit" mode for the psycopg2 DB backend would fix it.

acjay commented 11 years ago

@nvie That's one manifestation of the problem, but to me, the overall problem is that I want to be able to do other post-task work. In my case, I'm manually tracking tasks in flight so that I can prevent duplicate tasks with the same arguments from running. In the current paradigm, I have to schedule a wrapper function that calls the actual task function and performs this cleanup. But since there are multiple actual task functions that might be called, and since I can't pass function references as arguments, my wrapper function has to do some serious gymnastics to recreate the task function reference worker-side. The whole thing would be much easier if there were hooks for post-task work. I really like Alternative 3, personally.

fangsterr commented 11 years ago

The problem I believe is that RQ forks its worker processes from the main process, and django shares its database connection from the parent process to the child process. http://stackoverflow.com/questions/8242837/django-multiprocessing-and-database-connections sums up the problem pretty well.
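
If that diagnosis is right, the fix it implies can be sketched in a few lines (the function names here are illustrative, not RQ's API): close the shared connection in the parent before fork(), so the child lazily opens a fresh one instead of writing to the parent's socket.

```python
import os

def fork_and_perform(perform_job, close_connections):
    """Minimal sketch of the pattern: the parent drops its shared DB
    connections first, then forks; the child runs the job on a clean
    slate and opens its own connection on first use."""
    close_connections()          # parent releases the shared socket
    pid = os.fork()
    if pid == 0:
        perform_job()            # child does the work
        os._exit(0)              # exit without running parent cleanup
    os.waitpid(pid, 0)           # parent waits for the child
```

This is essentially what the custom-Worker approaches further down this thread do, hooked into RQ's own fork point rather than a hand-rolled one.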

selwin commented 11 years ago

We have decided to implement the fix in RQ itself instead of django-rq. RQ will allow the user to use a custom Worker class, where we can then tell the worker to close the db connection before forking.

There's an open issue tracking this feature on https://github.com/nvie/rq/issues/233 . Feel free to contribute/chime in on the discussion on the thread.

abulte commented 10 years ago

I came here because I was worried with the could not receive data from client: Connection reset by peer on my django/django-rq app running on Heroku.

I applied the tip described here to create "clean" connections when the worker is launched and avoid the "forked connections" syndrome that seems to be the root cause: https://github.com/nvie/rq/issues/233#issuecomment-25010141

But I still get the connection reset message on Heroku. Did I miss something?

selwin commented 10 years ago

@abulte mind showing me a snippet of your code that manages postgres connection? If I remember correctly @fdr also wanted to put this up on Heroku's docs so we should definitely work out a reliable way of doing this and properly document it.

abulte commented 10 years ago

Here's my snippet, directly inspired from the mentioned link:

from rq import Worker
from django.db import close_connection

class AAWorker(Worker):

    def fork_and_perform_job(self, *args, **kwargs):
        close_connection()
        super(AAWorker, self).fork_and_perform_job(*args, **kwargs)

I should add that it did not solve the "connection reset" problem.

I would be very glad to have Heroku docs on doing this correctly ;-)

Thanks for your help.

foxx commented 10 years ago

@abulte THANK YOU dude, this just saved me so much time and effort.

EDIT: Sorry, scratch that response, this didn't work for me, I think I have a different bug :/ Still thank you though lol.

fdr commented 10 years ago

I went ahead and updated the Heroku docs like I promised I would. Somehow I missed the notification the first time around. Sorry for the long delay:

https://devcenter.heroku.com/articles/forked-pg-connections#django-rq

selwin commented 10 years ago

@fdr thanks!


abulte commented 10 years ago

@fdr hi. You seem to have updated the docs with the snippet I provided on https://github.com/ui/django-rq/issues/17#issuecomment-45710490. But this snippet does not solve the "connection reset" like I said... At least in my experience.

fdr commented 10 years ago

Does it solve any other inconvenient symptom?

abulte commented 10 years ago

AFAIK no.

fdr commented 10 years ago

Oh. Pity, I guess I'll remove it, then. Thanks for clarifying that for me.

abulte commented 10 years ago

Thanks for trying ;-) Maybe @selwin has a proper way to do this?

selwin commented 10 years ago

I've never used Heroku before so no :(. Does closing db connection right after worker is started and after job completion help?

I'll reopen this thread since we don't have a proper solution for this issue.

fdr commented 10 years ago

Heroku is irrelevant here, other than that I recommended opening the bug, because I've seen people get mystified by file descriptor corruption caused by fork() a bunch of times...

So, any pg/fork based worker is affected unless one moves connection creation to post-fork.

selwin commented 10 years ago

Ah ok, got it.

sachinkagarwal commented 9 years ago

I used this with django-rq:

In every function enqueued to run by a worker, I added the connection.close() at the end. Now the postgres logs do not have the "LOG: could not receive data from client: Connection reset by peer".

More importantly, I think those errors were causing the PG archive log to grow larger, so fixing this mattered to keep the PG backups from running away.
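
Repeating connection.close() at the end of every task body can be factored into a decorator. A sketch, written generically so the cleanup is injected (with Django you would pass django.db.connection.close as the cleanup; the name with_cleanup is illustrative):

```python
import functools

def with_cleanup(cleanup):
    """Build a decorator that runs `cleanup()` after the wrapped
    task finishes, even if the task raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                cleanup()
        return wrapper
    return decorator

# with Django, something like:
# closes_db = with_cleanup(connection.close)
# then decorate each enqueued task with @closes_db
```

The try/finally means the connection is closed even for failed jobs, which a plain connection.close() at the end of the function body would miss.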

mpetyx commented 8 years ago

Any idea or update on this problem? I have the exact same issue and have tried some of the hacks mentioned above, but nothing works on Heroku. I need to figure out how to open the connection post-fork. Any ideas? Otherwise I'll have to completely drop django-rq and go for Celery or something similar...

foxx commented 8 years ago

Or you could drop Django :)

aaugustin commented 5 years ago

FWIW I've been using this for years:

from django.db import connection
from rq.worker import Worker

class ConnectionClosingWorker(Worker):
    """
    RQ Worker that closes the database connection before forking.

    See also https://github.com/ui/django-rq/issues/17

    """

    def execute_job(self, job, queue):
        connection.close()
        super().execute_job(job, queue)

It's been too long since I wrote it to remember the details, but it solved this issue and hasn't caused problems in a largish rq deploy. By "largish", I mean "we're moving off Heroku because we hit their 300-dyno limit".