sqlalchemy / sqlalchemy

The Database Toolkit for Python
https://www.sqlalchemy.org
MIT License
9.44k stars 1.41k forks source link

StatementError Exceptions un-pickable. #2371

Closed sqlalchemy-bot closed 12 years ago

sqlalchemy-bot commented 12 years ago

Migrated issue, originally created by Lorenzo Bolla (@lbolla)

All StatementError exceptions (and exceptions derived) are not pickable. This means that they cannot be used, for example, in the callback of multiprocessing.map_async function (which uses cPickle to communicate between master and workers).

See discussion here: http://stackoverflow.com/a/8786557/1063605

This is probably a Python bug, and it's been already issued: http://bugs.python.org/issue13751 but it would be nice to workaround it in SQLAlchemy, too.


Attachments: unpickable.py

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

this is tested/fixed on the SQLAlchemy side in d78d2d60aa30b0b6c3c230ddf3cafda2529e6409. However, at least psycopg2 doesn't support pickling of their exceptions, since they package the cursor within it, and probably other DBAPIs - so the usage of multiprocessing still needs you to re-raise exceptions as something harmless. You'll need to post on their tracker as well http://initd.org/psycopg/tracker/ to get it fixed on their end.

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

Hi Mike,

Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the argments down to the .args attribute of Exception (which is what pickle looks at)?

In any case, don't you need to define __reduce__ on other exception in "exc.py"? For example. you have

In 1: n = exc.NoReferencedColumnError("foo", "tname", "cname")

In 2: from cPickle import loads, dumps

In 3: loads(dumps(n))

TypeError Traceback (most recent call last)

/usr/local/src/sqlalchemy/sqlalchemy/lib/sqlalchemy/ in ()

TypeError: ('init() takes exactly 4 arguments (2 given)', <class 'exc.NoReferencedColumnError'>, ('foo',))

This corresponds to

class NoReferencedColumnError(NoReferenceError): """Raised by ForeignKey when the referred Column cannot be located."""

def __init__(self, message, tname, cname):
    NoReferenceError.__init__(self, message)
    self.table_name = tname
    self.column_name = cname

but you have not added a __reduce__ function for that class. There are a couple of other instances, like NoReferencedTableError and CircularDependencyError.

Additionally, I'm unclear about what to do about psycopg2 as you suggest. I think that writing on their site "make your exceptions pickleable" would be short of detail. Can you provide a usage example or more information about the issue?

BTW, on the subject of how to temporarily work around it, do you recommend adding __reduce__ functions using copy_reg or otherwise, or reraising the exception as lbolla does? I prefer the former, but apparently there is the issue of psycopg2 exceptions that you raised... Thanks.

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

I came up with the following simple example

import psycopg2 conn = psycopg2.connect("dbname='illumina_faheem'") cur = conn.cursor() try: cur.execute("SELECT * FROM barf") except Exception, e: print e pass

import cPickle a = cPickle.dumps(e) l = cPickle.loads(a)

Which gives:

relation "barf" does not exist LINE 1: SELECT * FROM barf ^ Traceback (most recent call last): File "", line 11, in File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex raise TypeError, "can't pickle %s objects" % base.name TypeError: can't pickle cursor objects

Is this convincing, or should I give an example using multiprocessing?

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

Replying to faheem:

Hi Mike,

Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the argments down to the .args attribute of Exception (which is what pickle looks at)?

not sure what you mean here. I basically didn't want to make the arguments in __init__ optional, so __reduce__ is a way to control exactly how __init__ is called when unpickling.

In any case, don't you need to define __reduce__ on other exception in "exc.py"? For example. you have

In 1: n = exc.NoReferencedColumnError("foo", "tname", "cname")

In 2: from cPickle import loads, dumps

In 3: loads(dumps(n))

TypeError Traceback (most recent call last)

oh yes, missed those. reopening. though those are typically raised at configuration time.

Additionally, I'm unclear about what to do about psycopg2 as you suggest. I think that writing on their site "make your exceptions pickleable" would be short of detail. Can you provide a usage example or more information about the issue?

BTW, on the subject of how to temporarily work around it, do you recommend adding __reduce__ functions using copy_reg or otherwise, or reraising the exception as lbolla does? I prefer the former, but apparently there is the issue of psycopg2 exceptions that you raised... Thanks.

as a workaround I'd not be allowing exceptions to propagate outside of multiprocessing subprocesses. I'd reraise them or handle them.

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

Replying to faheem:

I came up with the following simple example

import psycopg2 conn = psycopg2.connect("dbname='illumina_faheem'") cur = conn.cursor() try: cur.execute("SELECT * FROM barf") except Exception, e: print e pass

import cPickle a = cPickle.dumps(e) l = cPickle.loads(a)

Which gives:

relation "barf" does not exist LINE 1: SELECT * FROM barf ^ Traceback (most recent call last): File "", line 11, in File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex raise TypeError, "can't pickle %s objects" % base.name TypeError: can't pickle cursor objects

Is this convincing, or should I give an example using multiprocessing?

It's convincing, but you do need to mention that multiprocessing pickles exception objects. Otherwise they might wonder why you need this feature.

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

Replying to zzzeek:

Replying to faheem:

Hi Mike,

Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the argments down to the .args attribute of Exception (which is what pickle looks at)?

not sure what you mean here. I basically didn't want to make the arguments in __init__ optional, so __reduce__ is a way to control exactly how __init__ is called when unpickling.

I didn't mean to cause confusion. I just meant there are other ways one can do this. Per sbt in

http://stackoverflow.com/questions/8785899/hang-in-python-script-using-sqlalchemy-and-multiprocessing e.g.

class GoodExc1(Exception):
    def __init__(self, a):
        **Non-optional param in the constructor.**
        Exception.__init__(self, a)
        self.a = a

or

class GoodExc2(Exception):
    def __init__(self, a):
        **Non-optional param in the constructor.**
        self.args = (a,)
        self.a = a

But maybe these are no better than the way you are doing it.

as a workaround I'd not be allowing exceptions to propagate outside of multiprocessing subprocesses. I'd reraise them or handle them.

So something like what lbolla is doing in that question looks reasonable to you? I.e.

def do(kwargs):
    try:
        i = kwargs['i']('i')
        print i
        raise BadExc('a')
        return i
    except Exception as e:
        raise Exception(repr(e))

I don't understand how this avoids the problem, but it seems to work.

Thanks, Faheem

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

Replying to zzzeek:

It's convincing, but you do need to mention that multiprocessing pickles exception objects. Otherwise they might wonder why you need this feature.

I've reported it here. The developers seem to be considering it, but I'm not really following the discussion. I did try to produce an example that would hang like the SQLA one, but was not able to. I did the following, but it doesn't hang as I would have expected. Any idea why?

import multiprocessing, psycopg2

def do(kwargs):
    i = kwargs['i']('i')
    conn = psycopg2.connect("dbname='illumina_faheem'")
    cur = conn.cursor()
    cur.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")

pool = multiprocessing.Pool(processes=5)               # start 5 worker processes
results = [= [](]
arglist)
for i in range(10):
    arglist.append({'i':i})
r = pool.map_async(do, arglist, callback=results.append)
r.get()
r.wait()
pool.close()
pool.join()
sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

Seems like they've accepted it. See how quick they were to close it at first ? Yeah, we all have itchy trigger fingers in this business :).

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

BTW, I didn't receive any of the updates to this ticket by email, as I would have expected.

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

I've updated the trac configuration let's see what happens.

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

OK some more exceptions fixed in 26625d897f9becd7bccc6ece037e9ad39a6a77e9 and ORM + more tests in rc8b4b7c8cb7b9. If jenkins makes it through we should be good.

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

I just checked your changes. How about CircularDependencyError in lib/sqlalchemy/exc.py? This doesn't have a __reduce__ defined. Otherwise looks good.

I got an email notification of the last comment.

What is jenkins?

Faheem.

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

Thought about enabling openid support? See https://bitbucket.org/Dalius/authopenid-plugin/wiki/Home. This is in Debian, so is probably reasonably stable.

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

7026ffe57c76821af3f1d5d01721e4035dc82de9

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

Replying to faheem:

Thought about enabling openid support? See https://bitbucket.org/Dalius/authopenid-plugin/wiki/Home. This is in Debian, so is probably reasonably stable.

I'd prefer an OAuth plugin that I can hook into github, couldn't find one. openID seems kind of hopeless to me.

sqlalchemy-bot commented 12 years ago

Michael Bayer (@zzzeek) wrote:

also we're on 0.12 so this wouldn't work anyway.

sqlalchemy-bot commented 12 years ago

Faheem Mitha (@faheem) wrote:

I've never heard of OAuth before. One learns something new every day. By hopeless do you mean insecure, or do you have other objections? I've used openID for awhile. I don't know about security, but I like not having to keep track of 10 million passwords.

sqlalchemy-bot commented 12 years ago

Changes by Michael Bayer (@zzzeek):

sqlalchemy-bot commented 12 years ago

Changes by Michael Bayer (@zzzeek):

sqlalchemy-bot commented 12 years ago

Changes by Michael Bayer (@zzzeek):

sqlalchemy-bot commented 12 years ago

Changes by Michael Bayer (@zzzeek):

sqlalchemy-bot commented 12 years ago

Changes by Michael Bayer (@zzzeek):