Open leezu opened 6 years ago
Could you create a small reproducible example? And can you paste the current traceback?
but the multiprocessing module requests the chunks in a different Thread
You mean different process, right?
Do you have any interest in working on a fix?
Reproducible example:
import tempfile
import numpy as np
import multiprocessing as mp
import pandas as pd
# Find a temporary file to store the database
_, f = tempfile.mkstemp()
# Put some data in the database
url = f"sqlite://{f}"
df = pd.DataFrame.from_records(np.random.normal(size=(100, 100)))
df.to_sql(name="test", con=url)
# Retrieve via python mutliprocessing module
def process_chunk(chunk):
pass
pool = mp.Pool()
query = "SELECT * FROM test"
for vals in pool.imap_unordered(process_chunk,
pd.read_sql_query(query, url, chunksize=2)):
pass
You mean different process, right?
No, I mean that the multiprocessing module retrieves data from the iterator returned by pd.read_sql_query in a different Thread. Subsequently that data is of course passed to a different process, but this bug is concerned about not even getting so far ;)
Do you have any interest in working on a fix?
I'm interested but won't have time before end of February. So if someone else finds time to work on this I wouldn't mind. In the meantime the workaround detailed in the first post works.
And this is the traceback
/home/leonard/.local/lib64/python3.6/site-packages/matplotlib/colors.py:298: MatplotlibDeprecationWarning: The is_string_like function was deprecated in version 2.1.
if cbook.is_string_like(arg):
Error closing cursor
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1159, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1076, in _fetchmany_impl
return self.cursor.fetchmany(size)
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1288, in _safe_close_cursor
cursor.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
Exception during reset or similar
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1159, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1076, in _fetchmany_impl
return self.cursor.fetchmany(size)
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 704, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 457, in do_rollback
dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
self._autorollback()
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 76, in __exit__
compat.reraise(type_, value, traceback)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
self._autorollback()
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 827, in _autorollback
self._root._rollback_impl()
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 706, in _rollback_impl
self._handle_dbapi_exception(e, None, None, None, None)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1334, in _handle_dbapi_exception
exc_info
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
raise value.with_traceback(tb)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 704, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 457, in do_rollback
dbapi_connection.rollback()
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584 (Background on this error at: http://sqlalche.me/e/f405)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/pool.py", line 703, in _finalize_fairy
fairy._reset(pool)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/pool.py", line 873, in _reset
pool._dialect.do_rollback(self)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 457, in do_rollback
dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
Exception closing connection <sqlite3.Connection object at 0x7f6ed4b368f0>
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1159, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1076, in _fetchmany_impl
return self.cursor.fetchmany(size)
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 704, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 457, in do_rollback
dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
self._autorollback()
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 76, in __exit__
compat.reraise(type_, value, traceback)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1406, in _handle_dbapi_exception
self._autorollback()
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 827, in _autorollback
self._root._rollback_impl()
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 706, in _rollback_impl
self._handle_dbapi_exception(e, None, None, None, None)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1334, in _handle_dbapi_exception
exc_info
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
raise value.with_traceback(tb)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 704, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 457, in do_rollback
dbapi_connection.rollback()
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584 (Background on this error at: http://sqlalche.me/e/f405)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/pool.py", line 703, in _finalize_fairy
fairy._reset(pool)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/pool.py", line 873, in _reset
pool._dialect.do_rollback(self)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 457, in do_rollback
dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/pool.py", line 317, in _close_connection
self._dialect.do_close(connection)
File "/home/leonard/.local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 463, in do_close
dbapi_connection.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584
Traceback (most recent call last):
File "pandas-mp-sql.py", line 23, in <module>
pd.read_sql_query(query, url, chunksize=2)):
File "/usr/lib64/python3.6/multiprocessing/pool.py", line 735, in next
raise value
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140114398058240 and this is thread id 140114281891584 (Background on this error at: http://sqlalche.me/e/f405)
@leezu I get a different error with your example:
@TomAugspurger I just reran the example with Python 3.6.5 on Linux and could still reproduce the SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140612946281792 and this is thread id 140612829673216
error. Are you running on a different system?
Yep, MacOS. Must be platform dependent.
On Wed, Jun 27, 2018 at 11:08 AM, Leonard Lausen notifications@github.com wrote:
@TomAugspurger https://github.com/TomAugspurger I just reran the example with Python 3.6.5 on Linux and could still reproduce the SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140612946281792 and this is thread id 140612829673216 error. Are you running on a different system?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/19457#issuecomment-400733434, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHImDSdYSU0HQ9lFHDh9f7MqYgpzdmks5uA63ygaJpZM4Rx9bQ .
Adding an extra /
to the url
in line 11 allows me to reproduce the issue on OS X. Ie. url = f"sqlite:///{f}"
instead of url = f"sqlite://{f}"
Nice catch, confirmed that it raises for me with the same error you get.
On Wed, Jun 27, 2018 at 2:13 PM, Leonard Lausen notifications@github.com wrote:
Adding an extra / to the url in line 11 should allow to reproduce the issue on OS X. Ie. url = f"sqlite:///{f}" instead of url = f"sqlite://{f}"
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/19457#issuecomment-400797405, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIrbMy9JDIkByUwPdDIT_BCdXd7UBks5uA9legaJpZM4Rx9bQ .
I am trying to pass the connection and use the wrapper, but with imap_unordered my kernel dies immediately. With pool.map it does not read in parallel. Here is how i am doing it:
import multiprocessing as mp import attr @attr.s(auto_attribs=True) class PDSQLQueryWrapper(object): """Wrap the iterator.
To create the db engine in the thread that calls the iterator first.
"""
_read_sql_query_iterator = None
query = $my_query$
connection = $my_conn$
chunksize= CHUNKSIZE
def next(self):
if self._read_sql_query_iterator is None:
self._read_sql_query_iterator = pd.read_sql_query(
self.query, self.connection, chunksize=self.chunksize)
return next(self._read_sql_query_iterator)
########################## read_sql = PDSQLQueryWrapper() dfs=[] def process_chunk(chunk): return chunk
pool = mp.Pool() for vals in pool.map(process_chunk,read_sql): dfs.append(vals)
Is there anyway I can fix this?
pd.read_sql(query, "sqlite:///sqlite.db?check_same_thread=False",chunksize=10000)
Code Sample
Problem description
Using
pd.read_sql_query
with chunksize, sqlite and with the multiprocessing module currently fails, aspandasSQL_builder
is called on execution ofpd.read_sql_query
, but the multiprocessing module requests the chunks in a different Thread (and the generated sqlite connection only wants to be used in the thread where it was created so it throws an Exception.).Workaround
Create a wrapper around
pd.read_sql_query
that only callspd.read_sql_query
once the first chunk is requested. ThereforepandasSQL_builder
will be called within the Thread requesting the chunks.I believe pandas should play nicely with the python multiprocessing module, delaying the creation of the sqlite connection and thereby fixing this issue.