pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License
4.76k stars 297 forks source link

"Not unique table/alias" raised in concurrent queries #320

Open elvisgs opened 4 years ago

elvisgs commented 4 years ago

Since version 1.2.3, the lib is failing when executing concurrent queries (ThreadPoolExecutor + db_pool_size > 1). Exception raised:


[SQL: SELECT count(*) AS count_1 
FROM cadernos_pesquisados, cadernos_pesquisados 
WHERE cadernos_pesquisados.sinc_id = %(sinc_id_1)s AND cadernos_pesquisados.origem = %(origem_1)s AND cadernos_pesquisados.termino IS NOT NULL]
[parameters: {'sinc_id_1': 585130, 'origem_1': 'revisor'}]
(Background on this error at: http://sqlalche.me/e/2j85)```
pudo commented 4 years ago

Can you post a larger part of the error message? It's not immediately clear what the error is from this excerpt.

elvisgs commented 4 years ago

Thanks for quick reply. Full traceback:

Traceback (most recent call last):
  File "/storage/projetos/my_project/src/pesquisador/base.py", line 25, in executa
    asyncio.run(self._executa_pesquisa())
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/storage/projetos/my_project/src/pesquisador/base.py", line 37, in _executa_pesquisa
    if self.db.caderno_foi_pesquisado(caderno, self.origem):
  File "/storage/projetos/my_project/src/pesquisador/servicos/acesso_dados.py", line 35, in caderno_foi_pesquisado
    sinc_id=sinc_id, origem=origem, termino={'not': None},
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/dataset/table.py", line 657, in count
    rp = self.db.executable.execute(query)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1253, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1473, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    cursor, statement, parameters, context
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/elvis/.virtualenvs/my_project/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1066, "Not unique table/alias: 'cadernos_pesquisados'")
[SQL: SELECT count(*) AS count_1 
FROM cadernos_pesquisados, cadernos_pesquisados, cadernos_pesquisados 
WHERE cadernos_pesquisados.sinc_id = %(sinc_id_1)s AND cadernos_pesquisados.origem = %(origem_1)s AND cadernos_pesquisados.termino IS NOT NULL]
[parameters: {'sinc_id_1': 585129, 'origem_1': 'revisor'}]
(Background on this error at: http://sqlalche.me/e/2j85)
elvisgs commented 4 years ago

I've been investigating this bug and figured out some things:

I've managed to reproduce it in test suite like so:

import asyncio

def asynctest(test_coro):
    @wraps(test_coro)
    def wrapper(*args, **kwargs):
        asyncio.run(test_coro(*args, **kwargs))
    return wrapper

class DatabaseTestCase(unittest.TestCase):
    # ... other tests ...

    @asynctest
    async def test_concurrent_queries(self):
        if 'sqlite' in self.db.engine.dialect.dbapi.__name__:
            return

        # new connection to an existing and populated database
        db = connect(url=os.getenv('DATABASE_URL'))
        tbl = db['weather']

        executor = ThreadPoolExecutor(max_workers=5)

        def execute_sync_query():
            with db:
                return tbl.count(place=TEST_CITY_1, temperature={'>=': 6})

        async def execute_async_query():
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(executor, execute_sync_query)

        tasks = [
            asyncio.create_task(execute_async_query())
            for _ in range(5)
        ]
        counts = await asyncio.gather(*tasks)
        assert counts == [2, 2, 2, 2, 2]

And than run the test suite: DATABASE_URL='postgresql or mysql URL' make test. May be necessary run more than once to see it failing.

Remalloc commented 4 years ago

I'm trying to reproduce it use your TestCase, but it always success. Could you provide more details about your environment, such as:

  1. OS platform
  2. pymysql version
  3. Python version
  4. accurate dataset version @elvisgs