sqlalchemy / sqlalchemy

The Database Toolkit for Python
https://www.sqlalchemy.org
MIT License
9.38k stars 1.41k forks source link

A possible race condition between dialect initialization and query execution #6337

Closed yuvalmarciano closed 3 years ago

yuvalmarciano commented 3 years ago

Describe the bug Hi, As the title says, we face a possible race condition between dialect initialization and query execution. We're using SQLAlchemy of version 1.4.8 with psycopg2 engine of version 2.8.6, and PostgreSQL database of version 12.4. We have a web server responsible of querying this database, and to do so it's using a session factory (sessionmaker) object attached to an engine we create with a pool_size=20 limitation.

On one of the queries the server performs we use the ILIKE operator on a textual column, and set it to escape \ characters:

column.ilike(f"%{value}%", escape="\\")

and when we execute this query it raises the following exception:

DataError: (psycopg2.errors.InvalidEscapeSequence) invalid escape string
HINT:  Escape string must be empty or one character.

The compiled query that's attached to the exception contains this condition:

WHERE my_column ILIKE %(value1_1)s ESCAPE '\\' 

(Obviously the escaped string is supposed to contain only one slash character).

While trying to debug this issue using breakpoints and debug prints - we noticed the render_literal_value method of the PGCompiler class, and it looks related to our issue: image

A direct suspect from this method is the dialect object that's attached to the compiler and its _backslash_escapes attribute specifically.

After digging into it - we noticed that this attribute is set to True by default on the PGDialect class, and its value is determined again upon the PGDialect instance initialization (on initialize method): image image

By planting debug prints on the initialize method of PGDialect and the render_literal_value method of the PGCompiler, we observed the fact that sometimes the first rendering is called before the initialization takes place. At this state, the SQL Compilation Caching (https://docs.sqlalchemy.org/en/14/core/connections.html#sql-compilation-caching) introduced in SQLAlchemy 1.4 prevents the compiler instance from rendering ILIKE operators differently, and all queries are failing.

To support our suspicion, we tried to disable the compilation caching, and from this point the behavior was as "expected":

Expected behavior The query is expected to succeed from the first time it's executed.

To Reproduce We've done everything we can to reproduce this error on ipython, but couldn't (the debug prints say the initialization happens before the first rendering). Here is the session creation mechanism we use:

engine = create_engine(
        <connection_string>,
        executemany_mode="values",
        executemany_values_page_size=10000,
        executemany_batch_page_size=500,
        pool_size=20,
        pool_timeout=30,
        max_overflow=0,
        connect_args={"application_name": <some application name>},
    )
session_factory = sessionmaker(bind=engine, autoflush=False)
session = session_factory()

Versions.

We'd really appreciate your attention on this issue. Thanks!

Have a nice day!

CaselIT commented 3 years ago

Hi,

You mentioned a webserver, does it use multiple threads? Could it be that while one thread is initializing the dialect another one is executing the ilike query?

I think we have a lock for a similar case, but I'm not sure off the top of my head if it's used also for the dialect initialization.

zzzeek commented 3 years ago

the connection pool should not grant a connection until the first connect has succeeded, within which the initialization takes place.

I would ask if this occurs on 1.3 but in 1.3 we dont have the caching, so it would only occur on one or two initial statements. but as it appears this can be reproduced are you able to reproduce on 1.3 ?

zzzeek commented 3 years ago

wowee. nevermind, this reproduces immediately on 1.4 and not on 1.3. huh.

import threading

from sqlalchemy import create_engine

def run(engine):
    with engine.connect() as conn:
        assert conn.dialect.server_version_info > (12,)

engine = create_engine("postgresql://scott:tiger@localhost/test", pool_size=20)
# this fixes
# engine.connect().close()

threads = [threading.Thread(target=run, args=(engine,)) for i in range(20)]
for t in threads:
    t.start()
zzzeek commented 3 years ago

yeah i knew i had made a change here this is #5497 .

CaselIT commented 3 years ago

I guess we should have a test that does that check using threads

CaselIT commented 3 years ago

is this only on pg?

zzzeek commented 3 years ago

well pg has a bad symptom here but all of the dialects should not be used to compile until initialize.

sqla-tester commented 3 years ago

Mike Bayer has proposed a fix for this issue in the master branch:

Add new "sync once" mode for pool.connect https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2766

zzzeek commented 3 years ago

thanks for reporting!

we had this exact same issue about 8 years ago in #2964 however unfortunately the tests added for that only tested the connection pool itself when the "first_connect" event is used; it didn't test that the Engine is actually using the first_connect event and that's what changed in #5497, and the tests did not catch it. obviously that's fixed now.

yuvalmarciano commented 3 years ago

@zzzeek @CaselIT Thank you so much for the quick response and fix!

scott-c-h commented 12 months ago

Sorry to post into a closed bug - I have a stack trace for a similar case (except single threaded) that seems to be resolved by the same changes. I thought it would be worth posting this here for anyone trying to track down whether this issue is resolved in SA1.4 / SA2.0 (which it seems to be).

In SA1.3 a deadlock could be triggered in the MSSQL if the database restarted during initialization. This can be replicated by putting a breakpoint in e.g., _setup_supports_nvarchar_max(), then restarting the database when the breakpoint is hit, then continuing the program.

The change in the associated PR to set _allow_revalidate = False seems to be the piece that fixes this issue. image

HTH anyone who has the same issue - a sample stack-trace is below.

Current thread 0x00007f75ab25f740 (most recent call first):
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/event/attr.py", line 282 in _exec_once_impl
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/event/attr.py", line 314 in exec_once_unless_exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 664 in __connect
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 440 in __init__
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 309 in _create_connection
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 137 in _do_get
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 495 in checkout
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 778 in _checkout
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 304 in unique_connection
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2336 in _wrap_pool_connect
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2369 in raw_connection
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 473 in _revalidate_connection
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 390 in connection
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 401 in check_unicode
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 433 in <setcomp>
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 433 in _check_unicode_returns
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 330 in initialize
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/dialects/mssql/base.py", line 2451 in initialize
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/strategies.py", line 199 in first_connect
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 1406 in go
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/event/attr.py", line 322 in __call__
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/event/attr.py", line 285 in _exec_once_impl
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/event/attr.py", line 314 in exec_once_unless_exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 664 in __connect
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 440 in __init__
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 309 in _create_connection
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 137 in _do_get
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 495 in checkout
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 778 in _checkout
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 364 in connect
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2336 in _wrap_pool_connect
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2302 in _contextual_connect
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 433 in _connection_for_bind
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1150 in _connection_for_bind
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1142 in connection
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3550 in _connection_from_session
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3571 in _get_bind_args
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3556 in _execute_and_instances
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3535 in __iter__
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3373 in all