mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution

Multiple threads querying same table takes a long time #1355

[Closed] erikseulean closed this issue 1 month ago

erikseulean commented 1 month ago

Hi folks, I have the following behavior when running a python application that has multiple threads.

I managed to reproduce this with a test script but can't figure out where the issue is coming from, so I hope you might be able to shed some light.

I have a database that has 5 tables. If I create 5 threads and each one runs a SELECT against a different table, all of the results are returned in a reasonable amount of time.

If, however, I make each thread run the same SELECT against only one of the tables, the time taken to return increases hugely. The threads all return at about the same time, but that time grows with the number of threads.

ApplicationIntent is set to ReadOnly and autocommit=True. I also set SQL_ATTR_TXN_ISOLATION to SQL_TXN_READ_UNCOMMITTED, and separately tried executing "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED".
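For reference, the connection setup looks roughly like this (the connection string details here are placeholders, not my real ones):

import pyodbc

# Placeholder connection string; the real one points at our SQL Server
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;UID=user;PWD=pass;"
    "ApplicationIntent=ReadOnly;"
)

# Set the isolation level before connecting via attrs_before, and
# enable autocommit so no transaction is held open across statements
cnxn = pyodbc.connect(
    conn_str,
    autocommit=True,
    attrs_before={pyodbc.SQL_ATTR_TXN_ISOLATION: pyodbc.SQL_TXN_READ_UNCOMMITTED},
)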

It's worth mentioning that I don't see the same behavior with multiple processes: multiple processes return in a reasonable amount of time, and the time doesn't grow.

I tried looking through the pyodbc code but I don't see anything that would lock, and pystack doesn't show any odd behavior. I didn't manage to turn on ODBC logs.

Your help would be much appreciated.

gordthompson commented 1 month ago

Very old but possibly related:

https://groups.google.com/g/pyodbc/c/2cPVv1hQ7sM

v-chojas commented 1 month ago

Are those 5 threads sharing one connection, or is this with 5 connections to the database?

erikseulean commented 1 month ago

It is 5 different connections: specifically, each thread opens its own connection and executes the query. I assume there's some internal thread pool, but it's definitely not the same connection object.

erikseulean commented 1 month ago

Very old but possibly related:

https://groups.google.com/g/pyodbc/c/2cPVv1hQ7sM

Yeah, I stumbled across this and tried it out, but no success. Also, 5 processes doing the same query don't block.

gordthompson commented 1 month ago

Does this code illustrate what you're asking about?

import logging
import time
from concurrent.futures import ThreadPoolExecutor

import pyodbc

connection_string = "DSN=mssql_199;UID=scott;PWD=tiger^5HHH"
select_statement = "SELECT 1 AS foo"

def thread_function(name):
    logging.info("Thread %s: starting", name)
    t0 = time.perf_counter()
    # Note: opening and closing the connection are included in the timed block
    cnxn = pyodbc.connect(connection_string)
    crsr = cnxn.cursor()
    results = crsr.execute(select_statement).fetchval()
    crsr.close()
    cnxn.close()
    elapsed = time.perf_counter() - t0
    logging.info("Thread %s: finishing (%s seconds)", name, elapsed)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

    t0_main = time.perf_counter()

    num_threads = 3
    logging.info("[num_threads = %s]", num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        executor.map(thread_function, range(num_threads))
    logging.info(
        "Threads all done (%s seconds)", time.perf_counter() - t0_main
    )

    # Same work done sequentially on the main thread, for comparison
    t0_main = time.perf_counter()
    for i in range(num_threads):
        cnxn = pyodbc.connect(connection_string)
        crsr = cnxn.cursor()
        results = crsr.execute(select_statement).fetchval()
        crsr.close()
        cnxn.close()
    logging.info("Loop done (%s seconds)", time.perf_counter() - t0_main)

Sample output:

12:57:37: [num_threads = 3]
12:57:37: Thread 0: starting
12:57:37: Thread 1: starting
12:57:37: Thread 2: starting
12:57:38: Thread 1: finishing (0.2834793250003713 seconds)
12:57:38: Thread 0: finishing (0.2989071940028225 seconds)
12:57:38: Thread 2: finishing (0.307108305998554 seconds)
12:57:38: Threads all done (0.31214239499968244 seconds)
12:57:38: Loop done (0.07701467600054457 seconds)
12:56:27: [num_threads = 9]
12:56:27: Thread 0: starting
12:56:27: Thread 1: starting
12:56:27: Thread 2: starting
12:56:27: Thread 3: starting
12:56:27: Thread 4: starting
12:56:27: Thread 5: starting
12:56:27: Thread 6: starting
12:56:27: Thread 7: starting
12:56:27: Thread 8: starting
12:56:28: Thread 0: finishing (0.5598898039970663 seconds)
12:56:28: Thread 1: finishing (0.5588093600017601 seconds)
12:56:28: Thread 2: finishing (0.5580887379983324 seconds)
12:56:28: Thread 8: finishing (0.5656811830012884 seconds)
12:56:28: Thread 7: finishing (0.5696206499997061 seconds)
12:56:28: Thread 6: finishing (0.5895873629997368 seconds)
12:56:28: Thread 5: finishing (0.6092011000000639 seconds)
12:56:28: Thread 4: finishing (0.6120298109999567 seconds)
12:56:28: Thread 3: finishing (0.615750745000696 seconds)
12:56:28: Threads all done (0.6211617680019117 seconds)
12:56:28: Loop done (0.510924154001259 seconds)

erikseulean commented 1 month ago

Yes, except that I'm not using a ThreadPoolExecutor but spinning up threads in a for loop. I also didn't include the connect() in my timed block, and I don't close the cursor after fetchall() or close the connection. Apart from that it's identical, and as you can see your time also goes up with the number of threads. What's more, the per-thread times are close to identical. It doesn't seem to make sense, do you agree?

I also call cursor.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED") before I execute the SELECT statement.
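A rough sketch of what I'm running (the DSN and table name are placeholders):

import threading
import time

import pyodbc

connection_string = "DSN=mydsn"  # placeholder
select_statement = "SELECT * FROM some_table"  # placeholder; same table in every thread

def worker(name):
    cnxn = pyodbc.connect(connection_string, autocommit=True)
    crsr = cnxn.cursor()
    crsr.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
    t0 = time.perf_counter()  # connect is deliberately outside the timed block
    crsr.execute(select_statement).fetchall()
    print(f"Thread {name}: {time.perf_counter() - t0:.3f} seconds")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()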

erikseulean commented 1 month ago

Similarly, if each thread runs a SQL statement against a different table, the time doesn't grow with the number of threads. I also tried using a WITH statement to make the queries completely different even when hitting the same table, but no luck.

gordthompson commented 1 month ago

Have you tried using SELECT … WITH (NOLOCK) to see if it makes any difference?
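i.e., something like this (the table name is just an example):

rows = crsr.execute("SELECT * FROM some_table WITH (NOLOCK)").fetchall()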

erikseulean commented 1 month ago

Ah yes, I forgot to mention that. I see slightly better performance, but the issue is still there: the performance degrades with the number of threads.

gordthompson commented 1 month ago

For my test, the time does indeed increase with the number of threads, but that is the elapsed wall-clock time from the moment the thread starts until it finishes. Each thread is obviously not executing that whole time, because the elapsed time of each thread is approximately the same as the elapsed time for all threads ("Threads all done"). In other words, the elapsed time for a thread is not the amount of time it spends actually executing; it also includes the time when that thread is dormant while one of the other threads is active.

As for selecting from one table vs. selecting from different tables, I purposely chose SELECT 1 AS foo because it doesn't hit a table at all, so there's no "blocking" due to table/row locking. There will be other contention within SQL Server itself, but there's nothing that pyodbc can do about that. For example, I was hitting a SQL Server instance running on an ancient machine with a dual-core processor, so there's no way that SQL Server can handle 3 simultaneous connections (let alone 9) in a completely parallel way.

TL;DR - Think about what your metrics really mean. Perhaps a more realistic test would be to define a task to be done and then compare the time it takes to complete that task using different numbers of threads. For example, using a single (main) thread to read 1_000_000 rows from a table vs. using 100 threads to read 10_000 rows each.
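A sketch of such a test, assuming (purely for illustration) a table named big_tbl with a sequential id column and at least 1_000_000 rows:

import time
from concurrent.futures import ThreadPoolExecutor

import pyodbc

connection_string = "DSN=mssql_199;UID=scott;PWD=tiger^5HHH"
TOTAL_ROWS = 1_000_000  # assumes big_tbl holds at least this many rows

def read_chunk(args):
    offset, count = args
    cnxn = pyodbc.connect(connection_string)
    crsr = cnxn.cursor()
    # offset pagination; big_tbl and its id column are assumptions for this sketch
    rows = crsr.execute(
        "SELECT * FROM big_tbl ORDER BY id "
        "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY",
        offset, count,
    ).fetchall()
    cnxn.close()
    return len(rows)

for num_threads in (1, 10, 100):
    chunk = TOTAL_ROWS // num_threads
    work = [(i * chunk, chunk) for i in range(num_threads)]
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        total = sum(executor.map(read_chunk, work))
    print(f"{num_threads:>3} thread(s): {total} rows in "
          f"{time.perf_counter() - t0:.2f} seconds")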

erikseulean commented 1 month ago

I think it does make sense, up to a point. Since execute() releases the GIL, the threads should execute concurrently (theoretically, at least), so each thread's time should be smaller than the time taken to execute all the commands sequentially.

I understand that on an ancient machine you might end up hitting a resource limit, but as I said, running the same 5 threads against 5 different tables shows a time that is substantially lower than running them against the same table, and there are no resource bottlenecks on the DB server. Additionally, running this from multiple processes (5 processes with 1 thread each instead of 5 threads) gives significantly better results (10 times or more). If the first table takes x ms to query and I hit 5 different tables, the total time is roughly x + t, where t is the sum of the times for the other 4 tables. If I hit the first table 5 times, the time is not 5x but something like 15-20x, and it grows with the number of threads.

Row-level locking should not come into play on a SELECT if I set the transaction isolation level to READ UNCOMMITTED, I think.

I think not hitting a real table might also be misleading.

I did write a C++ version and I don't see the same issue there: the time taken to run the query does not increase with the number of threads.

erikseulean commented 1 month ago

Also this:

12:57:38: Threads all done (0.31214239499968244 seconds)
12:57:38: Loop done (0.07701467600054457 seconds)

makes little sense. The loop executing the query sequentially finishes much faster than running those 3 threads (4x faster).

erikseulean commented 1 month ago

TL;DR - Think about what your metrics really mean. Perhaps a more realistic test would be to define a task to be done and then compare the time it takes to complete that task using different numbers of threads. For example, using a single (main) thread to read 1_000_000 rows from a table vs. using 100 threads to read 10_000 rows each.

That's not really the use case here. Imagine a client-server application where the server has a gRPC pool of workers handling requests. If all my requests are queries against different tables, they respond quickly; if they hit the same table, they all slow down massively.

gordthompson commented 1 month ago

running the same 5 threads against 5 different tables shows a time that is substantially lower than running it against the same table

Sorry, but I'm not seeing that either. I created 5 tables (tbl_0 .. tbl_4) with 5 columns x 1000 rows each. I tweaked my test so that each thread either hit the same table (tbl_0) or a different one (e.g., thread #0 hit tbl_0, thread #1 hit tbl_1, etc.). The SELECT statement used WHERE nvarchar_col = '…' and the tables had no index on that column, so each SELECT did a table scan. I ran each test 12 times, discarding the highest and lowest values in each batch. The remaining 10 observations for each batch were:

[same]

0.050622984 0.053793429 0.055004407 0.059796986 0.091637345 0.093007212 0.096713479 0.098369835 0.127283216 0.152296241

mean: 0.087852513 variance: 0.001149997

[different]

0.057655652 0.065047164 0.065562941 0.06752632 0.068541742 0.07300041 0.09892619 0.099178946 0.109344414 0.131078461

mean: 0.083586224 variance: 0.000592743

A t-test in Excel gave a t statistic of 0.323172196 and a two-tail critical value of 2.10092204, implying that the two means are not significantly different. Certainly there was nothing like a 2- or 3-fold difference in the time taken to complete the tests for the two scenarios.
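For anyone who wants to check the numbers without Excel, the same two-sample t-test can be reproduced in Python with scipy, using the observations listed above:

from scipy import stats

same = [0.050622984, 0.053793429, 0.055004407, 0.059796986, 0.091637345,
        0.093007212, 0.096713479, 0.098369835, 0.127283216, 0.152296241]
different = [0.057655652, 0.065047164, 0.065562941, 0.06752632, 0.068541742,
             0.07300041, 0.09892619, 0.099178946, 0.109344414, 0.131078461]

# Two-sample t-test assuming equal variances, matching Excel's analysis
result = stats.ttest_ind(same, different, equal_var=True)
print(result.statistic, result.pvalue)  # t is approximately 0.323; p well above 0.05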

Can you reproduce your results if you use pymssql instead of pyodbc? It appears that pymssql has the same threading limitations as pyodbc, namely threadsafety=1: "threads may share the module but not connections".
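A minimal pymssql version of the per-thread function might look like this (server name, credentials, and database are placeholders):

import pymssql

def thread_function(name):
    cnxn = pymssql.connect(server="myserver", user="scott",
                           password="tiger^5HHH", database="mydb")
    crsr = cnxn.cursor()
    crsr.execute("SELECT 1 AS foo")
    crsr.fetchone()
    cnxn.close()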

erikseulean commented 1 month ago

I will try. Is your example using the same driver as above? unixODBC 2.3.12

I did try this with two different databases and saw the same result, but I will look into the finer details tomorrow. I also tried SQLAlchemy, which showed better performance but a similar pattern, although I think under the hood it is still using pyodbc.

gordthompson commented 1 month ago

is your example using the same driver as above ? unixODBC 2.3.12

Gentle terminology correction: The ODBC Driver is something like "ODBC Driver 18 for SQL Server", or "msodbcsql18" for short (from the name of the .so or .dll file). unixODBC is an ODBC Driver Manager (DM). I was running my tests on Windows, so I was using Windows' built-in DM (odbc32.dll) — not unixODBC — but I wasn't too concerned because we are looking at (what we think might be) a server-side blocking issue and the DM is unlikely to be relevant.

I think under the hood [SQLAlchemy] is still using pyodbc.

If you use mssql+pyodbc://, yes. SQLAlchemy can also use pymssql via mssql+pymssql://.
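For example (DSN, server, and credentials here are placeholders; note that special characters in the password must be URL-encoded):

from sqlalchemy import create_engine

# pyodbc under the hood, connecting via an ODBC DSN
engine_pyodbc = create_engine("mssql+pyodbc://scott:tiger%5E5HHH@mssql_199")

# pymssql under the hood, no ODBC layer involved
engine_pymssql = create_engine("mssql+pymssql://scott:tiger%5E5HHH@myserver/mydb")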

gordthompson commented 1 month ago

Just repeated my test on Ubuntu 22.04 with unixODBC version 2.3.9 and msodbcsql18 version 18.3.3

[same]

mean: 0.460556142 variance: 0.005884503

[different]

mean: 0.47910388 variance: 0.007232559

t Stat: 0.512121114 t Critical two-tail: 2.10092204

=> means are not significantly different

v-chojas commented 1 month ago

Your claim that querying different tables vs. the same table results in different times already eliminates pyodbc as the cause: pyodbc doesn't attempt to interpret the query, so it can't behave differently depending on what the query contains. I agree with gordthompson that this is almost certainly server-side.

gordthompson commented 1 month ago

Also this:

12:57:38: Threads all done (0.31214239499968244 seconds)
12:57:38: Loop done (0.07701467600054457 seconds)

makes little sense. The loop executing the query sequentially finishes much faster than running those 3 threads (4x faster).

It's because I ran that test on Windows, and ODBC connection pooling is automatically enabled for "ODBC Driver 17/18 for SQL Server" on Windows. The three threads each had to open their own connection, and when they were finished the connections were returned to the pool, so the pyodbc.connect() in the loop was given one of those connections instead of having to open another one. The SELECT was so simple that its execution time was far less than the time to establish a new connection.

I actually added the loop to try to provide insight into the overhead required to manage the threads, but I didn't consider the connection pool. My bad.
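For what it's worth, connection pooling can be turned off at the pyodbc level to take that effect out of the picture. It has to be set before the first connection is opened:

import pyodbc

# Must be set before any connection is made; the default is True
pyodbc.pooling = False

cnxn = pyodbc.connect(connection_string)  # now always opens a fresh connection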