mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution

Table index is not used when using parameterized queries on SQL Server #1151

Closed: Pivert closed this issue 1 year ago

Pivert commented 1 year ago

Environment

Issue

Observed behavior: parameterized queries do not use the index and are dead slow.
Expected behavior: parameterized queries should be a bit faster than raw ones.

The same RAW query properly uses the index and is 1000x faster.

Example code

import pyodbc
import time

SERVER = ""
DATABASE = ""
USER = ""
dbpassword = ""

ITERATIONS = 100
cnxstr = (
    'DRIVER={ODBC Driver 18 for SQL Server};'
    f'SERVER={SERVER};'
    f'DATABASE={DATABASE};'
    f'UID={USER};'
    f'PWD={dbpassword};ENCRYPT=no'
)

cnxn = pyodbc.connect(cnxstr)
cursor = cnxn.cursor()

# Parameterized version: values bound via ? placeholders
start_time = time.time()
for i in range(ITERATIONS):
    select_stmt = """
        SELECT ID FROM bulk
        WHERE FILENAME = ? AND COMPANY = ?
        """
    cursor.execute(select_stmt, 'file.out', '12345678')

end_time = time.time()
print(f"Parameterized query execution time {end_time - start_time}")

# Raw version: the same values inlined as string literals
start_time = time.time()
for i in range(ITERATIONS):
    select_stmt = """
        SELECT ID FROM bulk
        WHERE FILENAME = 'file.out' AND COMPANY = '12345678'
        """
    cursor.execute(select_stmt)

end_time = time.time()
print(f"Raw query execution time {end_time - start_time}")

Results

P.S.: The table contains 6M records and has an index on the two query fields, with the ID field as an additional (included) data field.

$ ./test.py
Parameterized query execution time 101.35617518424988
Raw query execution time 0.11975717544555664
gordthompson commented 1 year ago

( probably related: #376 )

What do you see when you run this query?

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = 'bulk' AND COLUMN_NAME IN ('FILENAME', 'COMPANY')
Pivert commented 1 year ago

TABLE_SCHEMA  TABLE_NAME  COLUMN_NAME  DATA_TYPE  CHARACTER_MAXIMUM_LENGTH
db            BULK        FILENAME    varchar    50
db            BULK        COMPANY     varchar    50

I confirm that encoding the strings as UTF-8, the workaround mentioned in #376, works like a charm.

Back to great performance. Thanks for the workaround!
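For context, the workaround from #376 amounts to passing the string parameters as UTF-8 bytes so the driver binds them as varchar rather than nvarchar, avoiding the implicit conversion on the indexed columns that prevents an index seek. A minimal sketch of that workaround applied to the original query (my reconstruction, reusing the cursor and bulk table from the example above; not code posted in this thread):

# Workaround from #376 (sketch): pass bytes instead of str so pyodbc
# does not bind the values as NVARCHAR, which would force an implicit
# conversion on the varchar columns and defeat the index.
select_stmt = """
    SELECT ID FROM bulk
    WHERE FILENAME = ? AND COMPANY = ?
    """
cursor.execute(select_stmt, 'file.out'.encode('utf-8'), '12345678'.encode('utf-8'))

# Connection-wide alternative: send every str parameter as UTF-8/SQL_CHAR.
# (Connection.setencoding is pyodbc API; applying it to this case is an
# assumption on my part, not something stated in the thread.)
cnxn.setencoding(encoding='utf-8')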

gordthompson commented 1 year ago

Note that encoding the string parameters as UTF-8 can cause queries to silently return no rows for varchar columns that contain accented characters or other non-ASCII characters. For example:

# example data (assumes an open pyodbc cursor named crsr)
sql = "SELECT ID, COMPANY FROM [bulk]"
results = crsr.execute(sql).fetchall()
print(results)  # [(1, 'fermé')]

company = results[0][1]

# string parameter encoded as UTF-8
sql = "SELECT ID FROM [bulk] WHERE COMPANY=?"
results = crsr.execute(sql, company.encode("utf-8")).fetchall()
print(results)  # []  ←  not found!

Better workarounds are to use CAST() in T-SQL …

sql = "SELECT ID FROM [bulk] WHERE COMPANY=CAST(? AS varchar(10))"
results = crsr.execute(sql, company).fetchall()
print(results)  # [(1,)]

… or setinputsizes()

sql = "SELECT ID FROM [bulk] WHERE COMPANY=?"
crsr.setinputsizes([(pyodbc.SQL_VARCHAR, 10)])
results = crsr.execute(sql, company).fetchall()
print(results)  # [(1,)]
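Tying this back to the original two-parameter query, here is a sketch of the setinputsizes() approach (assuming FILENAME and COMPANY are the varchar(50) columns reported above, and reusing the cursor from the first example):

# Declare both parameters as varchar(50) before executing, so pyodbc
# binds them as SQL_VARCHAR instead of NVARCHAR and the index on
# (FILENAME, COMPANY) can be used.
cursor.setinputsizes([(pyodbc.SQL_VARCHAR, 50), (pyodbc.SQL_VARCHAR, 50)])
select_stmt = """
    SELECT ID FROM bulk
    WHERE FILENAME = ? AND COMPANY = ?
    """
results = cursor.execute(select_stmt, 'file.out', '12345678').fetchall()

Unlike the UTF-8 encoding trick, this keeps non-ASCII values matching correctly, which is why it is one of the better workarounds mentioned above.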