reference count encoding in the connect function

FlorianSchwendinger commented 7 months ago

Python: 3.12.3
pyodbc: 5.1.0
OS: Windows
DB: Microsoft SQL Server
driver: ODBC Driver 17 for SQL Server

The following gives a minimal example which shows the issue.

Python 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] on win32Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import pyodbc
>>> encoding = 'utf-16le'
>>> sys.getrefcount(encoding)
2
>>> constr = 'DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=test;UID=florian;PWD=abc1234'
>>> con = pyodbc.connect(constr, encoding = encoding)
>>> sys.getrefcount(encoding)
2138597108913

Hope this helps.

FlorianSchwendinger commented 7 months ago

As far as I see the following issues are related.

https://github.com/mkleehammer/pyodbc/issues/1117 Here the user uses encoding in his connection string, therefore at the second attempt to create a connection fails since encoding seams to be already freed.
https://stackoverflow.com/questions/70825458/sqlalchemy-pyodbc-connect-fails-without-setting-exception Seams to be the same issue.

gordthompson commented 7 months ago

I am unable to reproduce the issue on Ubuntu 20.04 with msodbcsql18 and pyodbc 5.1.0 under Python 3.8 using the official UTF-16LE alias in the Python docs:

import pyodbc

print(pyodbc.version)  # 5.1.0

connection_string = (
    "DRIVER=ODBC Driver 18 for SQL Server;"
    "SERVER=192.168.0.199;"
    "UID=scott;PWD=tiger^5HHH;"
    "DATABASE=test;"
    "TrustServerCertificate=yes;"
)

enc = "UTF-16LE"
print(sys.getrefcount(enc))  # 4
cnxn = pyodbc.connect(connection_string, encoding=enc)
print(sys.getrefcount(enc))  # 3

However, the docs for encoding say:

The ODBC connection string must be sent to the driver as a byte sequence, hence the Python string must first be encoded using the named encoding.

That is clearly not the case for msodbcsql, so just don't use it.

FlorianSchwendinger commented 7 months ago

Thank you for your reply.

When looking at your example I see,

print(sys.getrefcount(enc))  # 4
cnxn = pyodbc.connect(connection_string, encoding=enc)
print(sys.getrefcount(enc))  # 3

does this mean the reference count goes down from 4 to 3? This would mean if you close the connection and reconnect 3 times your reference count should be also down to 0.

For Debian 12 I get

import sys
import pyodbc
import sqlite3

con = sqlite3.connect("abc.db")
cur = con.cursor()
cur.execute("CREATE TABLE simple(example TEXT)")
con.close()

print(pyodbc.version)
#py> 5.1.0

connection_string = "Driver=SQLite3;Database=./abc.db"
encoding = "UTF-16LE"

sys.getrefcount(encoding)
#py> 2
cnxn = pyodbc.connect(connection_string, encoding=encoding)
sys.getrefcount(encoding)
#py> Segmentation fault

a similar result.

To the specification of the encoding argument, I used the same as defined in the function header in https://github.com/mkleehammer/pyodbc/blob/master/src/pyodbc.pyi line 982, defined as

def connect(connstring: Optional[str] = None,
            /, *,  # only positional parameters before, only named parameters after
            autocommit: bool = False,
            encoding: str = 'utf-16le',
            readonly: bool = False,
            timeout: int = 0,
            attrs_before: Optional[Dict[int, Any]] = None,
            **kwargs: Any) -> Connection:

I still believe that the original issue described here https://github.com/mkleehammer/pyodbc/issues/1117#issue-1436038058 is related to the reference count of the encoding.

gordthompson commented 7 months ago

This would mean if you close the connection and reconnect 3 times your reference count should be also down to 0.

It doesn't make it to 0; it crashes after 1. When I run my test code in PyCharm I get

enc = "UTF-16LE"
print(sys.getrefcount(enc))
# 4
cnxn = pyodbc.connect(connection_string, encoding=enc)
print(sys.getrefcount(enc))
# 3
cnxn.close()
cnxn = pyodbc.connect(connection_string, encoding=enc)
print(sys.getrefcount(enc))
# 2
cnxn.close()
cnxn = pyodbc.connect(connection_string, encoding=enc)
print(sys.getrefcount(enc))
# 1
#
# Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

and from a command prompt I get

(venv) gord@gord-dv7:~/PycharmProjects/pyodbc_demo$ python main.py 
5.1.0
4
3
2
1
Segmentation fault (core dumped)

mkleehammer commented 7 months ago

Holy smokes - that's a serious issue. Thanks. I just pushed a fix for this and will make a release tonight.

mkleehammer / pyodbc

reference count encoding in the connect function #1343