mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution

"[ODBC Driver 18 for SQL Server]Invalid precision value (0) (SQLBindParameter)" with string parameter longer than 536,870,911characters #1239

Closed · hb2638 closed 1 year ago

hb2638 commented 1 year ago

Environment

Windows, ODBC Driver 18 for SQL Server. Attached ODBC trace logs: DOTNET.SQL.LOG, PYODBC.SQL.LOG

Issue

When uploading large text (about 500MB in my case), pyodbc fails with pyodbc.Error: ('HY104', '[HY104] [Microsoft][ODBC Driver 18 for SQL Server]Invalid precision value (0) (SQLBindParameter)')

Below is Python code that replicates the problem, as well as equivalent C# code that doesn't. I've attached the ODBC trace logs for both. The expected behavior is that it shouldn't fail.

Python:

import pyodbc

conn_str = "Driver=ODBC Driver 18 for SQL Server;Server=localhost;TrustServerCertificate=yes;Trusted_Connection=yes;MARS_Connection=yes;ConnectRetryCount=30"
conn = pyodbc.connect(conn_str, autocommit=True)
conn.execute("DECLARE @FOO NVARCHAR(MAX)=?", ["X" * 1000000000])

C#:

var conn = new OdbcConnection("Driver=ODBC Driver 18 for SQL Server;Server=localhost;TrustServerCertificate=yes;Trusted_Connection=yes;MARS_Connection=yes;ConnectRetryCount=30");
conn.Open();
var cmd = conn.CreateCommand();
cmd.CommandTimeout = 0;
cmd.CommandText = "DECLARE @FOO NVARCHAR(MAX)=?";
var expected = new string('x', 1000000000);
cmd.Parameters.Add(new OdbcParameter("?", expected));
cmd.ExecuteNonQuery();
gordthompson commented 1 year ago

Can you try adding ;LongAsMax=yes to your connection string and see if that helps?
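For example, a sketch with the keyword appended to the repro's connection string (untested, just to show where it goes):

import pyodbc

# Repro connection string with LongAsMax=yes appended, as suggested above
conn_str = (
    "Driver=ODBC Driver 18 for SQL Server;Server=localhost;"
    "TrustServerCertificate=yes;Trusted_Connection=yes;"
    "MARS_Connection=yes;ConnectRetryCount=30;LongAsMax=yes"
)
conn = pyodbc.connect(conn_str, autocommit=True)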

hb2638 commented 1 year ago

> Can you try adding ;LongAsMax=yes to your connection string and see if that helps?

That's a good idea. Unfortunately, it didn't work. I also upgraded to 18.2.2 (shown as 2018.182.02.01 on Windows) and it has the same problem.

gordthompson commented 1 year ago

Your repro code gives the same error for me and I have tried a couple of tricks to avoid it (alas, unsuccessfully), but before I go too far down this rabbit hole I do have one question.

["X" * 1000000000]

or, to re-phrase

["X" * 1_000_000_000]

is 1 billion Unicode characters. SQL Server normally treats one Unicode character as 2 bytes, so that would be 2 billion bytes, or 1.9 GiB, not the "500 MB" you mention in your original post. Is your test case purposely 4 times larger than your usage case?

I just tried your code with ["X" * 250_000_000] (approximately 500 MB) and it did not fail. I note that the largest signed 32-bit integer is 2,147,483,647, or 2.0 GiB, so I'm wondering if we're hitting some sort of internal limit that causes pyodbc to get confused.
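For concreteness, the arithmetic (plain Python, assuming SQL Server's 2 bytes per NVARCHAR character):

chars = 1_000_000_000
print(chars * 2)   # 2_000_000_000 bytes as UTF-16 -- roughly 1.9 GiB
print(2**31 - 1)   # 2_147_483_647 -- largest signed 32-bit value, 2.0 GiB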

hb2638 commented 1 year ago

> I note that the largest signed 32-bit integer is 2,147,483,647, or 2.0 GiB, so I'm wondering if we're hitting some sort of internal limit that causes pyodbc to get confused.

2 GB is within the range of what's allowed for NVARCHAR(MAX), which is why the .NET version of the code works.

hb2638 commented 1 year ago

> Is your test case purposely 4 times larger than your usage case?

The string in the code that failed was 538,173,956 characters (UTF-8). We're uploading JSON as NVARCHAR(MAX) as an alternative to TVPs.

A 536,870,911-character string works, but a 536,870,912-character string does not.

And good catch on my missing the conversion of chars to bytes... so it's about 1 GB of text that causes it to fail.

gordthompson commented 1 year ago

> A 536,870,911-character string works, but a 536,870,912-character string does not.

> ... so it's about 1 GB of text that causes it to fail.

That's exactly the point where it exceeds 1 GiB, assuming 2 bytes per character. However, UTF-16 encoding can use 4 bytes (surrogate pairs) for characters outside the Basic Multilingual Plane (e.g., emoji) so pyodbc may be "playing it safe" and allowing for 4 bytes per character, which would just hit the 2 GiB limit.
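The arithmetic supports that guess: at the failure threshold, 4 bytes per character lands exactly one past the signed 32-bit limit (a quick check in Python):

ok, bad = 536_870_911, 536_870_912
print(bad * 2)   # 1_073_741_824 -- exactly 1 GiB at 2 bytes per character
print(ok * 4)    # 2_147_483_644 -- still fits in a signed 32-bit SQLLEN
print(bad * 4)   # 2_147_483_648 -- one past the 2_147_483_647 limit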

gordthompson commented 1 year ago

pyodbc does not cause the error when using FreeTDS ODBC 1.3.6 and unixODBC 2.3.9 on Linux.

import pyodbc

cnxn = pyodbc.connect(
    "Driver=FreeTDS;"
    "Server=192.168.0.199;Port=1433;"
    "UID=scott;PWD=tiger^5HHH;"
    "Database=test;"
)
crsr = cnxn.cursor()
crsr.execute(
    "DECLARE @FOO NVARCHAR(MAX) = ?",
    ["X" * 600_000_000]
)
The unixODBC trace shows SQLBindParameter succeeding:

[ODBC][3312][1689342866.647757][SQLBindParameter.c][217]
        Entry:
            Statement = 0x56273645f8c0
            Param Number = 1
            Param Type = 1
            C Type = -8 SQL_C_WCHAR
            SQL Type = -10 
            Col Def = 1200000000
            Scale = 0
            Rgb Value = 0x562736462d20
            Value Max = 8
            StrLen Or Ind = 0x562736462d48
[ODBC][3312][1689342866.689437][SQLBindParameter.c][434]
        Exit:[SQL_SUCCESS]

FreeTDS.log.zip

hb2638 commented 1 year ago

> pyodbc does not cause the error when using FreeTDS ODBC 1.3.6 and unixODBC 2.3.9 on Linux.

Thx. This inspired me to check SQLBindParameter in the logs I attached, and the SQLULEN and SQLLEN values differ. SQLULEN (the column size): pyodbc sends 2,000,000,000 while Microsoft's C# implementation sends 1,000,000,000. SQLLEN (the buffer length): pyodbc sends 8 while Microsoft's C# implementation sends 2,000,000,002.

Why the differences?


ConsoleApp1 a68-36d4 ENTER SQLBindParameter HSTMT 0x0000015BFD991CC0 UWORD 1 SWORD 1 SWORD -8 SWORD -10 SQLULEN 1000000000 SWORD 0 PTR 0x0000015C77FA8048 SQLLEN 2000000002 SQLLEN * 0x0000015C77FA8040

ConsoleApp1 a68-36d4 EXIT SQLBindParameter with return code 0 (SQL_SUCCESS)


test_pyodbc 8ba4-3970 ENTER SQLBindParameter HSTMT 0x0000019BD6920F60 UWORD 1 SWORD 1 SWORD -8 SWORD -10 SQLULEN 2000000000 SWORD 0 PTR 0x0000019BD22A7930 SQLLEN 8 SQLLEN * 0x0000019BD22A7958

test_pyodbc 8ba4-3970 EXIT SQLBindParameter with return code -1 (SQL_ERROR)
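For reference (annotation added here, not part of the traces), the positional fields in those ENTER lines map onto the SQLBindParameter arguments as follows:

    HSTMT       -> StatementHandle
    UWORD  1    -> ParameterNumber
    SWORD  1    -> InputOutputType (SQL_PARAM_INPUT)
    SWORD  -8   -> ValueType (SQL_C_WCHAR)
    SWORD  -10  -> ParameterType (SQL_WLONGVARCHAR)
    SQLULEN     -> ColumnSize (pyodbc: 2,000,000,000; .NET: 1,000,000,000)
    SWORD  0    -> DecimalDigits
    PTR         -> ParameterValuePtr
    SQLLEN      -> BufferLength (pyodbc: 8; .NET: 2,000,000,002)
    SQLLEN *    -> StrLen_or_IndPtr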

v-chojas commented 1 year ago

Ignore the second-to-last parameter; that's only used for output parameters.

But a column size of 2000000000 is certainly far beyond the limit for ntext, which is what SQL_WLONGVARCHAR is mapped to: https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-ver16

The fact that it is twice as large as it should be suggests another bytes vs characters confusion; you may try to use setinputsizes() to override the column size.

gordthompson commented 1 year ago

Yep,

crsr.setinputsizes([(pyodbc.SQL_WVARCHAR, 0)])
crsr.execute(
    "DECLARE @FOO NVARCHAR(MAX)=?", 
    ["X" * 600_000_000]
)

seems to work. Note that's SQL_WVARCHAR, not SQL_WLONGVARCHAR.
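(If I'm reading the Microsoft driver headers right, a ColumnSize of 0 here corresponds to SQL_SS_LENGTH_UNLIMITED, the binding used for varchar(max)/nvarchar(max) parameters.)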

hb2638 commented 1 year ago

> crsr.setinputsizes([(pyodbc.SQL_WVARCHAR, 0)]) ... seems to work. Note that's SQL_WVARCHAR, not SQL_WLONGVARCHAR.

Thanks, that fixes it. Is there any reason why I shouldn't use pyodbc.SQL_WVARCHAR for all strings I send to Microsoft SQL Server via pyodbc?