tmolomby / pyodbc

Automatically exported from code.google.com/p/pyodbc
MIT No Attribution
0 stars 0 forks source link

Missing rows in select when fetching a long string (XML field) #296

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
What is the expected output? What do you see instead?
There's a select from an MS SQL Server database that is returning no rows when 
I include an XML field. 
For example, if I do:
   curs.execute("select Starttidspunkt from Koersel where SerieNr='12485001'")
then I get 1 row with a timestamp value.
But if I do:
   curs.execute("select CAST(Pdata AS VARBINARY(max)) from Koersel where SerieNr='12485001'")
then I get 0 rows, despite using the exact same where-criteria. The Pdata field 
is an XML field and the cast turns it into some 30KB of text.

What version of the product are you using? On what operating system?
3.0.6 on XP,Win7.

Please provide any additional information below.
Studying the 3.0.6 sources, I think I've found the cause: The "return 0" at the 
end of the GetDataString function in getdata.cpp.

The issue is with the handling of indeterminate length text. It looks like 
GetDataString starts out with a 1024 bytes buffer, and then loops up to 10 
times, adding 2048 bytes each time. The last allocation isn't used, so that 
makes the effective maximum size 19KiB.  The for loop exhausts, and the "return 
0" at the end convinces Cursor_fetchlist to break, but since there is no 
exception raised, this results in rows missing instead of an error message.

Attached a patch to use exponential instead of linear growth of the buffer, 
allowing for strings of up to 2MiB, and an explanatory MemoryError for when 
even that's not enough.  

It's untested, alas, since I can't build pyodbc (seems to be MSVC++-only on 
MSWin). I'd be very grateful for a build with this patch included ASAP.

Original issue reported on code.google.com by and...@jmunch.dk on 1 Nov 2012 at 8:42

Attachments:

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I have also hit this issue and I've attached an updated/untested patch which 
increases the GetDataString default buffer size to 4kB and changes its 
reallocation algorithm.  Optionally, the GetDataString default buffer size can 
be changed via the new pyodbc connect keyword getdata_str_size.  The supported 
range for the getdata_str_size keyword is 1kB - 128kB and the patch enforces 
this range.  If needed, the GetDataString buffer will be reallocated by 
doubling (up to 10 times)
the previously used buffer size.  The new reallocation algorithm allows the 
GetDataString default buffer size to grow to a maximum of 4kB * 2^10 or 4MB.  
In summary, this patch allows the pyodbc user to control the maximum 
GetDataString buffer size needed for their applications (i.e. within the range 
of 1MB-128MB).

I will endeavor to start testing this patch in the near future.

I've also attached Windows binaries for python 2.7 and python 3.3, if anyone 
wants to help test this patch.

Original comment by pe...@psantoro.net on 22 Apr 2013 at 8:19

Attachments:

GoogleCodeExporter commented 9 years ago
I've attached an updated/tested patch which adds a few TRACE statements and 
renames a local variable.  I tested this patch on Windows XP/SP3 with Python 
2.74 and Python 3.3.1.  Here is an updated summary of this patch:

This patch increases the GetDataString default buffer size/increment to 4kB and 
changes its reallocation algorithm.  Optionally, the GetDataString default 
buffer size/increment can be changed via the new pyodbc connect keyword 
getdata_str_size.  The supported range for the getdata_str_size keyword is 1kB 
- 128kB and the patch enforces this range.  If needed, the GetDataString buffer 
will be reallocated by doubling (up to 10 times) the previously used buffer 
size increment.  In summary, this patch allows the pyodbc user to control the 
maximum GetDataString buffer size needed for their applications within the 
range of 2047kB - 262016kB.

I've also attached updated Windows binaries for Python 2.7 and Python 3.3, if 
anyone wants to further test this patch.

Original comment by pe...@psantoro.net on 24 Apr 2013 at 4:01

Attachments:

GoogleCodeExporter commented 9 years ago
I've updated my patch for the pyodbc 3.0.7 release.  I've also included a few 
binaries for anyone who might need them.  The binaries were built using Python 
2.7.5 and Python 3.3.2.

Original comment by pe...@psantoro.net on 17 Jul 2013 at 11:47

Attachments:

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I was having the same issue. Specifically this was a problem using Native 
Client 11 and not Native Client 10. I've gone back to using Native Client 10 as 
a work around but this patch also fixes the problem.

Original comment by lamba...@gmail.com on 24 Jan 2015 at 7:56