postgresql-interfaces / psqlodbc

Other
16 stars 14 forks source link

Memory Corruption using version 16.00.0005 #47

Closed siga0984 closed 3 weeks ago

siga0984 commented 1 month ago

After upgrade from ODBC driver 15 to 16, in a c/cpp db gateway program , when the program closes all opened statements, windows CRT crashes with a heap corruption exception, allways when I call SQLFreeHandle of the same ( trivial ) statement. This is the clal stack :

psqlodbc30a.dll!TI_ClearObject(TABLE_INFO * ti) Line 66 C
[Inline Frame] psqlodbc30a.dll!TI_Destructor(TABLE_INFO * * ti, int count) Line 44  C
psqlodbc30a.dll!SC_initialize_cols_info(StatementClass_ * stmt, int DCdestroy, int parseReset) Line 524 C
psqlodbc30a.dll!SC_Destructor(StatementClass_ * self) Line 514  C
psqlodbc30a.dll!PGAPI_FreeStmt(void * hstmt, unsigned short fOption) Line 334   C
psqlodbc30a.dll!SQLFreeHandle(short HandleType, void * Handle) Line 273 C

After turn on all odbc logs, the last instructions show some reference count below zero, but the previous count shows positive.. see TI_ClearObject 0000021A4F80E580 refcnt

[23924-25.444] qresult.c[QR_Destructor]354: entering
[23924-25.444] qresult.c[QR_close_result]265: entering
[23924-25.444] qresult.c[QR_free_memory]489: entering fcount=0
[23924-25.444] qresult.c[QR_free_memory]587: leaving
[23924-25.444] qresult.c[QR_close_result]329: leaving
[23924-25.444] qresult.c[QR_Destructor]358: leaving
**[23924-25.444]descriptor[TI_ClearObject]58: !!!refcnt 0000021A4F80E580:2 -> 1**
[23924-25.444] qresult.c[QR_Destructor]354: entering
[23924-25.444] qresult.c[QR_close_result]265: entering
[23924-25.444] qresult.c[QR_free_memory]489: entering fcount=8
[23924-25.444] qresult.c[QR_free_memory]587: leaving
[23924-25.444] qresult.c[QR_close_result]329: leaving
[23924-25.444] qresult.c[QR_Destructor]358: leaving
[23924-25.444]    bind.c[reset_a_column_binding]772: entering ... self=0000021A4FA34820, bindings_allocated=1, icol=1
[23924-25.444]    bind.c[APD_free_params]643: entering self=0000021A4FA34900
[23924-25.444]    bind.c[IPD_free_params]698: entering self=0000021A4FA34970
[23924-25.444]    bind.c[PDATA_free_params]663: entering self=0000021A4FA34A68
[23924-25.444]statement.[SC_Destructor]536: leaving
[23924-25.444]odbcapi30.[SQLFreeHandle]250: Entering
[23924-25.444]statement.[PGAPI_FreeStmt]248: entering...hstmt=0000021A4F7A4050, fOption=1
[23924-25.444] qresult.c[QR_Destructor]354: entering
[23924-25.444]statement.[SC_init_Result]546: leaving(0000021A4F7A4050)
[23924-25.444]statement.[SC_Destructor]488: entering self=0000021A4F7A4050, self->result=0000000000000000, self->hdbc=0000021A4F6C5700
[23924-25.444]connection[CC_send_query_append]1816: conn=0000021A4F6C5700, query='DEALLOCATE "_PLAN0000021A4F7A4050"'
[23924-25.444]connection[CC_send_query_append]1898: query_len=34
[23924-25.444]connection[CC_send_query_append]1942: [QLOG]PQsendQuery: 0000021A4F6BB4F0 'DEALLOCATE "_PLAN0000021A4F7A4050"'
[23924-25.444] qresult.c[QR_Constructor]183: entering
[23924-25.444] qresult.c[QR_Constructor]252: leaving 0000021A5093DBF0
[23924-25.444]connection[CC_send_query_append]1984: [QLOG]  ok: - 'C' - DEALLOCATE
[23924-25.444]connection[CC_send_query_append]2000:  setting cmdbuffer = 'DEALLOCATE'
[23924-25.444]connection[CC_send_query_append]2084:  returning res = 0000021A5093DBF0
[23924-25.444] qresult.c[QR_Destructor]354: entering
[23924-25.444] qresult.c[QR_close_result]265: entering
[23924-25.444] qresult.c[QR_free_memory]489: entering fcount=0
[23924-25.444] qresult.c[QR_free_memory]587: leaving
[23924-25.444] qresult.c[QR_close_result]329: leaving
[23924-25.444] qresult.c[QR_Destructor]358: leaving
**[23924-25.444]descriptor[TI_ClearObject]58: !!!refcnt 0000021A4F80E580:0 -> -1**

And this is the last line on file ... I don't know is this is the proper channel to report or ask help for something like this ... The application runs the same (huge) sequence of instructions with all previous versions of odbc driver ... any tips, advice, I will appreciate !!

davecramer commented 1 month ago

This is the proper channel. Thanks for the report.

Dave

davecramer commented 1 month ago

@progmachine This looks like something you might be more familiar with. Do you have any cycles to work on this ?

siga0984 commented 1 month ago

This issue does not happen with 16.00.0004 release. And, should be something related to the last update ... Because the object being destroyed is a COL_INFO object .

Restored reference counting lifetime managment of COL_INFO objects. -> Retrieved from REL-16_00_0005 release comments

davecramer commented 1 month ago

@siga0984 I imagine it is related to that PR which is why I asked @progmachine to have a look

progmachine commented 1 month ago

I'll do some research. Looks like something wrong with refcounting... (

siga0984 commented 1 month ago

Same issue occurrs in Linux -- Tested on Ubuntu20 psqlodbc 16 through Unixodbc, core dumped on same stack...

The attached ZIP file contains postgres odbc logs, generated by psqlodbc on Windows. I hope this is helpful.

logs.zip

progmachine commented 1 month ago

Spent entire evening yesterday investigating problem. Found only one place where refcnt can broke. Also found one place where COL_INFO objects may leak. In a minute i'll prepare pull request for this. It will be fine if @siga0984 can test this fix and confirm if bug is gone away.

siga0984 commented 4 weeks ago

Spent entire evening yesterday investigating problem. Found only one place where refcnt can broke. Also found one place where COL_INFO objects may leak. In a minute i'll prepare pull request for this. It will be fine if @siga0984 can test this fix and confirm if bug is gone away.

Yes, I can !! 👍 Just let me know where i can download a beta / release candidate psqlodbc30a.dll to test !!

davecramer commented 4 weeks ago

@siga0984 we don't create release candidates. As you are able to test this on unix you can just get the PR and build from there. Is that possible, Otherwise I can try to create a release candidate

siga0984 commented 4 weeks ago

Yep, I can do it ... I 'll read some how-to documentation 😃

siga0984 commented 4 weeks ago

Yep, I can do it ... I 'll read some how-to documentation 😃

I've sucessfully clone psqlodbc project, and pull @progmachine code , and build and configure and test ... AND YES, WORKS FINE 😄 It's completely solved !!!

ODBC Driver Name...........: psqlodbca.so
ODBC Driver Version .......: 17.00.0000
UNIX ODBC Version..........: 2.3.6

Thank you ALL very much @davecramer @progmachine and the development team !!!!

siga0984 commented 4 weeks ago

Gentleman ... I'm sorry to bring some bad news ...

My primary test runs fine ... BUT ... I have two different routines that cause the heap corruption ... and i have tested the first one... NOW i have tested the second routine ... And heap corruption occurs ...

See the attachment files ( mylog ) , at the end of file reference count is negative 😢 And the stack is the same. logs-16.00.0006.zip

progmachine commented 4 weeks ago

@siga0984 are you sure that you using correct psqlodbc dll version? Because refcnt is decrementing only in three places within entire codebase, and all of them should be logged in 16.0.6, but i can not see in your log, where refcnt for object 0x00000171A90E8DA0 is decremented to 0.

progmachine commented 4 weeks ago

Now i see, in the beginning of the log, version is written... :-/ I think we need more collaboration for testing. @siga0984 can you run your tests on linux? Because i have no instrumentation for building psqlodbc on windows. Tomorrow i'll make new branch in my fork, and add precise logging, anywhere refcnt is modified. I hope it will help to understand what is going on.

davecramer commented 4 weeks ago

@progmachine thanks for this. I should remove the release.

siga0984 commented 4 weeks ago

@siga0984 are you sure that you using correct psqlodbc dll version? Because refcnt is decrementing only in three places within entire codebase, and all of them should be logged in 16.0.6, but i can not see in your log, where refcnt for object 0x00000171A90E8DA0 is decremented to 0.

Yes, I do. You can confirm the version on mylog ... "[37520-0.003]connection[CC_initial_log]1036: [QLOG]Driver Version='16.00.0006,Sep 27 2024' linking 1941 dynamic Multithread library"

Meanwhile, I'm still doing more test to obtain as much as I can get of data and a shorter sequence os operations to reproduce the issue.

siga0984 commented 4 weeks ago

Now i see, in the beginning of the log, version is written... :-/ I think we need more collaboration for testing. @siga0984 can you run your tests on linux? Because i have no instrumentation for building psqlodbc on windows. Tomorrow i'll make new branch in my fork, and add precise logging, anywhere refcnt is modified. I hope it will help to understand what is going on.

Let me know when you do it, and I'll download your branch , build and test it on linux 😛

davecramer commented 4 weeks ago

@siga0984 what flavour of linux. I'm trying to build it on ubuntu and running into issues

checking for odbc_config... no
configure: error: odbc_config not found (required for unixODBC build)
progmachine commented 4 weeks ago

ubuntu is troublesome when it relates to unixodbc, because they renamed/moved things. I use script, found somewhere in wild internet. Place it in /usr/bin odbc_config.gz

siga0984 commented 4 weeks ago

@progmachine @davecramer

Additional information : For some specific statements, for better performance, I bind one parameter, and bind the return columns, and after execute, I close the cursor with SQLFreeStmt( hstmt, SQL_CLOSE -- option 1 ); , and later i just change the binded parameter content ( same addr, same size ) , and Execute the statement again. One of those statements crashes on FreeHandle(). When I change my app to allways drop an create a new statement, everything works fine 👍

siga0984 commented 4 weeks ago

@progmachine @davecramer

Maybe I found where is the catch ... After prepare two statements with the same SQL expression, leave both opened , DROP any table, and close the statements, I got the double delete error !!!

Seems like it's related to CC_clear_col_info() , called on DROP TABLE 🎸

progmachine commented 4 weeks ago

Seems like it's related to CC_clear_col_info() , called on DROP TABLE

I found it =) It is between descriptor.c:60 and descriptor.c:62 - line 61 erases refcnt. Tomorrow i'll make PR for this. For now i am going to sleep.

progmachine commented 4 weeks ago

@siga0984 can you test branch bugfix_47-refcnt_erase before i make pull request?

siga0984 commented 3 weeks ago

@siga0984 can you test branch bugfix_47-refcnt_erase before i make pull request?

@progmachine Yep, immediately 👍

siga0984 commented 3 weeks ago

@progmachine Builded , and tested -- With all routines that triggers the issue , and it's working FINE 😄

davecramer commented 3 weeks ago

Ok, I've pushed tag REL-17_00_0001. This will release REL-17_00_0001

Thank you both @siga0984 and @progmachine for this!