lurcher / unixODBC

The unixODBC Project's goal is to develop and promote unixODBC as the definitive standard for ODBC on non-Windows platforms.
GNU Lesser General Public License v2.1

Program crashes with connection pooling enabled #141

Open Tagl opened 1 year ago

Tagl commented 1 year ago

I have connection pooling enabled with the Microsoft ODBC Driver for SQL Server. I have a multithreaded application where every thread except the main thread opens and closes connections in sequence. It seems that the same pooled connection is being closed twice in the pool search. This causes the program to crash in close_pooled_connection, with the error displayed as "double free detected in tcache 2". Without connection pooling, there is no issue.

I'm using C++ with another library, SQLAPI++, but following the problem led me to the close_pooled_connection function here. I tried isolating the connection code within my program with a mutex, since I originally thought it was a thread-safety issue, but the problem occurs even with that isolation.
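
The isolation I tried was along these lines (a simplified sketch; guarded_connect and guarded_disconnect are illustrative names, with one global mutex so only one thread connects or disconnects at a time):

#include <mutex>

// One global mutex serializing every connect/disconnect across threads.
std::mutex odbc_mutex;

void guarded_connect(SAConnection& con) {
    std::lock_guard<std::mutex> lock(odbc_mutex);
    con.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}",
                "someuser", "somepass", SA_ODBC_Client);
}

void guarded_disconnect(SAConnection& con) {
    std::lock_guard<std::mutex> lock(odbc_mutex);
    if (con.isConnected())
        con.Disconnect();
}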

My odbcinst.ini file:

[ODBC]
Pooling=Yes

[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.1.1
UsageCount=1
CPTimeout=120
A GDB backtrace at the point of the crash:

#0  0x00007ffff6a8e88d in ?? () from /opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.2.1
No symbol table info available.
#1  0x00007ffff6a9410f in SQLFreeHandle () from /opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.2.1
No symbol table info available.
#2  0x00007ffff6eda700 in close_pooled_connection (ptr=0x7fffec036a40) at SQLConnect.c:2924
        ret = 0
        conn = 0x7fffec036a68
#3  0x00007ffff6edaefa in search_for_pool (connection=0x7fffa40008e0, server_name=0x0, name_length1=0, user_name=0x0, name_length2=0, authentication=0x0, name_length3=0,
    connect_string=0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****", connect_string_length=-3, pooh=0x7fff657cb018, retrying=0) at SQLConnect.c:3290
        ret = -11168
        conn_match = 1
        current_time = 1684236577
        dead = 0
        ptrh = 0x7ffff0007010
        prevh = 0x0
        match_head = 0x7ffff0007010
        ptre = 0x7fffec036a40
        preve = 0x7fffe0036a80
        has_checked = 0
#4  0x00007ffff6ee2166 in SQLDriverConnect (hdbc=0x7fffa40008e0, hwnd=0x0, conn_str_in=0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****", len_conn_str_in=-3,
    conn_str_out=0x7fff657d1810 "", conn_str_out_max=1024, ptr_conn_str_out=0x7fff657d17cc, driver_completion=0) at SQLDriverConnect.c:905
        retpool = 0
        retrying = 0
        wait_begin = 1684236577
        connection = 0x7fffa40008e0
        con_struct = {count = 5, list = 0x7fffa405c5b0}
        driver = 0x0
        dsn = 0x0
        filedsn = 0x0
        tsavefile = 0x0
        savefile = '\000' <repeats 1000 times>
        lib_name = '\000' <repeats 1000 times>
        driver_name = '\000' <repeats 1000 times>
        ret_from_connect = 0
        s1 = '\000' <repeats 2047 times>
        local_conn_str_in = '\000' <repeats 2047 times>
        local_out_conection = '\000' <repeats 2047 times>
        save_filedsn = 0x0
        warnings = 0
        pooh = 0x0
#5  0x00007ffff6ee1a16 in SQLDriverConnectA (hdbc=0x7fffa40008e0, hwnd=0x0, conn_str_in=0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****", len_conn_str_in=-3,
    conn_str_out=0x7fff657d1810 "", conn_str_out_max=1024, ptr_conn_str_out=0x7fff657d17cc, driver_completion=0) at SQLDriverConnect.c:669
No locals.
#6  0x000055555557af8d in IodbcConnection::Connect (this=0x7fffa40337c0, sDBString=..., sUserID=..., sPassword=..., fHandler=0x0) at odbcClient.cpp:526
        s = {m_pchData = 0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****"}
        OutConnectionString = '\000' <repeats 664 times>...
        StringLength2 = 0
        sOption = {m_pchData = 0x5555555983d0 <_saInitData+48> ""}
        uDriverCompletion = 0
        __PRETTY_FUNCTION__ = "virtual void IodbcConnection::Connect(const SAString&, const SAString&, const SAString&, saConnectionHandler_t)"
        hdbc = 0x7fffa40008e0
#7  0x000055555555f90c in SAConnection::Connect (this=0x7fff657d1cf0, sDBString=..., sUserID=..., sPassword=..., eSAClient=SA_ODBC_Client, fHandler=0x0) at SQLAPI.cpp:1622
        pISAConnection = 0x7fffa40337c0
#8  0x0000555555559ca0 in connect_task (pool_id=10, thread_id=208) at sqltest.cpp:34
        random_wait = {__r = 140735945232960}
        con = {<SAOptions> = {_vptr.SAOptions = 0x555555597060 <vtable for SAConnection+16>, m_pOptions = 0x7fffa405c5e0}, m_pSAPI = 0x7ffff0000d10, m_pCommandsMutex = 0x7fffa4033800, m_pCommands = 0x0,
          m_eIsolationLevel = SA_LevelUnknown, m_eAutoCommit = SA_AutoCommitUnknown, nReserved = 0}
        j = 0
#9  0x000055555555c363 in std::__invoke_impl<void, void (*)(int, int), int, int> (__f=@0x5555555b8a10: 0x555555559b69 <connect_task(int, int)>) at /usr/include/c++/11/bits/invoke.h:61
No locals.
#10 0x000055555555c2a0 in std::__invoke<void (*)(int, int), int, int> (__fn=@0x5555555b8a10: 0x555555559b69 <connect_task(int, int)>) at /usr/include/c++/11/bits/invoke.h:96
No locals.
#11 0x000055555555c1bf in std::thread::_Invoker<std::tuple<void (*)(int, int), int, int> >::_M_invoke<0ul, 1ul, 2ul> (this=0x5555555b8a08) at /usr/include/c++/11/bits/std_thread.h:253
No locals.
#12 0x000055555555c158 in std::thread::_Invoker<std::tuple<void (*)(int, int), int, int> >::operator() (this=0x5555555b8a08) at /usr/include/c++/11/bits/std_thread.h:260
No locals.
#13 0x000055555555c138 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(int, int), int, int> > >::_M_run (this=0x5555555b8a00) at /usr/include/c++/11/bits/std_thread.h:211
No locals.
#14 0x00007ffff7e5d2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#15 0x00007ffff7b06b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488346848, 1118859342958286079, 140734896088640, 0, 140737348921424, 140737488347200, -1119084726718049025, -1118841299216771841}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#16 0x00007ffff7b98a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I've tried unixODBC versions 2.3.9, 2.3.11, and 2.3.12pre; the issue is present in all of them. I'm on Ubuntu 22.04, but I tested on RHEL 9 as well, and the issue is present on both.

lurcher commented 1 year ago

Do you have your sample code that I can try to help track down the problem?

lurcher commented 1 year ago

If you have a way I can reproduce the problem, I will see if I can find a fix for it.

Tagl commented 1 year ago

Yes, sorry for not including that in the original post. Here is a sample program that creates multiple threads, each with its own connection. It also uses the library I mentioned before. I've run something similar to this (tweaking the number of thread pools, threads, and iterations per thread, and adding mutexes) a hundred times, and almost every single time it crashes with this error within 15 minutes. I haven't noticed any particular pattern to when it crashes; it seems rather random, but it's always in the same code path. The closed-source software in which I discovered this generally crashes after around 5 minutes.

#include <SQLAPI.h> // main SQLAPI++ header

#include <algorithm>
#include <chrono>
#include <clocale>   // setlocale
#include <cstdlib>   // rand
#include <iostream>
#include <thread>
#include <vector>

using namespace std;

void connect_task(int pool_id, int thread_id) {
    for (int j = 0; j < 1; j++) {
        SAConnection con;
        try {
            con.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}", "someuser", "somepass", SA_ODBC_Client);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);

            if (con.isConnected()) {
                con.Disconnect();
            }
        }
        catch (SAException& x)
        {
            try
            {
                con.Rollback();
            }
            catch (SAException&)
            {
            }
            cout << "Error text: " << x.ErrText().GetMultiByteChars() << endl;
        }
        catch (...)
        {
            cout << "unknown exception" << endl;
        }
    }
}

int main(int argc, char* argv[])
{
    setlocale(LC_ALL, "");

    for (int j = 0; j < 1000; j++) {
        vector<std::thread> threads;
        for (int i = 0; i < 1000; i++) {
            threads.emplace_back(connect_task, j, i);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);
        }
        for (auto& t : threads) {
            t.join();
        }
    }
    return 0;
}

lurcher commented 1 year ago

OK, I can reproduce it. It looks like it's a race condition once the DM starts to expire connections from the pool. Will see what I can work out.

lurcher commented 1 year ago

OK, so I think I can see the problem. It's one where I think I need some opinions. The basic issue is that SQLAPI issues multiple calls to SQLAllocEnv and then does pooling. As it stands, the DM tries to cope with multiple envs plus pooling, but it's basically failing. The failure starts once the pool entries start to time out: the DM starts to disconnect them, but the disconnect gets mixed up with the various env handles. There is a lot of code in there to try to allow pooling, multiple envs, and multiple drivers to be mixed and matched. Windows seems to avoid the problem by forcing a shared environment handle when pooling is used.

I can add shared handles, but the DM also tries very hard to avoid people reporting that there is still memory allocated and in use when the program exits, so it's all back to counting usages.

There is a lot of code around pooling that looks more complex than it probably needs to be. We have things like pooling timeouts and usage counts that I think Windows avoids. A lot of this could be simplified if we implement shared env handles when pooling is enabled.

Part of the reported problem is the combination of release_env being called while connections are reaching the pooling timeout value.

What do folk think about this, and any suggestions on direction of travel?
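
As an illustration only (this is not the unixODBC code, just the shape of the idea): one process-wide env with a usage count, torn down only when the last user releases it.

#include <sql.h>
#include <mutex>

// Illustrative sketch, not the actual DM code: a single shared environment
// for the whole process while pooling is enabled, plus a usage count.
static std::mutex pool_env_lock;
static SQLHENV shared_pool_env = SQL_NULL_HENV;
static int shared_pool_env_refs = 0;

SQLHENV alloc_env_with_pooling() {
    std::lock_guard<std::mutex> lock(pool_env_lock);
    if (shared_pool_env == SQL_NULL_HENV)
        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &shared_pool_env);
    ++shared_pool_env_refs;
    return shared_pool_env;
}

void release_env_with_pooling() {
    std::lock_guard<std::mutex> lock(pool_env_lock);
    if (--shared_pool_env_refs == 0) {   // last user: safe to tear down
        SQLFreeHandle(SQL_HANDLE_ENV, shared_pool_env);
        shared_pool_env = SQL_NULL_HENV;
    }
}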

mcordova1967 commented 1 year ago

Hello, I helped locate a memory leak in unixODBC about a year ago while using my multithreaded server with ODBC connection pooling. I am no longer using that multithreaded server with connection pooling, but I recall that I had no problem using the ODBC API following the Microsoft ODBC docs' advice for connection pooling. I had a stress test in those days, so I could check with Valgrind that the memory leak was gone, and there was no error on program exit after running thousands of requests and exercising the pool heavily.

You must set SQL_ATTR_CONNECTION_POOLING and share one hEnv among the threads; in my case I used a global hEnv variable:

rc = SQLSetEnvAttr(NULL, SQL_ATTR_CONNECTION_POOLING,
                   (SQLPOINTER)SQL_CP_ONE_PER_HENV, 0);
if (rc != SQL_SUCCESS)
    std::cerr << "[ODBC] Failed to set connection pooling option" << "\n";

rc = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
if (rc != SQL_SUCCESS) {
    std::cerr << "[ODBC] Failed to allocate shared environment" << "\n";
    throw std::runtime_error("Cannot initialize ODBC");
}

I don't know if this sheds some light on what may be happening with that SQLAPI library, but I did not have any problem using the ODBC API with threads, each thread managing its own connection on the shared hEnv; the hEnv was freed on program exit.
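
For illustration, each worker thread did roughly this (the function name and connection string here are placeholders, not my actual code):

#include <sql.h>
#include <sqlext.h>

// Per-thread pattern: allocate a DBC on the shared env, connect, work,
// disconnect. With pooling on, SQLDisconnect returns the connection to
// the pool and a later SQLDriverConnect can reuse it.
void worker(SQLHENV shared_henv) {
    SQLHDBC hdbc = SQL_NULL_HDBC;
    SQLCHAR out[1024];
    SQLSMALLINT outlen = 0;

    SQLAllocHandle(SQL_HANDLE_DBC, shared_henv, &hdbc);
    SQLDriverConnect(hdbc, NULL,
                     (SQLCHAR *)"Driver={ODBC Driver 17 for SQL Server};Server=...;UID=...;PWD=...",
                     SQL_NTS, out, sizeof(out), &outlen, SQL_DRIVER_NOPROMPT);
    /* ... run statements ... */
    SQLDisconnect(hdbc);
    SQLFreeHandle(SQL_HANDLE_DBC, hdbc);
}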

Regards, Martin


lurcher commented 1 year ago

Yep; in this case, though, the lib is using multiple envs.

mcordova1967 commented 1 year ago

I remember that the only limitation relative to the ODBC docs was that unixODBC does not support SQL_CP_ONE_PER_DRIVER; the constant does not exist in the unixODBC headers.

Reference: https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlsetenvattr-function?view=sql-server-ver16


lurcher commented 1 year ago

OK, so I have been looking at this and have found and fixed a couple of problems in the DM that the test code shows up. I have not pushed that back into git yet, but it's on the FTP site now if the OP wants to try it.

However, I think there is also a race condition in SQLAPI that is causing fatal problems the DM is unlikely to be able to catch. I see the application, via the lib, calling SQLFreeHandle on an env handle, and then using the same (now freed) handle to call SQLAllocHandle and SQLDriverConnect. It's possible that it gets away with this (more by luck than judgement) on Windows because of the shared env handle used with pooling. I have a snippet of a DM log that shows the problem happening:

[ODBC][1701236][1684403048.206901][SQLFreeHandle.c][220]
        Entry:
            Handle Type = 1
            Input Handle = 0x7f7014003a50
[ODBC][1701236][1684403048.207035][SQLAllocHandle.c][395]
        Entry:
            Handle Type = 2
            Input Handle = 0x7f7014003a50
[ODBC][1701236][1684403048.207059][SQLAllocHandle.c][511]
        Exit:[SQL_SUCCESS]
            Output Handle = 0x7f6ffc000d40
[ODBC][1701236][1684403048.207093][SQLDriverConnect.c][751]
        Entry:
            Connection = 0x7f6ffc000d40

I am assuming that, because of the threaded nature of the application, the code in SQLFreeHandle has not yet got to the point of releasing the handle, so the handle validation code hasn't had a chance to mark the handle as invalid and the misuse isn't caught on the following calls. But I think it's a question to ask the SQLAPI folk; as it's a closed-source commercial lib, I guess it's best that the OP asks them in the first instance.
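
In API terms, the sequence that log shows is roughly this (a reconstruction, not SQLAPI's actual code; henv, out, and outlen are assumed to be declared as in the per-thread sketch earlier in the thread):

// Thread A frees the environment...
SQLFreeHandle(SQL_HANDLE_ENV, henv);

// ...while thread B goes on to use the same, now freed, env handle.
SQLHDBC hdbc = SQL_NULL_HDBC;
SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);   // input handle already released
SQLDriverConnect(hdbc, NULL, (SQLCHAR *)"DSN=...", SQL_NTS,
                 out, sizeof(out), &outlen, SQL_DRIVER_NOPROMPT);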

Tagl commented 1 year ago

Thank you very much for the quick responses. I can raise the concerns to the maintainer of SQLAPI.

I'll try the updated version later today.

lurcher commented 1 year ago

Thanks. The version now on the FTP site will catch this condition and report to stderr that it's happened before returning SQL_INVALID_HANDLE.

Tagl commented 1 year ago

That's great, I'll make sure to turn on tracing to try to spot it.

lurcher commented 1 year ago

It will catch and display the error even without tracing (though it will also happen with tracing on). Part of the problem is that, being a race condition, adding tracing may alter the order things happen in and when.

Tagl commented 1 year ago

I've tried it, and it no longer crashes, but it seems as if connection pooling isn't really active. Before the fix, "connecting" to the database usually took 0 ms, since the connection was already alive; now, with the same settings, connecting seems to take 7-13 ms, which is the same speed I saw before enabling connection pooling.

The changes you described here shouldn't have a serious performance impact, so it looks like a new connection is being established each time. Any idea why that would happen after these changes?

I have the same config as in the original post, and I have it in both /etc/odbcinst.ini and /usr/local/etc/odbcinst.ini.

lurcher commented 1 year ago

What I see the test application (via the lib) doing is: it creates an environment, creates a couple of connections on that env, disconnects them so they return to the pool, and then releases the environment. It then creates a new application env and a few connections, closes them, and releases that env. What I have changed is that releasing the application environment now releases the pooled connections. So there is some pooling going on, but each release of the environment clears that collection of connections out.

Now, it may be that pooled connections could safely be shared between application environments, but the change I made prevents this from happening at the moment, as it was partly this that was causing the crash.

I suspect (but can't know, as it's closed source) that SQLAPI gets away with mixing and matching application environments on Windows because the environment there is shared. I did try adding shared environments with usage counting, but the lib runs the usage count down to zero, so the DM would release the env just as it does now and there is no real advantage.


lurcher commented 1 year ago

Try the one there now (on the FTP site); this one implements a shared env handle when pooling is used. It does mean that the env and pooled connections will still be there on program exit, but it can't release these while the shared env is in place, as there is no real end point.

Tagl commented 1 year ago

I will try it on Monday morning, as I'm on holiday right now. I can avoid the usage count going down to zero by maintaining an extra SAConnection instance as a keep-alive for the pool. That worked with the sample code, at least.

Tagl commented 1 year ago

The new version you uploaded works as well. No extra connection object required on my end to keep the pool alive.

lurcher commented 1 year ago

OK, so the latest version creates a single env if pooling is in use, which I think matches what happens on Windows. The only downside is that it's then not possible to close things down before exit. Given that the shared env code is enabled by defining SHARED_POOLED_ENV in drivermanager.h, and that you have a solution to the same problem, my instinct is always to do the least possible to address a problem, so I think I will leave it undefined; you can either make your own build or use your extra connection object.
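
For anyone who does want that behaviour, the build change is just the one define before compiling (the DriverManager/ path is an assumption about the tree layout; adjust to your checkout):

// DriverManager/drivermanager.h, before building unixODBC
#define SHARED_POOLED_ENV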

I think the main body of changes does, however, address the original problem that was causing the seg fault.

Just what change did you make to the test code to generate the extra connection? I would be interested to see if it fixes the use-after-release problem I saw in the SQLAPI lib. The shared env does hide it, but doesn't address the underlying issue. I am guessing the Windows shared env has hidden the problem from the lib maintainer up to now.

I will leave it a bit and then push back into git, unless anyone else has a view on this?

Tagl commented 1 year ago

Just what change did you make to the test code to generate the extra connection?

Just change the main function like so:


int main(int argc, char* argv[])
{
    setlocale(LC_ALL, "");
    SAConnection keepAlive; // held open for the whole run so the pool's usage count never drops to zero
    keepAlive.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}", "someuser", "somepass", SA_ODBC_Client);
    for (int j = 0; j < 1000; j++) {
        vector<std::thread> threads;
        for (int i = 0; i < 1000; i++) {
            threads.emplace_back(connect_task, j, i);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);
        }
        for (auto& t : threads) {
            t.join();
        }
    }
    keepAlive.Disconnect();
    return 0;
}

lurcher commented 1 year ago

Thanks.

lurcher commented 1 year ago

I have pushed back the changes that prevent the seg fault with SQLAPI and this set of operations.

Tagl commented 1 year ago

With the keep-alive connection changes in my company's software, we ran a stress test with v2.3.11 and v2.3.12pre. Both ran without crashing, but on 2.3.11 I saw the memory leak mentioned before, and the test was cut short because the machine ran out of memory. I also ran 2.3.12pre without the keep-alive changes and it did not crash. Both the leak and the crash seem to be completely resolved in 2.3.12pre. Thanks for the quick responses; the issue is resolved.

dertin commented 1 year ago

I am interested in this fix. Is there an estimate when v2.3.12 will be released? Thanks!

lurcher commented 1 year ago

Not at the moment, but the code is there to be used.