mattn / go-oci8

Oracle driver for Go using database/sql
https://mattn.kaoriya.net/
MIT License
630 stars 212 forks source link

Cursor leak in QueryContext when context is cancelled #379

Closed nvx closed 4 years ago

nvx commented 4 years ago

Running the following will eventually crash with ORA-01000: maximum open cursors exceeded

package main

import (
    "context"
    "database/sql"
    "flag"
    _ "github.com/mattn/go-oci8"
    "log"
    "strings"
    "sync"
    "time"
)

func main() {
    dsn := flag.String("dsn", "", "Oracle DSN")
    flag.Parse()

    db, err := sql.Open("oci8", *dsn)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    db.SetMaxOpenConns(4)
    db.SetMaxIdleConns(4)
    db.SetConnMaxLifetime(30 * time.Minute)

    log.Print("Trying to crash")

    for i := 0; i < 10; i++ {
        go crasher(db)
    }

    log.Print("Waiting...")

    time.Sleep(60 * time.Second)

    log.Print("No crash?")
}

func crasher(db *sql.DB) {
    for {
        var wg sync.WaitGroup
        ctx, cancel := context.WithCancel(context.Background())
        wg.Add(1)
        go func() {
            defer wg.Done()
            rows, err := db.QueryContext(ctx,
                `select * from sys.audit_actions`,
            )
            if err != nil {
                if err == context.Canceled || strings.Contains(err.Error(), "ORA-01013") { // ORA-01013: user requested cancel of current operation
                if rows != nil {
                    log.Print("Non-nil rows returned during error? closing rows")
                    err  =rows.Close()
                    if err != nil {
                        panic(err)
                    }
                }
                    return
                }
                panic(err)
            }
            err = rows.Close()
            if err != nil {
                panic(err)
            }
        }()

        // Seems to crash faster if we wait a little bit
        time.Sleep(1 * time.Millisecond)

        cancel()
        wg.Wait()
    }
}

I was attempting to reproduce a nasty crash (malloc(): memory corruption (fast)) which seemed to be related to cancelling the context of a running query with go-oci8 when I ran into this issue. I'm not sure if this is also the cause of my memory corruption crash or not, but it's making it difficult to reproduce with this issue. Removing the cancel() call makes the issue stop occurring.

MichaelS11 commented 4 years ago

What version are you on?

nvx commented 4 years ago

Using go modules with github.com/mattn/go-oci8 v0.0.2 in the go.mod file. Go version 1.13.7.

On my dev machine using Oracle client 12.2, but have tried 19.5 and managed to get the same crash (although haven't tried the cursor leak on 19.5)

nvx commented 4 years ago

For completeness, I just tried against master (not that I think there's been any real changes since v0.0.2) which resolved to v0.0.3-0.20191221164601-d4ceab0db095 in the go.mod and had the same behaviour.

MichaelS11 commented 4 years ago

Tried the below code and have not been able to get any error besides the expected ones. Using Oracle Instant Client 12.2. Not seeing cursors left open on the database.

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"
    "os"
    "sync"
    "time"

    "github.com/mattn/go-oci8"
)

func main() {
    oci8.OCI8Driver.Logger = log.New(os.Stderr, "oci8 ", log.Ldate|log.Ltime|log.LUTC|log.Llongfile)

    db, err := sql.Open("oci8", "username/password@databaseName")
    if err != nil {
        fmt.Printf("Open error is not nil: %v", err)
        return
    }

    defer func() {
        err = db.Close()
        if err != nil {
            fmt.Println("Close error is not nil:", err)
        }
    }()

    waitGroup := sync.WaitGroup{}
    for i := 0; i < 20; i++ {
        go func() {
            for j := 0; j < 50000; j++ {

                waitGroup.Add(1)
                go func() {
                    ctx, cancel := context.WithCancel(context.Background())
                    go func() {
                        rows, err := db.QueryContext(ctx, `select * from SYS.AUDIT_ACTIONS`)
                        if err != nil && err != context.Canceled &&
                            len(err.Error()) > 9 && err.Error()[0:9] != "ORA-01013" {
                            fmt.Println(err)
                        }
                        if rows != nil {
                            err = rows.Close()
                        }
                        time.Sleep(time.Second)
                        waitGroup.Done()
                    }()
                    cancel()
                }()
                time.Sleep(time.Microsecond)

            }
        }()
        time.Sleep(time.Second)
    }
    time.Sleep(time.Second)
    waitGroup.Wait()
}
nvx commented 4 years ago

Indeed I can't seem to reproduce it with that code either, however slight tweaks to your code (calling db.SetMaxOpenConns(4), removing one of the goroutines and adjusting where a sleep is) makes the issue reappear.

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"
    "os"
    "sync"
    "time"

    "github.com/mattn/go-oci8"
)

func main() {
    oci8.OCI8Driver.Logger = log.New(os.Stderr, "oci8 ", log.Ldate|log.Ltime|log.LUTC|log.Llongfile)

    db, err := sql.Open("oci8", "username/password@databaseName")
    if err != nil {
        fmt.Printf("Open error is not nil: %v", err)
        return
    }

    defer func() {
        err = db.Close()
        if err != nil {
            fmt.Println("Close error is not nil:", err)
        }
    }()

    db.SetMaxOpenConns(4)

    waitGroup := sync.WaitGroup{}
    for i := 0; i < 20; i++ {
        go func() {
            for j := 0; j < 50000; j++ {

                waitGroup.Add(1)
                //go func() {
                ctx, cancel := context.WithCancel(context.Background())
                go func() {
                    rows, err := db.QueryContext(ctx, `select * from SYS.AUDIT_ACTIONS`)
                    if err != nil && err != context.Canceled &&
                        len(err.Error()) > 9 && err.Error()[0:9] != "ORA-01013" {
                        fmt.Println(err)
                    }
                    if rows != nil {
                        err = rows.Close()
                    }
                    time.Sleep(time.Second)
                    waitGroup.Done()
                }()
                time.Sleep(time.Millisecond)
                cancel()
                //}()
            }
        }()
        time.Sleep(time.Second)
    }
    time.Sleep(time.Second)
    waitGroup.Wait()
}

I've also managed to reproduce the crash, turns out what I was missing is using a bind variable in the query. Not using it just nets me the maximum open cursors exceeded, while using it results in a memory corruption error.

Results in a hard crash:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"
    "os"
    "sync"
    "time"

    "github.com/mattn/go-oci8"
)

func main() {
    oci8.OCI8Driver.Logger = log.New(os.Stderr, "oci8 ", log.Ldate|log.Ltime|log.LUTC|log.Llongfile)

    db, err := sql.Open("oci8", "username/password@databaseName")
    if err != nil {
        fmt.Printf("Open error is not nil: %v", err)
        return
    }

    defer func() {
        err = db.Close()
        if err != nil {
            fmt.Println("Close error is not nil:", err)
        }
    }()

    db.SetMaxOpenConns(4)

    waitGroup := sync.WaitGroup{}
    for i := 0; i < 20; i++ {
        go func() {
            for j := 0; j < 50000; j++ {

                waitGroup.Add(1)
                //go func() {
                ctx, cancel := context.WithCancel(context.Background())
                go func() {
                    rows, err := db.QueryContext(ctx, `select * from SYS.AUDIT_ACTIONS where name=:1`, "UNKNOWN")
                    if err != nil && err != context.Canceled &&
                        len(err.Error()) > 9 && err.Error()[0:9] != "ORA-01013" {
                        fmt.Println(err)
                    }
                    if rows != nil {
                        err = rows.Close()
                    }
                    time.Sleep(time.Second)
                    waitGroup.Done()
                }()
                time.Sleep(time.Millisecond)
                cancel()
                //}()
            }
        }()
        time.Sleep(time.Second)
    }
    time.Sleep(time.Second)
    waitGroup.Wait()
}
nvx commented 4 years ago

Having a bit of a look at statement.go, I noticed a defer freeBinds(binds) call at the start of the function here: https://github.com/mattn/go-oci8/blob/20a6e68334b020dbc4d7adca75a4bec73bd7f09f/statement.go#L376

But freeBinds(binds) is being called again later here too https://github.com/mattn/go-oci8/blob/20a6e68334b020dbc4d7adca75a4bec73bd7f09f/statement.go#L433

There might be other issues too, but this stands out to me as a double free which could be causing some issues, and would only be triggered during a cancelled context. Doesn't explain the cursor leak though, but would explain the crash (perhaps the cursor leak is being hidden because it crashes before that can occur).

Looking at the rest of the code that follows, is like 433 meant to be freeDefines(defines) instead of freeBinds(binds) perhaps?

nvx commented 4 years ago

As a test I changed line 433 on statement.go to freeDefines(defines) and it no longer crashes with my crash example, but does still result in ORA-01000: maximum open cursors exceeded after a few loops.

MichaelS11 commented 4 years ago

You are correct, line 433 should be freeDefines instead of freeBinds.

MichaelS11 commented 4 years ago

I wonder if the cursors are just getting overloaded on your box, and it is not an issue with cursor leak. What is your max cursors set to?

nvx commented 4 years ago

If you move the cancel() call on the example to be inside the innermost go routine after the waitGroup.Done() rather than outside the maximum open cursors exceeded error no longer occurs. I believe the max cursors is set to the default of 300 and the database I'm testing against is pretty much idle.

MichaelS11 commented 4 years ago

So I think the only issue is the crash, correct?

https://github.com/mattn/go-oci8/pull/380

nvx commented 4 years ago

With PR #380 merged that fixes the crash, but I can still run out of cursors with my example code - even the first one I posted which by my reckoning shouldn't have more than 10 cursors opened at any given point (only a fixed number of goroutines are made - 10, and there's more than 10 cursors available in the DB) - while not cancelling the context results in not running out of cursors. So I think there's two issues.

MichaelS11 commented 4 years ago

Alright, will look at it more tomorrow.

nvx commented 4 years ago

I am by no means an expert, so please validate that I'm not completely missing the mark here, but from playing around a little it looks like the issue occurs when the context is cancelled during ociStmtExecute here https://github.com/mattn/go-oci8/blob/20a6e68334b020dbc4d7adca75a4bec73bd7f09f/statement.go#L414-L415

With the ociBreak goroutine causing the cancel to occur. Commenting out the ociBreak goroutine causes the issue to disappear (although obviously preventing cancellation at this point).

Adding a check after ociStmtExecute returns to check if the context is cancelled, and if so calling OCIReset seems to fix the problem while still allowing cancellation. In practice this probably needs to be done if and only if OCIBreak is called first, simply adding the call to the end of the ociBreak goroutine is incorrect as the ociBreak goroutine is used elsewhere when OCIReset should not be called afterwards. Perhaps a flag on the ociBreak goroutine to indicate if OCIReset needs to be called and setting that flag to true only during the ociStmtExecute but not other instances would work.

I have very little experience working with the OCI library (the docs aren't the best) and have only done very limited testing with what I've described, but it might be a good starting point at least.

It's probably also needed in exec as well here, although I've only tested with query: https://github.com/mattn/go-oci8/blob/20a6e68334b020dbc4d7adca75a4bec73bd7f09f/statement.go#L696-L697

Thinking about it I'm not entirely sure if it's safe to call OCIReset from within the ociBreak goroutine or if it needs to be called only after ociStmtExecute returns in that goroutine instead. I'm also not sure if it's safe to call OCIReset if OCIBreak hasn't been called on the connection or not (thinking race condition if the context is cancelled after ociStmtExecute and ociBreak returns but before the context is checked to see if it's done to see if OCIReset needs calling) - if not maybe some singalling is required so that the ociBreak goroutine can indicate if OCIBreak was called or not.

MichaelS11 commented 4 years ago

Only get maximum open cursors exceeded to happen when SetMaxOpenConns is set. This probably means something interesting but not sure what yet.

Adding OCIReset right after OCIBreak causes a crash most of the time.

Added a reset flag and OCIReset to statement close, still was getting maximum open cursors exceeded. So not sure if OCIReset is really needed or even helps.

Still does not seem like there is a cursor leak. Seems more like there is some kind of limit being hit when SetMaxOpenConns is used. This limit is keeping the cursors from being cleaned up fast enough. Maybe a connection is needed to do the cleanup and it is not getting a connection because of a limit? Not sure the details of that, just floating around the idea.

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"
    "os"
    "sync"
    "sync/atomic"
    "time"

    "github.com/mattn/go-oci8"
)

func main() {
    oci8.OCI8Driver.Logger = log.New(os.Stderr, "oci8 ", log.Ldate|log.Ltime|log.LUTC|log.Llongfile)

    db, err := sql.Open("oci8", "username/password@databaseName")
    if err != nil {
        fmt.Printf("Open error is not nil: %v", err)
        return
    }

    defer func() {
        err = db.Close()
        if err != nil {
            fmt.Println("Close error is not nil:", err)
        }
    }()

    db.SetMaxOpenConns(10)

    var numOfCancel uint64
    var numOfCursorsExceeded uint64
    for i := 0; i < 20; i++ {
        go func(i int) {
            for j := 0; j < 20000; j++ {

                waitGroup := sync.WaitGroup{}
                waitGroup.Add(1)
                ctx, cancel := context.WithCancel(context.Background())

                go func() {
                    defer waitGroup.Done()
                    rows, err := db.QueryContext(ctx, `select * from SYS.WRI$_OPTSTAT_HISTGRM_HISTORY`)
                    if err != nil && err != context.Canceled && len(err.Error()) > 9 {
                        switch err.Error()[0:9] {
                        case "ORA-01013": // ORA-01013: user requested cancel of current operation
                            atomic.AddUint64(&numOfCancel, 1)
                        case "ORA-01000": // ORA-01000: maximum open cursors exceeded
                            atomic.AddUint64(&numOfCursorsExceeded, 1)
                        case "ORA-12519": // ORA-12519: TNS:no appropriate service handler found
                        default:
                            fmt.Println(err)
                        }
                    }
                    if rows != nil {
                        err = rows.Close()
                        if err != nil {
                            fmt.Println(err)
                        }
                    }
                }()

                time.Sleep(time.Millisecond)
                cancel()
                waitGroup.Wait()

            }
            fmt.Println("done", i)
        }(i)
    }

    for i := 0; i < 30; i++ {
        time.Sleep(2 * time.Second)
        fmt.Println(atomic.LoadUint64(&numOfCancel), "-", atomic.LoadUint64(&numOfCursorsExceeded))
    }
}
MichaelS11 commented 4 years ago

Lowered the number of connections to half the max open connections and did not get any maximum open cursors exceeded (without calling OCIReset). So really hard to say if there is a problem or if there is one, where the problem is located.

nvx commented 4 years ago

The more connections the longer it takes, but with eg 5 goroutines and 10 max open conns I do still eventually run into the maximum open cursors exceeded error. It makes sense that with more connections it would be harder to hit the limit as the limit is per connection.

I found if I put the OCIReset call in the ociBreak goroutine I got crashes, but I didn't get any crashes if I put it after the ociStmtExecute call like so:

    done := make(chan struct{})
    go stmt.conn.ociBreak(ctx, done)
    err = stmt.ociStmtExecute(iter, mode)
    close(done)
    select {
    case <-ctx.Done():
        result := C.OCIReset(
            unsafe.Pointer(stmt.conn.svc), // The service context handle or the server context handle.
            stmt.conn.errHandle,           // An error handle
        )
        err = stmt.conn.getError(result)
        if err != nil {
            stmt.conn.logger.Print("OCIReset error: ", err)
        }
        return nil, ctx.Err()
    default:
    }

Is that where you had it and got crashes, or?

MichaelS11 commented 4 years ago

I put OCIReset inside statement close, it did not seem to help nor do any harm.

You are right, raising the max open connection did not eliminate the maximum open cursors exceeded but it reduced it greatly. Doing over 45,000 cancels with less than 40 maximum open cursors exceeded during a minute time.

You sure this is not just a performance thing rather then a leak?

nvx commented 4 years ago

If it's a performance thing, why does not not occur when the queries are not cancelled though? (eg, swap the order of the waitGroup.Wait() and cancel() calls).

I've been working with my DBA who thinks it is indeed leaking. My guess is it's a bit racey so network latency might be making the issue more apparent for me than you.

MichaelS11 commented 4 years ago

It may be a performance thing because there is most likely more work that needs to happen when a cancel happens. More communication between the client and the service, hence why network makes a difference.

What query is your DBA using to figure out it is leaking? Some of the tables can show confusing data.

nvx commented 4 years ago

I've had a bit more of a play, adding a delay in the loop looks like it stops running out of cursors. I'm wondering if now the call to OCIReset was just adding enough of a delay. It would be ideal if the query didn't return until after any additional background work was completed, but I can imagine that might be difficult to achieve.

Waiting on my DBA to get back to me after testing with a delay to see what he can see, but I'm starting to think you might be right that it's not a true leak, just a delay before the resources are fully freed.

nvx commented 4 years ago

I've just had a chat with my DBA, looks like it was cancelling the context before it opened the cursor most times, hence giving the appearance of the issue going away. Adding a 100ns sleep after starting the Query goroutine and calling cancel() made the issue reappear, even with a sleep before each loop (the longer the sleep the longer it takes to exhaust the cursors of course, but it still goes up and is never freed).

This is the query my DBA is using to monitor the cursors:

SELECT a.sid, a.value, b.name
FROM v$sesstat a, v$statname b
WHERE a.statistic# = b.statistic# AND b.name='opened cursors current'
and a.sid=205
ORDER BY a.value desc;

Where 205 is the sid for the connection in my test app (for simplicity I've got MaxOpenConns limited to 1 at the moment, and 5 goroutines running queries)

nvx commented 4 years ago

Commenting out line 414 (the ociBreak goroutine before ociStmtExecute is called) the number of cursors stays at 3 and does not increase no matter how long the application runs for.

The OCIReset was definitely a red herring though and doesn't appear to help, but it does appear to be triggered when OCIBreak is called during OCIStmtExecute that the cursor leak occurs.

MichaelS11 commented 4 years ago

Still think it is a performance issue. When set the max number of connections very low the Go program can send cancel faster and the database / network can not keep up. When I set max connections to 5 and number of threads to 10, see maximum open cursors exceeded. But when I set max connections to 10 and number of threads to 20 see very few maximum open cursors exceeded.

I get that the DBA and you are seeing maximum open cursors exceeded but that does not mean there is a cursors leak. I think it just means the cursors are not getting cleaned up fast enough for new cursors to come in.

MichaelS11 commented 4 years ago

Try adding a back off sleep to the test code, so it slowly takes longer to run the next select and then cancel. As the load/amount of queries goes down the cursors should come back, which would show that is a performance issue rather then a cursor leak.

nvx commented 4 years ago

That's what I meant though, making it only do one select a second (with a 1 second sleep) it still showed the number of cursors increasing over time on the database, but it obviously leaked much slower at that point.

nvx commented 4 years ago

In fact, I can reproduce it with a much simpler test (now I'm no longer trying to reproduce the now-fixed crash) even and without calling SetMaxOpenConns:

package main

import (
    "context"
    "database/sql"
    "flag"
    _ "github.com/mattn/go-oci8"
    "log"
    "strings"
    "sync"
    "time"
)

func main() {
    dsn := flag.String("dsn", "", "Oracle DSN")
    flag.Parse()

    db, err := sql.Open("oci8", *dsn)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    for {
        var wg sync.WaitGroup
        ctx, cancel := context.WithCancel(context.Background())
        wg.Add(1)
        go func() {
            defer wg.Done()

            log.Print("loop")

            rows, err := db.QueryContext(ctx, `select 1 from dual`)
            if rows != nil {
                ierr := rows.Close()
                if ierr != nil {
                    panic(ierr)
                }
            }

            if err != nil {
                if err == context.Canceled || strings.Contains(err.Error(), "ORA-01013") { // ORA-01013: user requested cancel of current operation
                    return
                }
                panic(err)
            }
        }()

        time.Sleep(100 * time.Nanosecond)

        cancel()
        wg.Wait()

        //time.Sleep(10 * time.Millisecond)
    }
}

Will eventually fail with maximum open cursors exceeded.

Uncommenting the sleep and playing with the sleep time makes it slower, but still cursor leaks are visible. The first sleep for 100ns I found if it's too short the context is cancelled before ociExecStatement is called and the issue doesn't occur, and if it's too long it's cancelled too late and the issue also doesn't occur. 100ns seems to reproduce it reliably for me, but the exact delay might need tweaking in your environment.

nvx commented 4 years ago

What might be a bit more reliable to reproduce it is to pass a cancelled context to ociBreak before ociStmtExecute is called rather than relying on timing delays. This seemed to work for me (combined with not cancelling the context in the test app):

    done := make(chan struct{})
    ctx, cancel := context.WithCancel(ctx)
    go stmt.conn.ociBreak(ctx, done)
    cancel()
    err = stmt.ociStmtExecute(iter, mode)
    close(done)
    if err != nil {
        return nil, err
    }

Around like 414 in statement.go

MichaelS11 commented 4 years ago

So confirmed there is definitely a cursor leak. It only happens when OCIBreak is called while OCIStmtExecute is being called.

For now, reduced the chance of OCIBreak being called while OCIStmtExecute is being called: https://github.com/mattn/go-oci8/pull/381

As for a fix, not sure how to fix it. There is either an OCI call that is needed or an Oracle bug. @mattn @cjbj Any ideas?

Test code:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"
    "math/rand"
    "os"
    "sync"
    "sync/atomic"
    "time"

    "github.com/mattn/go-oci8"
)

func main() {
    oci8.OCI8Driver.Logger = log.New(os.Stderr, "oci8 ", log.Ldate|log.Ltime|log.LUTC|log.Llongfile)

    db, err := sql.Open("oci8", "username/password@databaseName")
    if err != nil {
        fmt.Printf("Open error is not nil: %v", err)
        return
    }

    defer func() {
        err = db.Close()
        if err != nil {
            fmt.Println("Close error is not nil:", err)
        }
    }()

    var numOfCancel uint64
    var numOfCursorsExceeded uint64

    go func() {
        for j := 0; j < 5000; j++ {

            waitGroup := sync.WaitGroup{}
            waitGroup.Add(1)
            ctx, cancel := context.WithTimeout(context.Background(), time.Duration(rand.Int63n(50))*time.Microsecond)
            text := "b"

            go func() {
                defer waitGroup.Done()
                _, err := db.ExecContext(ctx, `begin :string1 := 'a'; end;`, text)
                if err != nil && err != context.DeadlineExceeded && len(err.Error()) > 9 {
                    switch err.Error()[0:9] {
                    case "ORA-01013": // ORA-01013: user requested cancel of current operation
                        atomic.AddUint64(&numOfCancel, 1)
                    case "ORA-00604": // ORA-00604: error occurred at recursive SQL level 1
                        atomic.AddUint64(&numOfCancel, 1)
                    case "ORA-01000": // ORA-01000: maximum open cursors exceeded
                        atomic.AddUint64(&numOfCursorsExceeded, 1)
                    case "ORA-12519": // ORA-12519: TNS:no appropriate service handler found
                    default:
                        fmt.Println(err)
                    }
                }
            }()

            waitGroup.Wait()
            cancel()

            if atomic.LoadUint64(&numOfCursorsExceeded) > 1 {
                time.Sleep(30 * time.Second)
            }
        }
    }()

    for i := 0; i < 200; i++ {
        time.Sleep(3 * time.Second)
        fmt.Println(atomic.LoadUint64(&numOfCancel), "-", atomic.LoadUint64(&numOfCursorsExceeded))
    }
}
MichaelS11 commented 4 years ago

@cjbj Did Bug 1787503 come back? http://www.umsl.edu/~eckerta/README.html#1787503

cjbj commented 4 years ago

@MichaelS11 not that I know of; I'll ask around.

MichaelS11 commented 4 years ago

@cjbj If not that exact bug would be one very simular.

cjbj commented 4 years ago

@MichaelS11 do you know exact Oracle client & DB versions that it has been seen with?

MichaelS11 commented 4 years ago

@cjbj Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production Oracle Instant Client 12.2.0.1.0

@nvx What was the exact version?

nvx commented 4 years ago

I'm running:

Oracle Database 12c 12.1.0.2.0 Oracle Instant Client 12.2.0.1.0

cjbj commented 4 years ago

There was one timing-dependent break/reset leak fixed in 18.3 client libraries. Can you try these? They will still connect to your 12.1 DB.

MichaelS11 commented 4 years ago

Tired with Oracle Instant Client 18.5.0.0.0 (Windows 64 bit) which was on the main web site page. Had same issue.

MichaelS11 commented 4 years ago

@nvx Have you tried with any other Oracle client versions?

nvx commented 4 years ago

I haven't, at least not on my dev machine for testing cursor leaks

cjbj commented 4 years ago

@MichaelS11 if you have a standalone OCI testcase, I can ask someone to look at it. Sorry we can't work with the go-oci8 code base directly.

MichaelS11 commented 4 years ago

@cjbj The below C++ code running on Linux seems to reproduce the issue.

Note that the line usleep(12000); will probably need to be adjusted depending on the speed of the database and etc. It needs to be adjusted so that the OCIStmtExecute is running when OCIBreak is called.

<edit, removed old code>

cjbj commented 4 years ago

For completeness, can you share your makefile?

MichaelS11 commented 4 years ago

Just a one liner: gcc -o oracle_break -I /usr/include/oracle/12.2/client64/ oracle_break.cpp -L /usr/lib/oracle/12.2/client64/lib/ -l stdc++ -l pthread -l clntsh

cjbj commented 4 years ago

I'll play some more, but initial attempts didn't reproduce a problem. I saw a lot of ORA-01013s and eventually the program completed:

. . . 
loop start
OCIHandleFree statementHp start
1582006683482
OCIBreak start
OCIBreak end
OCIHandleFree statementHp end
OCIStmtPrepare start
OCIStmtPrepare end
1582006683483
OCIStmtExecute start
OCIStmtExecute end
OCIStmtFetch2 start
OCIStmtFetch2 end
OCIHandleFree statementHp start
OCIHandleFree statementHp end
loop end
loop start
OCIStmtPrepare start
OCIStmtPrepare end
1582006686487
OCIStmtExecute start
OCIStmtExecute end
OCIStmtFetch2 start
1582006686488
OCIBreak start
OCIBreak end
OCIStmtFetch2() Error: ORA-01013: user requested cancel of current operation

loop end
loop start
OCIHandleFree statementHp start
1582006689489
OCIBreak start
OCIBreak end
OCIHandleFree statementHp end
OCIStmtPrepare start
OCIStmtPrepare end
1582006689489
OCIStmtExecute start
OCIStmtExecute end
OCIStmtFetch2 start
OCIStmtFetch2 end
OCIHandleFree statementHp start
OCIHandleFree statementHp end
loop end
loop start
1582006692494
OCIBreak start
OCIBreak end
OCIStmtPrepare start
OCIStmtPrepare end
1582006692494
OCIStmtExecute start
OCIStmtExecute end
OCIStmtFetch2 start
OCIStmtFetch2 end
OCIHandleFree statementHp start
OCIHandleFree statementHp end
loop end
cleanup start
cleanup end
cjbj commented 4 years ago

I did use a simpler query and reduced the sleep time.

MichaelS11 commented 4 years ago

The ORA-01013 are wanted. That is what error happens when the OCIBreak is working correctly. Need to make sure the OCIBreak is happening on the OCIStmtExecute. So should see the output line OCIStmtExecute start but not see the output line OCIStmtExecute end. Getting the line OCIStmtFetch2() Error: ORA-01013 mean the OCIBreak is happening too late.

Should not be a need to change the query used.

MichaelS11 commented 4 years ago

Note that the idea of the program is, while it is running check the target database for open cursors. The number of open cursors should go up as OCIBreak is called while OCIStmtExecute is running.

cjbj commented 4 years ago

@MichaelS11 I haven't had a chance to revisit this.

I had changed the query for my initial, quick test so I didn't have to fiddle with grants & permissions.

MichaelS11 commented 4 years ago

@cjbj Please look at it when you can

cjbj commented 4 years ago

@MichaelS11 thanks for the reminder.