microsoft / go-mssqldb

Microsoft SQL server driver written in go language
BSD 3-Clause "New" or "Revised" License
262 stars 54 forks source link

Calling ping() will block #91

Open levin-go opened 1 year ago

levin-go commented 1 year ago

Describe the bug In arm v7 (arm32), calling the ping() function will cause blocking.Through log tracing, it is found that blocking occurs in Ping()->ExecContext()->s.exec(ctx, list)->s.processExec(ctx)->reader.iterateResponse()->t.nextToken()

Exception message:
Stack trace:

To Reproduce

func main() {
    var sqlDB *sql.DB
    dsn := fmt.Sprintf("sqlserver://%s:%s@%s:%d?database=%s&encrypt=disable", "sa", "xxxx", "192.168.100.245", 1433, "test")
    db, err := gorm.Open(sqlserver.Open(dsn), &gorm.Config{})
    if err != nil {
        fmt.Println(err)
    } else {
        sqlDB, _ = db.DB()
        // SetMaxIdleConns sets the maximum number of connections in the idle connection pool.
        sqlDB.SetMaxIdleConns(10)
        // SetMaxOpenConns sets the maximum number of open connections to the database.
        sqlDB.SetMaxOpenConns(20)
    }
    for {
        err = sqlDB.Ping()
        fmt.Println("ping status ===== ", err)
        if err != nil {
            sqlDB.Close()
            fmt.Println(err)
            return
        }
        time.Sleep(time.Second * 2)
    }
}

Expected behavior The database will be shut down normally without blocking. Appears only when the network cable between client and server is unplugged.

Further technical details

SQL Server version: SQL Server 2008 Operating system: arm v7 go version go1.18.2 windows/amd64 build command: GOOS=linux GOARCH=arm go build

levin-go commented 1 year ago
func (t tokenProcessor) nextToken() (tokenStruct, error) {
    // we do this separate non-blocking check on token channel to
    // prioritize it over cancellation channel
    select {
    case tok, more := <-t.tokChan:
        err, more := tok.(error)
        if more {
            // this is an error and not a token
            return nil, err
        } else {
            return tok, nil
        }
    default:
        // there are no tokens on the channel, will need to wait
    }

    select {
    case tok, more := <-t.tokChan:
        if more {
            err, ok := tok.(error)
            if ok {
                // this is an error and not a token
                return nil, err
            } else {
                return tok, nil
            }
        } else {
            // completed reading response
            return nil, nil
        }
    case <-t.ctx.Done():
        // It seems the Message function on t.outs.msgq doesn't get the Done if it comes here instead
        if t.outs.msgq != nil {
            _ = sqlexp.ReturnMessageEnqueue(t.ctx, t.outs.msgq, sqlexp.MsgNextResultSet{})
        }
        if t.noAttn {
            return nil, t.ctx.Err()
        }
        if err := sendAttention(t.sess.buf); err != nil {
            // unable to send attention, current connection is bad
            // notify caller and close channel
            return nil, err
        }

        // now the server should send cancellation confirmation
        // it is possible that we already received full response
        // just before we sent cancellation request
        // in this case current response would not contain confirmation
        // and we would need to read one more response

        // first lets finish reading current response and look
        // for confirmation in it

        if len(t.tokChan) > 0 && readCancelConfirmation(t.tokChan) {
            // we got confirmation in current response
            return nil, t.ctx.Err()
        }
        // we did not get cancellation confirmation in the current response
        // read one more response, it must be there
        t.tokChan = make(chan tokenStruct, 5)
        go processSingleResponse(t.ctx, t.sess, t.tokChan, t.outs)
        if  len(t.tokChan) > 0 && readCancelConfirmation(t.tokChan) {
            return nil, t.ctx.Err()
        }
        // we did not get cancellation confirmation, something is not
        // right, this connection is not usable anymore
        return nil, errors.New("did not get cancellation confirmation from the server")
    }
}

if len(t.tokChan) > 0 && readCancelConfirmation(t.tokChan) and if len(t.tokChan) > 0 && readCancelConfirmation(t.tokChan) I have add if len(t.tokChan) > 0 to prevent blocking.But I'm not sure if this will cause other problems.

mispon commented 1 year ago

Yep, the same problem @styfstr Have you thought about doing PR?

shueybubbles commented 1 year ago

if this issue is ARM-specific, isn't the root cause more likely to be somewhere lower in the networking stack in a core Go library? The current PR fails all the tests due to race condition detection.

mispon commented 1 year ago

Probably. But that logic in token.go looks a little bit strange if I understood it correctly. We waiting for the cancellation approval from a mssql server, but the server is already shut down (in my case). So, our ctx cancel does nothing and we just hanging forever on the loop by an empty channel

mispon commented 1 year ago

The same issue exists in the original repo since 2019 https://github.com/denisenkom/go-mssqldb/issues/527

tiwo commented 8 months ago

I'm trying to understand this issue ... wouldn't you expect Ping() to block in this situation? As long as the TCP connection is still alive after removing the network cable, that is.

For what it is worth, I can't reproduce this (on ARM64) when instead of pulling a physical network cable, I terminate a TCP tunnel (made with ssh between the ARM64 test environment and an SQL SERVER instance).

In this scenario, the TCP connection is properly terminated (but without any messaging on the TDS / SQL SERVER level. Ping() immediately notices loss of connection:

# ./issue91
ping status =====  <nil>
ping status =====  <nil>
ping status =====  <nil>
ping status =====  <nil>
ping status =====  Read: EOF
Read: EOF
# go version
go version go1.21.3 linux/arm64

If Ping() takes too long, use PingContext() with a timeout. Then you'll run into issue #160 (the context is ignored), but this is a different matter.

mispon commented 8 months ago

@tiwo hello! Thank you for the response In the end, I solved my problem by wrapping all mssql calls in a goroutine that can just be left if the call hangs on a ping or something else)