sijms / go-ora

Pure go oracle client
MIT License
796 stars, 177 forks

Query panic #480

Closed: zhanghaiyang9999 closed this issue 9 months ago

zhanghaiyang9999 commented 10 months ago

I have a table named TABLE_ALL with 24 fields. It includes data types such as LONG, BLOB, VARCHAR, and BFILE, and holds about 100 records. The query "select * from TABLE_ALL where id < 10" works well, but the query "select * from TABLE_ALL" panics. (By the way, the same query works well in Oracle SQL Developer.) The go-ora version is 2.7.25 and the Oracle version is 19c. The zip file TABLE_ALL.zip contains the data of table TABLE_ALL, exported in loader format.

TABLE_ALL.zip

The stack trace is: [image] [image]

zhanghaiyang9999 commented 10 months ago

When executing the query "select * from TABLE_ALL", there are two possible outcomes: a hang or a panic. If it does not panic, it hangs. [image]

sijms commented 10 months ago

The code runs for me without error in both cases.

zhanghaiyang9999 commented 10 months ago

You can use the following URL to connect to my Oracle instance for testing: "oracle://C%23%23test:easync2019@123.113.154.79:1521/?LOB FETCH=POST&TIMEOUT=180&SID=orcl" (the user name is C##test).

You can try this query "select * from TABLE_ALL"

Or try this one, which works fine: "select * from TABLE_ALL where id=10".
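The user name C##test appears as C%23%23test in the URL because "#" has to be percent-encoded there. A minimal sketch of that escaping using only Go's standard net/url package follows. The helper buildOracleURL, the host db.example.com, and the password are hypothetical placeholders for illustration, not part of go-ora, and go-ora's own URL options are not covered here.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildOracleURL assembles an oracle:// connection URL with the user name
// and password percent-encoded. It is a hypothetical helper for this
// sketch, not a go-ora function.
func buildOracleURL(user, password, hostPort, sid string) string {
	q := url.Values{}
	q.Set("SID", sid)
	u := url.URL{
		Scheme:   "oracle",
		User:     url.UserPassword(user, password), // escapes '#' as %23
		Host:     hostPort,
		Path:     "/",
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	// Placeholder host and password, chosen only for the example.
	fmt.Println(buildOracleURL("C##test", "secret", "db.example.com:1521", "orcl"))
}
```

Building the URL this way avoids escaping special characters in the user name by hand.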

sijms commented 10 months ago

OK, I can reproduce the error. I will investigate, update the code, and let you know.

zhanghaiyang9999 commented 10 months ago

Maybe it is related to the BLOB type. In my test, the following query leads to a hang or a panic at random (COLUMN_6 is of BLOB type): select COLUMN_6 from TABLE_ALL

sijms commented 10 months ago

Testing gives one of four results:
1- The query completes normally without errors.
2- A timeout occurs. The code needs a correction here so that it returns "user request cancel of current operation".
3- A panic, because a 0xff byte (part of the BLOB data) is read as a buffer size, which means the data has shifted left or right.
4- An abnormal packet is received: as shown below, 5 zero bytes are inserted between 1f and af.

Result for case 4:

00000000  00 00 1f 00 00 00 00 00  af 06 00 00 00 00 00 ff  |................|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000040  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

The correct situation:

00000000  00 00 1f af 06 00 00 00  00 00 ff ff ff ff ff ff  |................|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

In cases 3 and 4, something abnormal happens during the network read, so we get abnormal data from the socket. This will not happen on local networks.
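Case 3 can be illustrated with a small defensive check: when the stream is shifted, data bytes such as 0xff land where a length field is expected, and using that value directly as a buffer size fails. The sketch below validates a decoded length before slicing and returns an error instead of panicking. The 4-byte length framing, the maxChunk limit, and the readChunk helper are all made up for this example; they are not the Oracle wire format or go-ora code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// maxChunk is an arbitrary upper bound chosen for this sketch.
const maxChunk = 1 << 20

var errBadLength = errors.New("implausible length field: stream is probably misaligned")

// readChunk decodes a 4-byte big-endian length followed by that many bytes.
// The framing is hypothetical and only serves to show the bounds check.
func readChunk(buf []byte) ([]byte, error) {
	if len(buf) < 4 {
		return nil, errBadLength
	}
	n := binary.BigEndian.Uint32(buf[:4])
	// Reject lengths that are too large or exceed the bytes actually present.
	if n > maxChunk || int(n) > len(buf)-4 {
		return nil, errBadLength
	}
	return buf[4 : 4+int(n)], nil
}

func main() {
	good := []byte{0, 0, 0, 2, 0xAA, 0xBB}
	shifted := []byte{0xff, 0xff, 0xff, 0xff, 0xAA, 0xBB} // BLOB bytes where the length should be
	fmt.Println(readChunk(good))
	fmt.Println(readChunk(shifted))
}
```

A check like this does not repair the shifted stream; it only turns the panic into an error that the caller can report.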

sijms commented 10 months ago

I made some corrections inside readPacket: the header is now read in a loop to avoid reading a partial header.

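// Read the 8-byte header in a loop so a short read never yields a partial header.
index = 0
length = 8
for index < length {
    var temp int
    err = session.initRead()
    if err != nil {
        return nil, err
    }
    // Read into the unfilled tail of head so bytes from earlier reads are kept.
    if session.sslConn != nil {
        temp, err = session.sslConn.Read(head[index:])
    } else {
        temp, err = session.conn.Read(head[index:])
    }
    if err != nil {
        // A timeout that still delivered some bytes is not fatal: count them and retry.
        if e, ok := err.(net.Error); ok && e.Timeout() && temp != 0 {
            index += uint32(temp)
            continue
        }
        return nil, err
    }
    index += uint32(temp)
}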

And now the result:

00000000  00 00 00 ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

sijms commented 10 months ago

From the initial investigation, I suspect the cause of all these issues is network data loss (confirmed with Wireshark). Oracle sends data continuously, while I read it packet by packet (for simplicity). net.Conn has a small buffer, which fills up with a large BLOB on a slow network. So the solution is to read data continuously into a local buffer (as the original drivers do). I will try to build and test this model. If you have suggestions or can help, I would appreciate it.

zhanghaiyang9999 commented 10 months ago

Thank you very much! If you make a new build, I can test it in my local environment. By the way, godror works well in this case; maybe you can refer to it?

sijms commented 9 months ago

Fixed in v2.8.0.

zhanghaiyang9999 commented 9 months ago

v2.8.0 no longer panics or hangs, but after conn.Query executes, rows.Next returns no records. You can use my environment to test.

sijms commented 9 months ago

Fixed in v2.8.1.