sijms / go-ora

Pure go oracle client
MIT License
789 stars 174 forks source link

How to determine reason from disconnect from trace.log #401

Closed dekke046 closed 1 year ago

dekke046 commented 1 year ago

Hi,

I am using the following batched approach to bulk load records in an Oracle ExaCC database cluster:

// Check if we have a full batch of records
        if len(columns[0]) >= batchSize {
            result, err := conn.BulkInsert(insertStatement, len(columns[0]), columns...)
            if err != nil {
                fmt.Println("Error inserting CSV records into database:", err)
                return 0, err
            }
            rowsAffected, _ := result.RowsAffected()
            totalInserted = totalInserted + int(rowsAffected)

            // Empty the columns slice of slices
            for i := range columns {
                columns[i] = []driver.Value{}
            }
        }

Why doing this with batches? I have two different databases on the same ExaCC Cluster, PRD1 and PRD2, In one of them I can use 250 as batch size and probably a higher number in the other I need to use 100, when using more the process of loading ends with EOF.

So below the two "cleaned" trace logs loading the same data in the two different databases.

Questions:

  1. What database settings could influence the loading behavior, so I can discuss with the database team?
  2. Could I extract more technical information about the root case of closing/terminating the connection from the trace.log, by looking in the encoded data.

GOOD:

2023-07-11T15:52:17.5179: Connect
2023-07-11T15:52:17.5305: using: ocm1.server.local:1521 ..... [SUCCEED]
2023-07-11T15:52:17.5306: Open :(DESCRIPTION=(ENABLE=BROKEN)(FAILOVER=on)(LOAD_BALANCE=off)(ADDRESS_LIST=(SOURCE_ROUTE=YES)(ADDRESS=(PROTOCOL=TCP)(HOST=ocm1.server.local)(PORT=1521))(AD
DRESS=(PROTOCOL=TCP)(HOST=exacc1.server.local)(PORT=1521)))(ADDRESS_LIST=(SOURCE_ROUTE=YES)(ADDRESS=(PROTOCOL=TCP)(HOST=ocm2.server.local)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOS
T=exacc1.server.local)(PORT=1521)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=prd2.database)))
2023-07-11T15:52:17.5306:
Write packet:
2023-07-11T15:52:17.5306:
Write packet:
2023-07-11T15:52:17.5958:
Read packet:
2023-07-11T15:52:17.5958: Handshake Complete
2023-07-11T15:52:17.5959: Advance Negotiation
2023-07-11T15:52:17.5959:
Write packet:
2023-07-11T15:52:17.6661:
Read packet:
2023-07-11T15:52:17.6764: Diffie Hellman Keys:
2023-07-11T15:52:17.6764:
Generator:
2023-07-11T15:52:17.6764:
Prime:
2023-07-11T15:52:17.6765:
Private Key:
2023-07-11T15:52:17.6765:
Public Key:
2023-07-11T15:52:17.6765:
Server Public Key:
2023-07-11T15:52:17.6765:
Shared Key:
2023-07-11T15:52:17.6766: Send Client Public Key:
2023-07-11T15:52:17.6766:
Write packet:
2023-07-11T15:52:17.6767: TCP Negotiation
2023-07-11T15:52:17.6768:
Write packet:
2023-07-11T15:52:17.7555:
Read packet:
2023-07-11T15:52:17.7556: Server Charset: 873
2023-07-11T15:52:17.7556: Server National Charset: 2000
2023-07-11T15:52:17.7556: Data Type Negotiation
2023-07-11T15:52:17.7557:
Write packet:
2023-07-11T15:52:17.7613:
Read packet:
2023-07-11T15:52:17.7614: TTC Version: 11
2023-07-11T15:52:17.7614: doAuth
2023-07-11T15:52:17.7614:
Write packet:
2023-07-11T15:52:17.7687:
Read packet:
2023-07-11T15:52:17.7688: Summary: RetCode:0, Error Message:""
2023-07-11T15:52:17.7719:
Write packet:
2023-07-11T15:52:17.8041:
Read packet:
2023-07-11T15:52:17.8046:
Write packet:
2023-07-11T15:52:17.8096:
Read packet:
2023-07-11T15:52:17.8096: Connected
2023-07-11T15:52:17.8096: Database Version: 19.1.0.0.0
2023-07-11T15:52:17.8097: Exec:
truncate table KPNCHANNELS_SEGMENT_MAARTEN
2023-07-11T15:52:17.8098:
Write packet:
2023-07-11T15:52:18.0210:
Read packet:
2023-07-11T15:52:18.0211: Summary: RetCode:0, Error Message:""
2023-07-11T15:52:18.0211:
Write packet:
2023-07-11T15:52:18.0264:
Read packet:
2023-07-11T15:52:18.0269: BulkInsert:
INSERT INTO TABLE1(ZIPCODE, HOUSENUMBER, HOUSENUMBEREXTENSION, NAME, INSERTDATE, ENDDATE)
VALUES(:1, :2, :3, :4, :5, :6)
2023-07-11T15:52:18.0269: Row Num: 250
2023-07-11T15:52:18.0270: Column Num: 6
2023-07-11T15:52:18.0280:
Write packet:
2023-07-11T15:52:18.0283:
Write packet:
2023-07-11T15:52:18.0286:
Write packet:
2023-07-11T15:52:18.0416:
Read packet:
2023-07-11T15:52:18.0416: Summary: RetCode:0, Error Message:""
2023-07-11T15:52:18.0416:
Write packet:
2023-07-11T15:52:18.0471:
Read packet:

BAD:

2023-07-11T15:39:11.3033: Connect
2023-07-11T15:39:11.3245: using: ocm1.server.local:1521 ..... [SUCCEED]
2023-07-11T15:39:11.3245: Open :(DESCRIPTION=(ENABLE=BROKEN)(FAILOVER=on)(LOAD_BALANCE=off)(ADDRESS_LIST=(SOURCE_ROUTE=YES)(ADDRESS=(PROTOCOL=TCP)(HOST=ocm1.server.local)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=exacc1.server.local)(PORT=1521)))(ADDRESS_LIST=(SOURCE_ROUTE=YES)(ADDRESS=(PROTOCOL=TCP)(HOST=ocm2.server.local)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=exacc1.server.local)(PORT=1521)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=prd1.database)))
2023-07-11T15:39:11.3245:
Write packet:
2023-07-11T15:39:11.3246:
Write packet:
2023-07-11T15:39:11.3823:
Read packet:
2023-07-11T15:39:11.3823: Handshake Complete
2023-07-11T15:39:11.3823: Advance Negotiation
2023-07-11T15:39:11.3823:
Write packet:
2023-07-11T15:39:11.4604:
Read packet:
2023-07-11T15:39:11.4699: Diffie Hellman Keys:
2023-07-11T15:39:11.4699:
Generator:
2023-07-11T15:39:11.4699:
Prime:
2023-07-11T15:39:11.4699:
Private Key:
2023-07-11T15:39:11.4700:
Public Key:
2023-07-11T15:39:11.4700:
Server Public Key:
2023-07-11T15:39:11.4700:
Shared Key:
2023-07-11T15:39:11.4701: Send Client Public Key:
2023-07-11T15:39:11.4701:
Write packet:
2023-07-11T15:39:11.4702: TCP Negotiation
2023-07-11T15:39:11.4702:
Write packet:
2023-07-11T15:39:11.5508:
Read packet:
2023-07-11T15:39:11.5509: Server Charset: 873
2023-07-11T15:39:11.5509: Server National Charset: 2000
2023-07-11T15:39:11.5510: Data Type Negotiation
2023-07-11T15:39:11.5510:
Write packet:
2023-07-11T15:39:11.5592:
Read packet:
2023-07-11T15:39:11.5597: TTC Version: 11
2023-07-11T15:39:11.5597: doAuth
2023-07-11T15:39:11.5598:
Write packet:
2023-07-11T15:39:11.5706:
Read packet:
2023-07-11T15:39:11.5706: Summary: RetCode:0, Error Message:""
2023-07-11T15:39:11.5742:
Write packet:
2023-07-11T15:39:11.6053:
Read packet:
2023-07-11T15:39:11.6055:
Write packet:
2023-07-11T15:39:11.6137:
Read packet:
2023-07-11T15:39:11.6138: Connected
2023-07-11T15:39:11.6138: Database Version: 19.1.0.0.0
2023-07-11T15:39:11.6139: Exec:
truncate table KPNCHANNELS_SEGMENT_MAARTEN
2023-07-11T15:39:11.6139:
Write packet:
2023-07-11T15:39:11.7706:
Read packet:
2023-07-11T15:39:11.7706: Summary: RetCode:0, Error Message:""
2023-07-11T15:39:11.7707:
Write packet:
2023-07-11T15:39:11.7779:
Read packet:
2023-07-11T15:39:11.8137: BulkInsert:
INSERT INTO TABLE1(ZIPCODE, HOUSENUMBER, HOUSENUMBEREXTENSION, NAME, INSERTDATE, ENDDATE)
VALUES(:1, :2, :3, :4, :5, :6)
2023-07-11T15:39:11.8137: Row Num: 250
2023-07-11T15:39:11.8137: Column Num: 6
2023-07-11T15:39:11.8148:
Write packet:
2023-07-11T15:39:11.8153:
Write packet:
2023-07-11T15:39:11.8156:
Write packet:
2023-07-11T15:39:12.0524: Close
2023-07-11T15:39:12.0525: Connection Closed
sijms commented 1 year ago

EOF is sent by oracle server when the client try to send data over closed socket socket closure occur due to network communication error for example

other oracle client like C# will throw network exception that means error in communication occur.

in normal situation failover is activated by database/sql package and reconnection occur

unfortunately fail over is not supported for go_ora.BulkInsert but you can use BulkInsert from inside Exec by passing arrays in the parameter as follow

conn, err := sql.Open("oracle", os.Getenv("DSN"))
// check for error

// first parameter array of ints
par1 := []int64{1, 2, 3}

// second parameter array of string 
par2 := []string{ "1", "2", "3"}

// note both have same size
_, err = conn.Exec("INSERT INTO TB1 (COL1, COL2) VALUES (:1, :2)", par1, par2)
// check for error

I am now re-write readme.md file to arrange data and information about using this package

dekke046 commented 1 year ago

Thanks :-)

Locally this works very well(oracle DB docker container on macBook), I can use huge numbers for the array size of par1/par2, 50000 works fine: _, err = conn.Exec("INSERT INTO TB1 (COL1, COL2) VALUES (:1, :2)", par1, par2)

When using this in the corporate environment, so go -> OCM -> Database it still does not work. Using array size of 87 records gives after 3132 inserted records an EOF error from the insert statement.

Using even l larger array's failes instantly with the following error: read tcp 10.31.210.132:38304->10.141.64.50:1521: read: connection reset by peer

Not sure how I can debug this.

Using Oracle SQL*Loader in bash on solaris server it works super fast, the CTL file starts with:

OPTIONS (SKIP = 1, DIRECT=TRUE, ROWS=30000, READSIZE=65000000, BINDSIZE=63000000)
LOAD DATA
....

Any thoughts?

Maarten

sijms commented 1 year ago

I think that SQL*Loader is using different technique for sending bulk data to server

dekke046 commented 1 year ago

I think that SQL*Loader is using different technique for sending bulk data to server

Yes you are right. What we currently see is that we have no issues loading large arrays on a 4 instance exaCC cluster without dataguard. A database with dataguard does not work when having array sizes higher than 80

Not sure in what way dataguard impacts this.

dekke046 commented 1 year ago

Issue is solved, database teams has changed some settings from DB end and all is working perfectly!

Many thanks for your help!

sijms commented 1 year ago

I will investigate sqlloader for fast loading data my first look I find that it uses direct path my direct path still under construction and still take long time (compare to sqlloader) to copy data