sijms / go-ora

Pure go oracle client
MIT License
802 stars 177 forks source link

Querying small LOB fields is slower #259

Closed travelliu closed 1 year ago

travelliu commented 2 years ago

LOB field data content is small.

The query performance is lower than that of the godror driver, and the performance varies greatly

CREATE TABLE BS_F_AQ_SJ_RW_FML (
    ID VARCHAR2(50),
    BASEID VARCHAR2(255),
    BASESCHEMA VARCHAR2(255),
    PHASENO VARCHAR2(255),
    FILEDID VARCHAR2(255),
    FILEDTYPE VARCHAR2(255),
    FILEDVALUE CLOB,
    MODIFYDATE NUMBER(15,0),
    DEALERID VARCHAR2(255),
    DEALER VARCHAR2(255),
    TASKID VARCHAR2(255),
    FIELDCODE VARCHAR2(255),
    ACTIONTYPE VARCHAR2(255),
    ACTIONNAME VARCHAR2(255),
    FIELDLABEL VARCHAR2(255),
    ORDERNUM NUMBER(15,0),
    COLSPAN NUMBER(15,0)
);
driver select rows time
go-ora 20000 7398ms
godror 20000 243ms

This code doesn't seem to take into account the inline storage scenario

Go Version

go version
go version go1.19.2 linux/arm64
        github.com/godror/godror v0.34.0
        github.com/jessevdk/go-flags v1.5.0
        github.com/sijms/go-ora/v2 v2.5.3

go-ora

go run main.go --host=172.16.0.198 --port=55446 --username=xxxxxx --password=xxxxxx --sql='SELECT /*+ PARALLEL(BS_F_AQ_SJ_RW_FML,2) */ "ID","BASEID","BASESCHEMA","PHASENO","FILEDID","FILEDTYPE","FILEDVALUE","MODIFYDATE","DEALERID","DEALER","TASKID","FIELDCODE","ACTIONTYPE","ACTIONNAME","FIELDLABEL","ORDERNUM","COLSPAN" FROM "PUSHENG"."BS_F_AQ_SJ_RW_FML"' --fetchSize=4000 -d oracle
connStr:  oracle://xxxxx:xxxxx@172.16.0.198:55446/orcl?prefetch_rows=4000
sqlCmd:  SELECT /*+ PARALLEL(BS_F_AQ_SJ_RW_FML,2) */ "ID","BASEID","BASESCHEMA","PHASENO","FILEDID","FILEDTYPE","FILEDVALUE","MODIFYDATE","DEALERID","DEALER","TASKID","FIELDCODE","ACTIONTYPE","ACTIONNAME","FIELDLABEL","ORDERNUM","COLSPAN" FROM "BS_F_AQ_SJ_RW_FML"
ID                             NCHAR                         
BASEID                         NCHAR                         
BASESCHEMA                     NCHAR                         
PHASENO                        NCHAR                         
FILEDID                        NCHAR                         
FILEDTYPE                      NCHAR                         
FILEDVALUE                     OCIClobLocator                
MODIFYDATE                     NUMBER                        
DEALERID                       NCHAR                         
DEALER                         NCHAR                         
TASKID                         NCHAR                         
FIELDCODE                      NCHAR                         
ACTIONTYPE                     NCHAR                         
ACTIONNAME                     NCHAR                         
FIELDLABEL                     NCHAR                         
ORDERNUM                       NUMBER                        
COLSPAN                        NUMBER                        
         0 rows exported 2022-10-20 11:31:22.051276204 +0800 CST m=+1.846314719 
     20000 rows exported 2022-10-20 11:31:29.449538251 +0800 CST m=+9.244576776 
fetchSize: 4000 Rows: 20000 Time: 7398ms Avg: 2703.3258167462623/s

godror

go run main.go --host=172.16.0.198 --port
=55446 --username=xxxxxx --password=xxxxx --sql='SELECT /*+ PARALLEL(BS_F_AQ_SJ_RW_FML,2) */ "ID","BASEID","BASESCHEMA","PHASENO","FILEDID","FILEDTYPE","FILEDVALUE","MODIFYDATE","DEALERID","DEALER","TASKID","FIELDCODE","ACTIONTYPE","ACTIONNAME","FIELDLABEL","ORDERNUM","COLSPAN" FROM "PUSHENG"."BS_F_AQ_SJ_RW_FML"' --fetchSize=4000 
connStr:  user=xxxxx password=xxxxxx connectString="(DESCRIPTION=(CONNECT_TIMEOUT = 30)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=172.16.0.198)(PORT=55446)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcl)))"
configDir= connectionClass= enableEvents=0 externalAuth=0 heterogeneousPool=0
libDir= newPassword= noTimezoneCheck=0 poolIncrement=0 poolMaxSessions=0 poolMinSessions=0
poolSessionMaxLifetime=0s poolSessionTimeout=0s poolWaitTimeout=0s prelim=0
standaloneConnection=1 sysasm=0 sysdba=0 sysoper=0 timezone=
godror WARNING: discrepancy between DBTIMEZONE ("+00:00"=0) and SYSTIMESTAMP ("+08:00"=800) - set connection timezone, see https://github.com/godror/godror/blob/master/doc/timezone.md
sqlCmd:  SELECT /*+ PARALLEL(BS_F_AQ_SJ_RW_FML,2) */ "ID","BASEID","BASESCHEMA","PHASENO","FILEDID","FILEDTYPE","FILEDVALUE","MODIFYDATE","DEALERID","DEALER","TASKID","FIELDCODE","ACTIONTYPE","ACTIONNAME","FIELDLABEL","ORDERNUM","COLSPAN" FROM "BS_F_AQ_SJ_RW_FML"
ID                             VARCHAR2                      
BASEID                         VARCHAR2                      
BASESCHEMA                     VARCHAR2                      
PHASENO                        VARCHAR2                      
FILEDID                        VARCHAR2                      
FILEDTYPE                      VARCHAR2                      
FILEDVALUE                     CLOB                          
MODIFYDATE                     NUMBER                        
DEALERID                       VARCHAR2                      
DEALER                         VARCHAR2                      
TASKID                         VARCHAR2                      
FIELDCODE                      VARCHAR2                      
ACTIONTYPE                     VARCHAR2                      
ACTIONNAME                     VARCHAR2                      
FIELDLABEL                     VARCHAR2                      
ORDERNUM                       NUMBER                        
COLSPAN                        NUMBER                        
         0 rows exported 2022-10-20 11:31:40.167840141 +0800 CST m=+0.036582563 
     20000 rows exported 2022-10-20 11:31:40.411254976 +0800 CST m=+0.279997398 
fetchSize: 4000 Rows: 20000 Time: 243ms Avg: 82151.27742874552/s

test code

package main

import (
        "context"
        "database/sql"
        "fmt"
        "github.com/godror/godror"
        "github.com/godror/godror/dsn"
        "github.com/jessevdk/go-flags"

        // _ "github.com/mattn/go-oci8"
        _ "github.com/sijms/go-ora/v2"
        "os"
        "strconv"
        "time"
)

type Args struct {
        // General generic options
        Help       bool   `short:"h" long:"help" description:"Displays hele"`
        DBHost     string `short:"H" long:"host" description:"database host ip" default:"127.0.0.1"`
        DBPort     int    `short:"p" long:"port" description:"database port" default:"1521"`
        DBUser     string `short:"u" long:"username" description:"database username" default:"system"`
        DBUserPwd  string `short:"P" long:"password" description:"database username password" default:"oracle"`
        DBName     string `short:"n" long:"dbName" description:"database name" default:"orcl"`
        FetchSize  int    `short:"f" long:"fetchSize" description:"fetchSize" default:"2000"`
        TableName  string `short:"t" long:"tabName" description:"tabName" default:""`
        SQL        string `short:"s" long:"sql" description:"select sql" default:""`
        DriverName string `short:"d" long:"driverName" description:"Golang Oracle DriverName. oracle. godror. oci8" default:"godror"`
        Cpu        bool   `short:"c" long:"cpu" description:"cpu pprof"`
}

func main() {
        var (
                c = &Args{}
        )

        parser := flags.NewNamedParser("ora_test", flags.None)
        parser.ShortDescription = "Golang Oracle Driver Test Perf "
        parser.LongDescription = "Golang Oracle Driver Test Perf"

        if _, err := parser.AddGroup("General options", "", c); err != nil {
                fmt.Println(err)
                return
        }
        _, err := parser.ParseArgs(os.Args[1:])
        if err != nil {
                fmt.Println(err)
                return
        }
        if c.Help {
                printUsage(parser)
                return
        }
        if c.DriverName == "" {
                fmt.Println("DriverName error ")
                return
        }
        if c.TableName == "" && c.SQL == "" {
                fmt.Println("TableName and SQL not define")
                return
        }
        run(c)
}

func printUsage(parser *flags.Parser) {
        // parser.WriteManPage(os.Stdout)
        parser.WriteHelp(os.Stdout)
}

func run(c *Args) {

        var dbUrl string

        if c.DriverName == "oracle" {
                dbUrl = fmt.Sprintf("oracle://%s:%s@%s:%v/%s", c.DBUser, c.DBUserPwd, c.DBHost, c.DBPort, c.DBName)
                if c.FetchSize > 0 {
                        dbUrl += "?prefetch_rows=" + strconv.Itoa(c.FetchSize)
                }
        } else if c.DriverName == "oci8" {
                dbUrl = fmt.Sprintf("oracle://%s:%s@%s:%v/%s", c.DBUser, c.DBUserPwd, c.DBHost, c.DBPort, c.DBName)
                if c.FetchSize > 0 {
                        dbUrl += "?prefetch_rows=" + strconv.Itoa(c.FetchSize)
                }
        } else {
                cfg := godror.ConnectionParams{
                        CommonParams: dsn.CommonParams{},
                        ConnParams:   dsn.ConnParams{},
                        PoolParams:   dsn.PoolParams{},
                        // NewPassword:          dsn.Password{},
                        StandaloneConnection: true,
                }
                if cfg.Username == "" {
                        cfg.Username = c.DBUser
                }
                if cfg.Password.Len() == 0 {
                        cfg.Password = godror.NewPassword(c.DBUserPwd)
                }
                if cfg.ConnectString == "" {
                        cfg.ConnectString = fmt.Sprintf(
                                "(DESCRIPTION=(CONNECT_TIMEOUT = %v)(ADDRESS_LIST="+
                                        "(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%v)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=%s)))",
                                30, c.DBHost, c.DBPort, c.DBName,
                        )
                }
                dbUrl = cfg.StringWithPassword()
        }
        fmt.Println("connStr: ", dbUrl)
        db, err := sql.Open(c.DriverName, dbUrl)
        if err != nil {
                fmt.Println(err)
                return
        }
        var rowNum int64
        tx, err := db.Begin()
        if err != nil {
                fmt.Println(err)
                return
        }
        defer tx.Commit() // nolint
        var sqlCmd string

        if c.TableName != "" {
                sqlCmd = fmt.Sprintf("select * from %s", c.TableName)
        } else {
                sqlCmd = c.SQL
        }
        fmt.Println("sqlCmd: ", sqlCmd)
        var (
                rows *sql.Rows
        )
        if c.DriverName == "oracle" || c.DriverName == "oci8" {
                rows, err = tx.QueryContext(context.Background(), sqlCmd)
        } else {
                rows, err = tx.QueryContext(context.Background(), sqlCmd, godror.PrefetchCount(c.FetchSize), godror.FetchArraySize(c.FetchSize))
        }

        if err != nil {
                fmt.Println(err)
                return
        }
        defer rows.Close()
        cols, err := rows.ColumnTypes()
        if err != nil {
                fmt.Println(err)
                return
        }
        for _, col := range cols {
                fmt.Printf("%-30s %-30s\n", col.Name(), col.DatabaseTypeName())
        }
        bTime := time.Now()
        fmt.Printf("%10v rows exported %s \n", rowNum, time.Now().String())
        values := make([]*string, len(cols))
        binds := make([]interface{}, len(cols))
        for i := range values {
                binds[i] = &values[i]
        }
        for rows.Next() {
                if err := rows.Scan(binds...); err != nil {
                }
                rowNum++
                if rowNum%500000 == 0 {
                        fmt.Printf("%10v rows exported %s \n", rowNum, time.Now().String())
                }
        }
        if err = rows.Err(); err != nil {
                fmt.Println(err)
                return
        }
        fmt.Printf("%10v rows exported %s \n", rowNum, time.Now().String())
        eTime := time.Now()
        fmt.Printf("fetchSize: %04v Rows: %v Time: %vms Avg: %v/s\n", c.FetchSize, rowNum, eTime.Sub(bTime).Milliseconds(), float64(rowNum)/eTime.Sub(bTime).Seconds())
}
sijms commented 2 years ago

as I can see from comparison there is very big difference and probable answer for this, is the way each package dealing with lob in case of original C driver the row data collected as usual and each lob is returned from database as lob locator structure which is few bytes and delay reading the actual blob data until you try to read it on the other hand go-ora will read blob data for each rows (20,000) and I think this is the reason

cjbj commented 2 years ago

godror's current default is to fetch as string/[]byte, which is much faster then getting a Lob locator . This works up to 1 GB.

Oracle Database LOBs in general also support 'LOB prefetching' (different to row prefetching), something the LOB team at Oracle like, but the fetch-as-string/buffer approach seems just as good in practice for ODPI-C -based drivers like godror. I did add lob prefetching to PHP OCI8 because it let me do minimal driver changes and not destabilize an old code base.

sijms commented 2 years ago

yes thanks @cjbj for your explanation I will review these Caps in your project and fix this issue

travelliu commented 2 years ago

TKS。 Through the network packet capture, it was indeed lob prefet. database version 11.2.0.4

Gordor

2022-10-23 at 14 22

go-ora

2022-10-23 at 14 24

izxero commented 1 year ago

Any update?

sijms commented 1 year ago

i am working on it

sijms commented 1 year ago

fixed in v2.6.2

this changes I make mainly in the command.go file I take from C# driver the speed is dramatically increase after that changes but still I receive lob locator from the server after receiving data. the presentation as follow 1- send query request to the server 2- server will response with column definitions (no data) 3- send another request with define 4- server will return data for lobs server will send <data><locator> and so on

@cjbj you talk about 'LOB prefetching' and 'fetch-as-string/buffer approach' would you explain the sequence above is related to which of them

this approach is different from old one as follow: in old one data is sent in step 2 and no need to step 3 and 4

but data lob data will return as <locator> only and you need to make a network call to get data and size

sijms commented 1 year ago

@travelliu could you test now and see the difference

travelliu commented 1 year ago

@travelliu could you test now and see the difference

OK. i testing

travelliu commented 1 year ago

@sijms

driver select rows time
go-ora 20000 206ms
godror 20000 260ms

Go-ora

         0 rows exported 2023-03-18 12:26:26.720312976 +0800 CST m=+0.045739054 
     20000 rows exported 2023-03-18 12:26:26.92704968 +0800 CST m=+0.252475758 
fetchSize: 4000 Rows: 20000 Time: 206ms Avg: 96726.45561031434/s

godror

         0 rows exported 2023-03-18 12:26:08.973837443 +0800 CST m=+0.035773967 
     20000 rows exported 2023-03-18 12:26:09.23424223 +0800 CST m=+0.296178764 
fetchSize: 4000 Rows: 20000 Time: 260ms Avg: 76792.01451513443/s
cjbj commented 1 year ago

Were you fetching as string, or as lob locators? The former will be a lot faster than the latter.

sijms commented 1 year ago

I think we can now close this issue