vrogier / ocilib

OCILIB (C and C++ Drivers for Oracle) - Open source C and C++ library for accessing Oracle databases
http://www.ocilib.net
Apache License 2.0

Direct Path Load: error when writing to Oracle #260

Open duoduo-peng opened 3 years ago

duoduo-peng commented 3 years ago

When I use multiple threads to write data to an Oracle database, I get an error. OCILIB is initialized as follows:

if (!OCI_Initialize(err_handler, nullptr, OCI_ENV_DEFAULT | OCI_ENV_THREADED)) { return 1; }

#0  0x00007f6ca2719387 in raise () from /lib64/libc.so.6
#1  0x00007f6ca5d4566f in skgesigOSCrash () from /u01/client/lib/libclntsh.so.21.1
#2  0x00007f6ca64335ed in kpeDbgSignalHandler () from /u01/client/lib/libclntsh.so.21.1
#3  0x00007f6ca5d45952 in skgesig_sigactionHandler () from /u01/client/lib/libclntsh.so.21.1
#4  <signal handler called>
#5  0x00007f6ca68d5d6d in kpuhhfreV1 () from /u01/client/lib/libclntsh.so.21.1
#6  0x00007f6ca39b9a65 in kpufdesc2 () from /u01/client/lib/libclntsh.so.21.1
#7  0x00007f6ca39bcf0d in kpufdesc () from /u01/client/lib/libclntsh.so.21.1
#8  0x00007f6ca32eed7a in OcilibDirPathSetColumn () from /usr/local/lib/libocilib.so.4
#9  0x000000000054f0aa in createBomOrgItemWorker (thread=0x44599e8, data=0x445a3f0) at /home/duoduo/Desktop/project/db/YinwuLoadData.cpp:1792
#10 0x00007f6ca1b20ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f6ca27e196d in clone () from /lib64/libc.so.6

vrogier commented 3 years ago

Hi,

Can you provide more details? Which OCILIB and Oracle versions are you using? Could you share a code snippet that reproduces the issue?

Regards,

Vincent

vrogier commented 3 years ago

Hi

Can you provide more info? Direct Path operations are not meant to be run in parallel.
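If the loads themselves must not run in parallel, one way to keep a multi-threaded program is to let the worker threads only prepare data and hand it to a single loader thread. The sketch below shows that producer/consumer shape in plain C++. Batch, LoadQueue and run_pipeline are hypothetical names, and the loadBatch callback is an assumed stand-in for the OCILIB direct path sequence (create, set columns, prepare, convert, load, finish), which is not shown.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unit of work: the target table and the keys to load into it.
struct Batch {
    std::string table;
    std::vector<long> rows;
};

// Thread-safe FIFO: worker threads push batches, one loader thread pops them.
class LoadQueue {
public:
    void push(Batch b) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(b));
        }
        cv_.notify_one();
    }

    // Called once all producers are done; wakes the loader so it can exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a batch is available; returns false when closed and drained.
    bool pop(Batch &out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Batch> q_;
    bool closed_ = false;
};

// nProducers threads each build batchesEach batches; a single thread loads them,
// so loadBatch (placeholder for the direct path calls) never runs concurrently.
// Returns the number of batches loaded.
template <typename LoadFn>
int run_pipeline(int nProducers, int batchesEach, LoadFn loadBatch) {
    assert(nProducers >= 0 && batchesEach >= 0);
    LoadQueue jobs;
    int loaded = 0; // only touched by the loader thread until it is joined

    std::thread loader([&] {
        Batch b;
        while (jobs.pop(b)) {
            loadBatch(b);
            ++loaded;
        }
    });

    std::vector<std::thread> producers;
    for (int p = 0; p < nProducers; ++p) {
        producers.emplace_back([&jobs, p, batchesEach] {
            for (int i = 0; i < batchesEach; ++i) {
                jobs.push(Batch{"TMP" + std::to_string(p),
                                std::vector<long>(10, static_cast<long>(i))});
            }
        });
    }
    for (auto &t : producers) t.join();
    jobs.close();
    loader.join();
    return loaded;
}
```

With this layout, only one connection and one direct path object are in use at any time. The preparation work still runs in parallel.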

Regards,

Vincent

duoduo-peng commented 3 years ago

OCILIB version: v4.7.2; Oracle version: 19c.

Here is a demo that reproduces the issue:

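#include <cstdio>
#include <cstring>
#include <ctime>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

#include "ocilib.h"

#include "fmt/format.h"

#define SIZE_COL 22

#define MAX_CONN 30

#define OCI_DIRPATH_INDEX_MAINT_SKIP_ALL 4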

struct BomOrgItemParStruct {
    int planId;
    int instanceId;
    int orgTable;
    int index;
    OCI_ConnPool *pOciCoon;
    std::unordered_map<long, bool> *bomOrgItemMap;
};

void createBomOrgItemWorker(OCI_Thread *thread, void *data) {
std::cout << "** Create BomOrgItem Worker *****" << std::endl;

clock_t start, end;
start = clock();

auto *bomOrgItemPar = (BomOrgItemParStruct *) data;
if (bomOrgItemPar == nullptr)
    return;

auto *bomOrgItemMap = bomOrgItemPar->bomOrgItemMap;
if (bomOrgItemMap == nullptr || bomOrgItemMap->empty())
    return;

boolean res = TRUE;
OCI_DirPath *dp = nullptr;
OCI_TypeInfo *tbl = nullptr;
OCI_Connection *pOciCoon = OCI_PoolGetConnection(bomOrgItemPar->pOciCoon, nullptr);

printf("pOciCoon = %p \n", pOciCoon);

//LogInfo("OCI_Connection *pOciCoon = OCI_PoolGetConnection(bomOrgItemPar->pOciCoon, nullptr)");

uint32_t batchRows = bomOrgItemMap->size() > 20000 ? 20000 : bomOrgItemMap->size();
std::string partition = "PLAN_ID_10001";
std::string bomOrgLTableName = fmt::format("yinwu_mrp_bom_litem_tmp{}", bomOrgItemPar->index);
std::string bomOrgQTableName = fmt::format("yinwu_mrp_bom_qitem_tmp{}", bomOrgItemPar->index);
std::string tableName =
        bomOrgItemPar->orgTable == 1 ? bomOrgLTableName : bomOrgQTableName;

tbl = OCI_TypeInfoGet(pOciCoon, tableName.c_str(), OCI_TIF_TABLE);
dp = OCI_DirPathCreate(tbl, partition.c_str(), 3, batchRows);

/* optional attributes to set */
uint32_t bufferSize = 64000000;// batchRows < 20000 ? 6400000 : 1024 * 1024 * 1024;
OCI_DirPathSetBufferSize(dp, bufferSize);
OCI_DirPathSetConvertMode(dp, OCI_DCM_DEFAULT);
OCI_DirPathSkipIndex(dp, OCI_DIRPATH_INDEX_MAINT_SKIP_ALL);
OCI_DirPathSetNoLog(dp, TRUE);
OCI_DirPathSetParallel(dp, TRUE);

/* describe the target table */
OCI_DirPathSetColumn(dp, 1, "PLAN_ID", SIZE_COL, NULL);
OCI_DirPathSetColumn(dp, 2, "INSTANCE_ID", SIZE_COL, NULL);
OCI_DirPathSetColumn(dp, 3, "ITEM_ID", SIZE_COL, NULL);

char val1[SIZE_COL + 1] = {0};
char val2[SIZE_COL + 1] = {0};
char val3[SIZE_COL + 1] = {0};

/* prepare the load */
OCI_DirPathPrepare(dp);
batchRows = OCI_DirPathGetMaxRows(dp);
//LogInfo("batchRows={0}, bomOrgItemMap->size={1}", batchRows, bomOrgItemMap->size());

for (auto iter = bomOrgItemMap->begin(); iter != bomOrgItemMap->end();) {
    OCI_DirPathReset(dp);

    for (uint j = 1; j <= batchRows && iter != bomOrgItemMap->end(); ++j) {
        snprintf(val1, SIZE_COL, "%d", bomOrgItemPar->planId);
        snprintf(val2, SIZE_COL, "%d", bomOrgItemPar->instanceId);
        snprintf(val3, SIZE_COL, "%ld", iter->first);

        OCI_DirPathSetEntry(dp, j, 1, val1, (unsigned int) strlen(val1), TRUE);
        OCI_DirPathSetEntry(dp, j, 2, val2, (unsigned int) strlen(val2), TRUE);
        OCI_DirPathSetEntry(dp, j, 3, val3, (unsigned int) strlen(val3), TRUE);

        ++iter;
    }

    /* load data to the server */
    while (res) {
        uint state = OCI_DirPathConvert(dp);
        if (state == OCI_DPR_ERROR) {
            std::cout << "BomOrgItem OCI_DirPathConvert Error" << std::endl;
            return;
        }
        if ((state == OCI_DPR_FULL) || (state == OCI_DPR_COMPLETE))
            res = OCI_DirPathLoad(dp);

        if (state == OCI_DPR_COMPLETE)
            break;
    }
}

OCI_DirPathFinish(dp);
OCI_DirPathFree(dp);//free direct path object
OCI_ConnectionFree(pOciCoon);

std::string rebuildIndex = fmt::format("ALTER INDEX {0}_N1 REBUILD PARTITION PLAN_ID_{1} NOLOGGING PARALLEL",
                                       tableName, bomOrgItemPar->planId);
OCI_Statement *pStmt;
pOciCoon = OCI_PoolGetConnection(bomOrgItemPar->pOciCoon, nullptr);
pStmt = OCI_StatementCreate(pOciCoon);
OCI_ExecuteStmt(pStmt, rebuildIndex.c_str());

OCI_StatementFree(pStmt);
OCI_ConnectionFree(pOciCoon);

end = clock();
std::cout << "Create BomOrgItem Worker Total Time = " << (float) (end - start) / CLOCKS_PER_SEC << " s" << std::endl;

}

bool createBomOrgItem(int orgTable, std::unordered_map<long, bool> *bomOrgItemMapPoint) {
clock_t start, end;
start = clock();

std::vector<OCI_Thread *> threadList;
std::vector<BomOrgItemParStruct> bomOrgItemParList;

int bomComTableCnt = 8;
threadList.resize(bomComTableCnt);
bomOrgItemParList.resize(bomComTableCnt);

std::string dbServer = "192.168.2.117:1521/devpdb.yinwu.com";
OCI_ConnPool *pCoonPool = OCI_PoolCreate(dbServer.c_str(), "dev", "dev",
                                         OCI_POOL_SESSION, OCI_SESSION_DEFAULT, 0, MAX_CONN, 1);

int i = 0;

OCI_Statement *pStmt;
OCI_Connection *pOciCoon = OCI_PoolGetConnection(pCoonPool, nullptr);
pStmt = OCI_StatementCreate(pOciCoon);

for (int j = 0; j < bomComTableCnt; ++j) {
    std::string sql = fmt::format(
            "ALTER TABLE YINWU_MRP_BOM_LITEM_TMP{} TRUNCATE PARTITION PLAN_ID_10001 UPDATE GLOBAL INDEXES", j);
    OCI_ExecuteStmt(pStmt, sql.c_str());
}

OCI_StatementFree(pStmt);
OCI_ConnectionFree(pOciCoon);

//BomOrgItemParStruct bomOrgItem[8] = {};
for (i = 0; i < bomComTableCnt; i++) {
    bomOrgItemParList[i].index = i;
    bomOrgItemParList[i].orgTable = orgTable;
    bomOrgItemParList[i].planId = 10001;
    bomOrgItemParList[i].instanceId = 10001;
    bomOrgItemParList[i].pOciCoon = pCoonPool;
    bomOrgItemParList[i].bomOrgItemMap = &bomOrgItemMapPoint[i];

    threadList[i] = OCI_ThreadCreate();
    OCI_ThreadRun(threadList[i], createBomOrgItemWorker, &bomOrgItemParList[i]);
}

for (i = 0; i < bomComTableCnt; i++) {
    OCI_ThreadJoin(threadList[i]);
    OCI_ThreadFree(threadList[i]);
}

end = clock();
float useTimes = (float) (end - start) / CLOCKS_PER_SEC;
std::cout << "Create BomOrgItem Total Time = " << useTimes << " s" << std::endl;

OCI_PoolFree(pCoonPool);
return true;
}

void err_handler(OCI_Error *err) { printf("%s\n", OCI_ErrorGetString(err)); }

int main() {
if (!OCI_Initialize(err_handler, nullptr, OCI_ENV_DEFAULT | OCI_ENV_THREADED)) { return 1; }

std::unordered_map<long, bool> bomOrgItemMapPoint[8];

for (int i = 0; i < 8; ++i) {
    auto *map = &bomOrgItemMapPoint[i];
    for (int j = 0; j < 3000; ++j) {
        map->insert(std::make_pair(j, true));
    }
}

createBomOrgItem(1, bomOrgItemMapPoint);

OCI_Cleanup();

std::cout << "Successfully" << std::endl;
return 0;

}
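The demo's worker fills at most batchRows rows per OCI_DirPathReset cycle, using row indices 1..batchRows for OCI_DirPathSetEntry. That batching can be isolated and checked on its own. The following is a minimal sketch with a hypothetical helper, make_batches. It reproduces only the loop structure and makes no OCILIB calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Splits the keys of a map into consecutive batches of at most maxRows entries,
// mirroring the outer/inner loops of createBomOrgItemWorker.
std::vector<std::vector<long>> make_batches(const std::unordered_map<long, bool> &items,
                                            std::size_t maxRows) {
    assert(maxRows > 0);
    std::vector<std::vector<long>> batches;
    auto it = items.begin();
    while (it != items.end()) {
        std::vector<long> batch;
        batch.reserve(std::min(maxRows, items.size()));
        // In the worker, "row" is the 1-based index passed to OCI_DirPathSetEntry.
        for (std::size_t row = 1; row <= maxRows && it != items.end(); ++row, ++it) {
            batch.push_back(it->first);
        }
        batches.push_back(std::move(batch));
    }
    return batches;
}
```

With the demo's 3000 entries per map, a 20000-row cap yields a single batch, while a cap of 1000 yields three.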

vrogier commented 3 years ago

Hi,

Looking at the stack trace, the issue occurs when an Oracle client function is called from OcilibDirPathSetColumn(). The only Oracle client calls made from that method are OCIAttrGet(), OCIParamGet() and OCIDescriptorFree(). The issue might be related to how the Oracle client handles concurrent access internally in these functions. Have you tried with a different Oracle client?
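If concurrent describe calls inside the Oracle client are the cause, one diagnostic experiment is to serialize only the setup phase and let the conversion/load phase run freely. This is a sketch under that assumption, not a confirmed fix. g_describeMutex, PhaseStats, worker and run_workers are hypothetical names, and describe_phase/load_phase are placeholders with artificial sleeps. In the issue's code they would correspond to the OCI_TypeInfoGet/OCI_DirPathCreate/OCI_DirPathSetColumn/OCI_DirPathPrepare calls and to the OCI_DirPathConvert/OCI_DirPathLoad loop, respectively.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// One process-wide lock around the setup ("describe") phase only.
std::mutex g_describeMutex;

struct PhaseStats {
    std::atomic<int> inDescribe{0};   // threads currently inside the setup phase
    std::atomic<int> peakDescribe{0}; // highest value of inDescribe ever observed
    std::atomic<int> loads{0};        // finished load phases
};

// Placeholders for the real OCILIB work (assumed mapping, see lead-in).
void describe_phase() { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }
void load_phase() { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }

void worker(PhaseStats &stats) {
    {
        std::lock_guard<std::mutex> lock(g_describeMutex);
        int now = ++stats.inDescribe;
        int prev = stats.peakDescribe.load();
        // Record the peak; while the lock is held it cannot exceed 1.
        while (now > prev && !stats.peakDescribe.compare_exchange_weak(prev, now)) {
        }
        describe_phase();
        --stats.inDescribe;
    }
    load_phase(); // runs concurrently across threads
    ++stats.loads;
}

void run_workers(int n, PhaseStats &stats) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(worker, std::ref(stats));
    for (auto &t : threads) t.join();
}
```

If the crash disappears with the lock in place and returns without it, that points to the describe calls. If it persists, the cause lies elsewhere.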

Regards,

Vincent

duoduo-peng commented 3 years ago

When I switch the client to instantclient_19_10, I get the following error:

Errors in file : OCI-21500: internal error code, arguments: [17099], [0x7F15975A9D10], [0x7F15989AB068], [0x002687E20], [], [], [], []

----- Call Stack Trace -----
calling                         call     entry                  argument values in hex
location                        type     point                  (? means dubious value)

skgudmp()+154                   call     kgdsdst()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgeriv_int()+169                call     skgudmp()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgeriv()+30                     call     kgeriv_int()           000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgesiv()+117                    call     kgeriv()               000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgesic3()+142                   call     kgesiv()               000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kge_report_17099()+551          call     kgesic3()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpuhhfreV1()+739                call     kge_report_17099()     000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpufdesc2()+2867                call     kpuhhfreV1()           000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpufdesc()+45                   call     kpufdesc2()            000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
OcilibColumnRetrieveInfo()+268  call     kpufdesc()             000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?

Call stack signature: 0xef1a174a6a31dfe5

----- Kernel Stack Trace -----

----- End Kernel Stack Trace -----

call stack performance statistics: total : 0.010000 sec setup : 0.000000 sec stack unwind : 0.000000 sec symbol translation : 0.010000 sec printing the call stack: 0.000000 sec printing frame data : 0.000000 sec printing argument data : 0.000000 sec printing kernel stack : 0.000000 sec

----- End of Call Stack Trace -----

Errors in file : OCI-21500: internal error code, arguments: [kgepop: no error frame to pop to], [], [], [], [], [], [], []
OCI-21500: internal error code, arguments: [17099], [0x7F15975A9D10], [0x7F15989AB068], [0x002687E20], [], [], [], []

----- Call Stack Trace -----
calling                         call     entry                  argument values in hex
location                        type     point                  (? means dubious value)

skgudmp()+154                   call     kgdsdst()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgerinv_internal()+89           call     skgudmp()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgerinv()+40                    call     kgerinv_internal()     000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgerin()+130                    call     kgerinv()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgepop()+858                    call     kgerin()               000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgersel()+256                   call     kgepop()               000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kge_report_17099()+863          call     kgersel()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpuhhfreV1()+739                call     kge_report_17099()     000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpufdesc2()+2867                call     kpuhhfreV1()           000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpufdesc()+45                   call     kpufdesc2()            000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
OcilibColumnRetrieveInfo()+268  call     kpufdesc()             000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?

Call stack signature: 0x7d0d324aae6fcc4a

----- Kernel Stack Trace -----

----- End Kernel Stack Trace -----

call stack performance statistics: total : 0.000000 sec setup : 0.000000 sec stack unwind : 0.000000 sec symbol translation : 0.000000 sec printing the call stack: 0.000000 sec printing frame data : 0.000000 sec printing argument data : 0.000000 sec printing kernel stack : 0.000000 sec

----- End of Call Stack Trace -----

Errors in file : OCI-21500: internal error code, arguments: [kgepop: no error frame to pop to], [], [], [], [], [], [], []
OCI-21500: internal error code, arguments: [17099], [0x7F15975A9D10], [0x7F15989AB068], [0x002687E20], [], [], [], []

----- Call Stack Trace -----
calling                         call     entry                  argument values in hex
location                        type     point                  (? means dubious value)

skgudmp()+154                   call     kgdsdst()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgepop()+936                    call     skgudmp()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kgersel()+256                   call     kgepop()               000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kge_report_17099()+863          call     kgersel()              000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpuhhfreV1()+739                call     kge_report_17099()     000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpufdesc2()+2867                call     kpuhhfreV1()           000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
kpufdesc()+45                   call     kpufdesc2()            000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?
OcilibColumnRetrieveInfo()+268  call     kpufdesc()             000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ?

Call stack signature: 0x6cefee83252783f1

----- Kernel Stack Trace -----

----- End Kernel Stack Trace -----

call stack performance statistics: total : 0.010000 sec setup : 0.000000 sec stack unwind : 0.000000 sec symbol translation : 0.010000 sec printing the call stack: 0.000000 sec printing frame data : 0.000000 sec printing argument data : 0.000000 sec printing kernel stack : 0.000000 sec

----- End of Call Stack Trace -----

Process finished with exit code 1

vrogier commented 3 years ago

Have you tried with a much smaller buffer size?

You use: uint32_t bufferSize = 64000000; // batchRows < 20000 ? 6400000 : 1024 * 1024 * 1024;

64 MB (or 1 GB) is way too big for such operations anyway! Batching 20,000 rows is also too big and counterproductive.
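To put those numbers in perspective: the demo loads 3 columns of at most SIZE_COL = 22 characters each, so one batch carries at most rows × 3 × 22 bytes of raw text. The helpers below (batch_bytes, oversize_factor) are hypothetical. They count only the raw column data and ignore any per-row or per-stream overhead Oracle adds, so treat the result as an order-of-magnitude estimate only.

```cpp
#include <cassert>
#include <cstdint>

// Raw text bytes in one batch when every column is a fixed-width string
// (the demo uses 3 columns of at most SIZE_COL = 22 characters).
constexpr std::uint64_t batch_bytes(std::uint64_t rows, std::uint64_t cols, std::uint64_t width) {
    return rows * cols * width;
}

// How many times larger the stream buffer is than one batch of raw data.
double oversize_factor(std::uint64_t bufferSize, std::uint64_t rows, std::uint64_t cols,
                       std::uint64_t width) {
    const std::uint64_t bytes = batch_bytes(rows, cols, width);
    assert(bytes > 0);
    return static_cast<double>(bufferSize) / static_cast<double>(bytes);
}
```

With these inputs, 20,000 rows come to 1,320,000 bytes and 1,000 rows to 66,000 bytes. A 64,000,000-byte buffer is therefore roughly 48 and 970 times larger than a single batch, respectively.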

duoduo-peng commented 3 years ago

Thank you very much

I reduced the batch size but kept the same buffer size:

uint32_t batchRows = pItems->size() > 1000 ? 1000 : pItems->size();
OCI_DirPathSetBufferSize(dp, 64000000);

The error message is as follows:

Errors in file : OCI-21500: internal error code, arguments: [17099], [0x000000000], [0x7FFFDD03C8A8], [0x0007EED00], [], [], [], []

----- Call Stack Trace -----
calling                         call     entry                  argument values in hex
location                        type     point                  (? means dubious value)


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdd04a700 (LWP 32798)]
0x00007ffff6cd8756 in slaac_int () from /u01/client/instantclient_19_10/libclntsh.so.19.1
(gdb) bt
#0  0x00007ffff6cd8756 in slaac_int () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#1  0x00007ffff6cd864a in slrac () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#2  0x00007ffff5f77baf in kgdsdaaddr () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#3  0x00007ffff5f7697e in kgdsdst () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#4  0x00007ffff501d35a in skgudmp () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#5  0x00007ffff5f285f9 in kgeriv_int () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#6  0x00007ffff5f2aa1e in kgeriv () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#7  0x00007ffff5f2b205 in kgesiv () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#8  0x00007ffff5f2aeae in kgesic3 () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#9  0x00007ffff5f28197 in kge_report_17099 () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#10 0x00007ffff6bbef3c in kpuhhaloV1 () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#11 0x00007ffff3e168d2 in kpugdesc2 () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#12 0x00007ffff3e16593 in kpugdesc () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#13 0x00007ffff6bcf306 in kpugattr () from /u01/client/instantclient_19_10/libclntsh.so.19.1
#14 0x00007ffff3772eee in OcilibDirPathSetColumn () from /usr/local/lib/libocilib.so.4
#15 0x00000000004c4c83 in createGroupBomByOciWorker (thread=0x0, data=0x7fffdd03e870) at /home/duoduo/CLionProjects/YinwuMgpCoreApp/db/YinwuWriteData.cpp:264
#16 0x000000000049a588 in YinwuPlanCalculate::__lambda31::operator() (__closure=0xe28900, jobs=..., conn=0xe35bf8)
    at /home/duoduo/CLionProjects/YinwuMgpCoreApp/plan/core/YinwuPlanCalculate.cpp:1201
#17 0x000000000049be87 in std::_Function_handler<void(YinwuJobQueue&, OCI_Connection*), YinwuPlanCalculate::writeGroupBomByThd(OCI_Pool*)::__lambda31>::_M_invoke(const std::_Any_data &, YinwuJobQueue &, OCI_Connection *) (__functor=..., __args#0=..., __args#1=0xe35bf8) at /usr/include/c++/4.8.2/functional:2071
#18 0x000000000048ceed in std::function<void (YinwuJobQueue&, OCI_Connection*)>::operator()(YinwuJobQueue&, OCI_Connection*) const (this=0x7fffffffd770, __args#0=..., __args#1=0xe35bf8)
    at /usr/include/c++/4.8.2/functional:2471
#19 0x000000000048b5ca in __lambda25::operator() (__closure=0x0, thread=0xa13f88, data=0x7fffffffd4d0) at /home/duoduo/CLionProjects/YinwuMgpCoreApp/util/YinwuJobQueue.cpp:151
#20 0x000000000048b5f1 in __lambda25::_FUN (thread=0xa13f88, data=0x7fffffffd4d0) at /home/duoduo/CLionProjects/YinwuMgpCoreApp/util/YinwuJobQueue.cpp:152
#21 0x00007ffff1fa5ea5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007ffff2c6596d in clone () from /lib64/libc.so.6
(gdb)

The actual amount of data is very small: each thread writes only a dozen or so rows (between 12 and 22 in this run):

[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:216] ** Create GroupBom By Oci Worker *
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:216] ** Create GroupBom By Oci Worker *
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:216] ** Create GroupBom By Oci Worker *
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:216] ** Create GroupBom By Oci Worker *
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:228] pBoms->size() = 13
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:228] pBoms->size() = 22
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:228] pBoms->size() = 17
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:216] ** Create GroupBom By Oci Worker *
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:228] pBoms->size() = 18
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:216] ** Create GroupBom By Oci Worker *
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:228] pBoms->size() = 17
[2021-02-23 14:08:30.383] [console] [info] [YinwuWriteData.cpp:228] pBoms->size() = 12

Errors in file : OCI-21500: internal error code, arguments: [17099], [0x000000000], [0x7FFFDD03C8A8], [0x0007EED00], [], [], [], []

----- Call Stack Trace -----
calling                         call     entry                  argument values in hex
location                        type     point                  (? means dubious value)

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdd04a700 (LWP 32798)]
0x00007ffff6cd8756 in slaac_int () from /u01/client/instantclient_19_10/libclntsh.so.19.1

vrogier commented 3 years ago

Hi,

OCI_DirPathSetBufferSize(dp, 64000000); ==> this is way too big.

Have you tried a much smaller size?
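One way to act on that suggestion is to derive the buffer size from the data actually being loaded and clamp it to a range, rather than hard-coding 64,000,000 bytes. This is an illustrative heuristic, not OCILIB behaviour. choose_buffer_size is a hypothetical helper, bytesPerRow must come from your own column sizes (66 below assumes the first demo's 3 × 22-byte columns), and the floor and ceiling values in the test are placeholders, not values recommended by OCILIB or Oracle. The result would be passed to OCI_DirPathSetBufferSize().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sizes the direct path stream buffer from the rows being loaded, clamped to
// [floorBytes, ceilBytes]. The bounds are caller-chosen (hypothetical values).
std::uint32_t choose_buffer_size(std::uint64_t rows, std::uint64_t bytesPerRow,
                                 std::uint32_t floorBytes, std::uint32_t ceilBytes) {
    assert(floorBytes <= ceilBytes);
    const std::uint64_t needed = rows * bytesPerRow;
    const std::uint64_t clamped =
        std::min<std::uint64_t>(std::max<std::uint64_t>(needed, floorBytes), ceilBytes);
    return static_cast<std::uint32_t>(clamped);
}
```

For example, choose_buffer_size(22, 66, 65536, 1048576) returns the 65,536-byte floor, since 22 rows of 66 bytes need only 1,452 bytes.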

vrogier commented 3 years ago

Any news?