ossc-db / pg_bulkload

High speed data loading utility for PostgreSQL
http://ossc-db.github.io/pg_bulkload/index.html

WRITER=PARALLEL loads wrong data #120

Open killerwzb opened 2 years ago


Hi,

I use BenchmarkSQL to generate data and then run a consistency check, but I have found a problem: if I use the option "WRITER=PARALLEL", the result is wrong. Can you help me?
My steps are as follows:
1. Use BenchmarkSQL to generate data for 1000 warehouses:

    cat 1000.pg

    db=postgres
    driver=org.postgresql.Driver
    conn=jdbc:postgresql://localhost/benchmarksql?socketFactory=org.newsclub.net.unix.AFUNIXSocketFactory$FactoryArg&socketFactoryArg=/tmp/.s.PGSQL.5667
    user=benchmarksql
    password=123456

    warehouses=200
    loadWorkers=64

    terminals=800
    //To run specified transactions per terminal- runMins must equal zero
    runTxnsPerTerminal=0
    //To run for specified minutes- runTxnsPerTerminal must equal zero
    runMins=30
    //Number of total transactions per minute
    limitTxnsPerMin=0

    //Set to true to run in 4.x compatible mode. Set to false to use the
    //entire configured database evenly.
    terminalWarehouseFixed=true

    //The following five values must add up to 100
    //The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
    newOrderWeight=45
    paymentWeight=43
    orderStatusWeight=4
    deliveryWeight=4
    stockLevelWeight=4

    // Directory name to create for collecting detailed result data.
    // Comment this out to suppress.
    /resultDirectory=myresult%tY-%tm-%td_%tH%tM%tS
    //osCollectorScript=./misc/os_collector_linux.py
    //osCollectorInterval=1
    //osCollectorSSHAddr=user@dbhost
    //osCollectorDevices=net_enp0s3 blk_sda

   I then use this command to write the CSV files to this path:

    ./runLoader.sh 1000.pg fileLocation /home/nbase/data/testdata/

2. Load the data into the database:

    pg_bulkload -i /home/wzb/testdata/order.csv -O bmsql_oorder -l bmsql_oorder-bulkload.log -P bmsql_oorder-bad.log -o "TYPE=CSV" -o "NULL=NULL" -o "WRITER=PARALLEL" -d benchmarksql -U postgres -p 5667
    pg_bulkload -i /home/wzb/testdata/order-line.csv -O bmsql_order_line -l bmsql_order_line-bulkload.log -P bmsql_order_line-bad.log -o "TYPE=CSV" -o "NULL=NULL" -o "WRITER=PARALLEL" -d benchmarksql -U postgres -p 5667

3. Check consistency:

    (select o_w_id, o_d_id, sum(o_ol_cnt)
       from bmsql_oorder
      group by o_w_id, o_d_id)
    except
    (select ol_w_id, ol_d_id, count(ol_o_id)
       from bmsql_order_line
      group by ol_w_id, ol_d_id);

If the data is consistent, this query returns 0 rows, but I get some rows.

4. If I change "WRITER=PARALLEL" to "WRITER=DIRECT", the query returns 0 rows.
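To confirm that the generated files themselves are consistent (so that the mismatch comes from the load and not from the data), the step 3 check can also be run directly on the CSV files. The sketch below is a minimal Python version of that `EXCEPT` query. It assumes the BenchmarkSQL CSV files have no header row and that their column order matches the table definitions: `o_w_id`, `o_d_id` at positions 0 and 1 and `o_ol_cnt` at position 5 of order.csv, and `ol_w_id`, `ol_d_id` at positions 0 and 1 of order-line.csv. These positions are assumptions; adjust the constants if your files differ.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, List, Set, Tuple

# Assumed column positions (verify against your BenchmarkSQL version).
O_W_ID, O_D_ID, O_OL_CNT = 0, 1, 5
OL_W_ID, OL_D_ID = 0, 1

Row = List[str]
Result = Set[Tuple[int, int, int]]


def read_rows(path: str) -> Iterator[Row]:
    """Yield the rows of a header-less BenchmarkSQL CSV data file."""
    with open(path, newline="") as f:
        yield from csv.reader(f)


def consistency_except(order_rows: Iterable[Row], line_rows: Iterable[Row]) -> Result:
    """Mimic the step 3 query: (sum of o_ol_cnt) EXCEPT (count of order lines),
    both grouped by (warehouse, district). An empty set means consistent data."""
    expected: Counter = Counter()
    for row in order_rows:
        expected[(int(row[O_W_ID]), int(row[O_D_ID]))] += int(row[O_OL_CNT])

    actual: Counter = Counter()
    for row in line_rows:
        # count(ol_o_id) counts non-NULL values; ol_o_id is assumed never NULL here.
        actual[(int(row[OL_W_ID]), int(row[OL_D_ID]))] += 1

    # SQL EXCEPT is a set difference over whole result rows.
    left = {(w, d, n) for (w, d), n in expected.items()}
    right = {(w, d, n) for (w, d), n in actual.items()}
    return left - right
```

For example, `consistency_except(read_rows("order.csv"), read_rows("order-line.csv"))` returning an empty set would indicate that the input files are consistent and that the rows reported in step 3 were introduced during the PARALLEL load.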
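Since the DIRECT writer gives 0 rows on the same files, a possible next step is to find exactly which order lines the PARALLEL load dropped or duplicated. The minimal sketch below assumes the loaded table has been exported back to CSV (for example with PostgreSQL's `COPY bmsql_order_line TO '/tmp/loaded.csv' WITH (FORMAT csv)`, or psql's `\copy` for a client-side file) and that `(ol_w_id, ol_d_id, ol_o_id, ol_number)` identifies a row and occupies the first four columns of both files. Both the export step and the key columns are assumptions. The two files are compared as multisets of keys, so duplicates are detected as well as missing rows.

```python
import csv
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Row = List[str]

# Assumed key: (ol_w_id, ol_d_id, ol_o_id, ol_number) in columns 0-3.
KEY_COLS: Sequence[int] = (0, 1, 2, 3)


def key_counts(rows: Iterable[Row], key_cols: Sequence[int] = KEY_COLS) -> Counter:
    """Count how many times each key appears (a multiset of keys).
    Keys are compared as text, so both files must format integers the same way."""
    return Counter(tuple(row[i] for i in key_cols) for row in rows)


def diff_loaded(source_rows: Iterable[Row], loaded_rows: Iterable[Row],
                key_cols: Sequence[int] = KEY_COLS) -> Tuple[Counter, Counter]:
    """Return (missing, extra): keys the load dropped, and keys it added or duplicated."""
    src = key_counts(source_rows, key_cols)
    dst = key_counts(loaded_rows, key_cols)
    # Counter subtraction keeps only positive counts.
    missing = src - dst
    extra = dst - src
    return missing, extra


def load_csv(path: str) -> List[Row]:
    """Read a whole CSV file into memory as a list of rows."""
    with open(path, newline="") as f:
        return list(csv.reader(f))
```

Non-empty `missing` or `extra` after a PARALLEL load, with both empty after a DIRECT load of the same file, would give concrete rows to attach to this report.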