waterguo / benchmarksql

BenchmarkSQL for MySQL, PostgreSQL and Oracle
13 stars 13 forks source link

delivery txn issue: failed on mysql #2

Closed lsyyff23 closed 5 years ago

lsyyff23 commented 5 years ago

I do a very simple test with 1 warehouse and 10 terminals , 3 mins , and found , if there are two terminals running "delivery transaction" concurrently, one may stuck at deliveryTransaction() {

...
do {
no_o_id = -1;

if (delivGetOrderId == null) {
   delivGetOrderId = conn.prepareStatement(
     "SELECT no_o_id FROM benchmarksql.new_order WHERE no_d_id = ?" +
     " AND no_w_id = ?" +
     " ORDER BY no_o_id ASC");
}
...
}while(no_o_id != -1 && !newOrderRemoved);
...
};

I think, term-1 and term-2 both get no_o_id=2101, and term-1 delete success, term-2 failed, so retart the do...while, and term-2 will never success delivDeleteNewOrder. this is my modify , and it runs well:


private int deliveryTransaction(int w_id, int o_carrier_id) 
{
...
try
{
    orderIDs = new int[10];
    for(d_id = 1; d_id <= 10; d_id++)
    {
        new_order.no_d_id = d_id;
        //and this flag
        boolean restart = false;
...
if (result > 0) {
  newOrderRemoved = true;
}
// break here 
else {
  restart = true;
  break;
}
...
while(no_o_id != -1 && !newOrderRemoved);
// do rollback then restart from district 1
if (restart) {
    d_id=1;
    conn.rollback();
    continue;
}
}
waterguo commented 5 years ago

Hi, lsyyff23,

The original code looks fine to me. Let's assume the isolation level is READ COMMITED. And there are two orders - 100, 200, and two terminals - T1 , T2.

time                         T1                                                   T2
-------------------------------------------------------------------------------------------------
9:00:00                  SELECT FROM new_order            SELECT FROM new_order
9:00:01                   T1 got order 100                           T2 got order 100
9:00:02                  DELETE FROM new_order            "DELETE FROM new order" is blocked by T1
9:00:03                  COMMIT
9:00:04                                                                         "DELETE FROM new order" return 0
9:00:05                                                                         repeat the loop
9:00:06                                                                         SELECT FROM new_order
9:00:07                                                                          T2 got 200, because 100 is deleted by T1
9:00:08                                                                         "DELETE FROM new order" return 1
9:00:09                                                                         COMMIT  

what do you think?

water

lsyyff23 commented 5 years ago

Hi, waterguo

I test on mariadb again, happens again: the isolation is

MariaDB [(none)]> select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.000 sec)

these are part of props.mysql

warehouses=1
terminals=10
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
runMins=20
//Number of total transactions per minute
limitTxnsPerMin=0

log4j.xml set log trace

add terminalMessage on jTPCCTermial.java

...
 434                       result = delivDeleteNewOrder.executeUpdate();
 435                       terminalMessage("");
 436                       terminalMessage("---Test---w_ " + w_id + ", d_ " + d_id + ", o_ " + no_o_id + ", res " + result);
...

do test: stuck again;

look into log/benchmarksql.log find no_o_id 2112 created :

# grep "Customer:  2112" benchmarksql.log  -rn
3859:TRACE  [2018-12-04 17:44:54.200]   Thread-0    Term-01,  Customer:  2112
14507:TRACE [2018-12-04 17:45:01.002]   Thread-0    Term-01,  Customer:  2112
57388:TRACE [2018-12-04 17:45:30.287]   Thread-0    Term-01,  Customer:  2112
478311:TRACE    [2018-12-04 17:48:45.846]   Thread-0    Term-01,  Customer:  2112
490539:TRACE    [2018-12-04 17:48:50.229]   Thread-0    Term-01,  Customer:  2112
538856:TRACE    [2018-12-04 17:49:06.727]   Thread-0    Term-01,  Customer:  2112
545189:TRACE    [2018-12-04 17:49:08.663]   Thread-0    Term-01,  Customer:  2112
922574:TRACE    [2018-12-04 17:51:22.950]   Thread-0    Term-01,  Customer:  2112
952728:TRACE    [2018-12-04 17:51:33.835]   Thread-0    Term-01,  Customer:  2112
1158693:TRACE   [2018-12-04 17:52:46.390]   Thread-0    Term-01,  Customer:  2112
1478311:TRACE   [2018-12-04 17:54:44.107]   Thread-0    Term-01,  Customer:  2112
2162362:TRACE   [2018-12-05 18:05:10.675]   Thread-4    Term-05,  Customer:  2112
2315408:TRACE   [2018-12-05 18:05:18.410]   Thread-4    Term-05,  Customer:  2112
4223518:TRACE   [2018-12-05 18:06:50.625]   Thread-3    Term-04,  Customer:  2112
4326533:TRACE   [2018-12-05 18:06:55.955]   Thread-3    Term-04,  Customer:  2112
5477038:TRACE   [2018-12-05 18:08:02.815]   Thread-3    Term-04,  Customer:  2112
5937756:TRACE   [2018-12-05 18:08:34.448]   Thread-3    Term-04,  Customer:  2112
6201948:TRACE   [2018-12-05 18:09:20.202]   Thread-3    Term-04,  Customer:  2112
6243845:TRACE   [2018-12-05 18:09:23.628]   Thread-3    Term-04,  Customer:  2112

and delivery txn are: Term-01 Term-10 are normal, but Term08 is abnormal

6794:TRACE  [2018-12-04 17:44:56.178]   Thread-0    Term-01, ---Test---w_ 6, d_ 1, o_ 2112, res 1
6796:TRACE  [2018-12-04 17:44:56.180]   Thread-0    Term-01, ---Test---w_ 6, d_ 2, o_ 2112, res 1
6798:TRACE  [2018-12-04 17:44:56.181]   Thread-0    Term-01, ---Test---w_ 6, d_ 3, o_ 2112, res 1
6800:TRACE  [2018-12-04 17:44:56.183]   Thread-0    Term-01, ---Test---w_ 6, d_ 4, o_ 2112, res 1
6802:TRACE  [2018-12-04 17:44:56.185]   Thread-0    Term-01, ---Test---w_ 6, d_ 5, o_ 2112, res 1
6804:TRACE  [2018-12-04 17:44:56.186]   Thread-0    Term-01, ---Test---w_ 6, d_ 6, o_ 2112, res 1
6806:TRACE  [2018-12-04 17:44:56.188]   Thread-0    Term-01, ---Test---w_ 6, d_ 7, o_ 2112, res 1
6808:TRACE  [2018-12-04 17:44:56.189]   Thread-0    Term-01, ---Test---w_ 6, d_ 8, o_ 2112, res 1
6810:TRACE  [2018-12-04 17:44:56.191]   Thread-0    Term-01, ---Test---w_ 6, d_ 9, o_ 2112, res 1
6812:TRACE  [2018-12-04 17:44:56.192]   Thread-0    Term-01, ---Test---w_ 6, d_ 10, o_ 2112, res 1

1525706:TRACE   [2018-12-05 18:04:36.668]   Thread-9    Term-10, ---Test---w_ 1, d_ 1, o_ 2112, res 1
1525724:TRACE   [2018-12-05 18:04:36.669]   Thread-9    Term-10, ---Test---w_ 1, d_ 2, o_ 2112, res 1
1525756:TRACE   [2018-12-05 18:04:36.671]   Thread-9    Term-10, ---Test---w_ 1, d_ 3, o_ 2112, res 1
1525853:TRACE   [2018-12-05 18:04:36.672]   Thread-9    Term-10, ---Test---w_ 1, d_ 4, o_ 2112, res 1
1525876:TRACE   [2018-12-05 18:04:36.674]   Thread-9    Term-10, ---Test---w_ 1, d_ 5, o_ 2112, res 1
1525945:TRACE   [2018-12-05 18:04:36.675]   Thread-9    Term-10, ---Test---w_ 1, d_ 6, o_ 2112, res 1
1525957:TRACE   [2018-12-05 18:04:36.677]   Thread-9    Term-10, ---Test---w_ 1, d_ 7, o_ 2112, res 1
1525969:TRACE   [2018-12-05 18:04:36.678]   Thread-9    Term-10, ---Test---w_ 1, d_ 8, o_ 2112, res 1
1525979:TRACE   [2018-12-05 18:04:36.680]   Thread-9    Term-10, ---Test---w_ 1, d_ 9, o_ 2112, res 1
1525993:TRACE   [2018-12-05 18:04:36.682]   Thread-9    Term-10, ---Test---w_ 1, d_ 10, o_ 2112, res 1

1526001:TRACE   [2018-12-05 18:04:36.682]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
1526009:TRACE   [2018-12-05 18:04:36.683]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
1526015:TRACE   [2018-12-05 18:04:36.684]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
1526021:TRACE   [2018-12-05 18:04:36.685]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
1526027:TRACE   [2018-12-05 18:04:36.685]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
1526033:TRACE   [2018-12-05 18:04:36.686]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
1526039:TRACE   [2018-12-05 18:04:36.687]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
...
6573646:TRACE   [2018-12-05 18:09:50.439]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573686:TRACE   [2018-12-05 18:09:50.441]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573735:TRACE   [2018-12-05 18:09:50.443]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573753:TRACE   [2018-12-05 18:09:50.445]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573771:TRACE   [2018-12-05 18:09:50.447]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573789:TRACE   [2018-12-05 18:09:50.450]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573843:TRACE   [2018-12-05 18:09:50.452]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573899:TRACE   [2018-12-05 18:09:50.454]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573917:TRACE   [2018-12-05 18:09:50.456]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0
6573966:TRACE   [2018-12-05 18:09:50.459]   Thread-7    Term-08, ---Test---w_ 1, d_ 1, o_ 2112, res 0

then Thread-7(Term-08) was endless-loop at the do {}while () ;

do i miss anything? or test it wrong ?

lsyyff23 commented 5 years ago

Hi, lsyyff23,

The original code looks fine to me. Let's assume the isolation level is READ COMMITED. And there are two orders - 100, 200, and two terminals - T1 , T2.

time                         T1                                                   T2
-------------------------------------------------------------------------------------------------
9:00:00                  SELECT FROM new_order            SELECT FROM new_order
9:00:01                   T1 got order 100                           T2 got order 100
9:00:02                  DELETE FROM new_order            "DELETE FROM new order" is blocked by T1
9:00:03                  COMMIT
9:00:04                                                                         "DELETE FROM new order" return 0
9:00:05                                                                         repeat the loop
9:00:06                                                                         SELECT FROM new_order
9:00:07                                                                          T2 got 200, because 100 is deleted by T1
9:00:08                                                                         "DELETE FROM new order" return 1
9:00:09                                                                         COMMIT  

what do you think?

water

hi, water I agree with this. but in my test , when thread7 and thread9 get no_o_id 2112 concurrently,and thread 7 delete failed , then thread7 would not get other no_o_id , that confused me .

waterguo commented 5 years ago

Hi, lsyyff23,

I suggest you try it with READ-COMMITED isolation. It is the default on most of the RDB today. REPEATABLE-READ potentially causes more dead locks than READ-COMMITED. Application designed with READ-COMMITED in mind will fail on REPEATABLE-READ. It not just bechmarksql, but happens to every other applications as well. Besides that, REPEATABLE-READ performs much worse on all main stream RDBs. It is not a popular choice to run benchmarks.

water

waterguo commented 5 years ago

BTW, you can change isolation in my.cnf using the below line

transaction-isolation=READ-COMMITTED

lsyyff23 commented 5 years ago

Hi, water transaction-isolation=READ-COMMITTED

this could work. the performance is close with REPEATABLE-READ.

Thank you very much.

waterguo commented 5 years ago

You welcome