pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0

TPCC workload error on v5.3.1 #34351

Closed: dbsid closed this issue 2 years ago

dbsid commented 2 years ago

Bug Report

1. Minimal reproduce step (Required)

  1. Restore a 1,000-warehouse TPC-C data set.
  2. Run the BenchmarkSQL client with 600 threads.

2. What did you expect to see? (Required)

No application errors.

3. What did you see instead (Required)

Application errors, and some of the client connections were broken.

[2022/05/01 03:51:56.243 +08:00] [ERROR] [2pc.go:1227] ["2PC commit result undetermined"] [error="tikv aborts txn: Error(Txn(Error(Mvcc(Error(PessimisticLockNotFound { start_ts: TimeStamp(432891049658286143), key: [116, 128, 0, 0, 0, 0, 0, 0, 60, 95, 114, 3, 128, 0, 0, 0, 0, 0, 2, 168, 3, 128, 0, 0, 0, 0, 0, 0, 10] })))))"] [rpcErr="no available connections"] [txnStartTS=432891049658286143]
[2022/05/01 03:51:56.243 +08:00] [ERROR] [conn.go:1083] ["result undetermined, close this connection"] [conn=915] [error="previous statement: INSERT INTO bmsql_order_line (    ol_o_id, ol_d_id, ol_w_id, ol_number,     ol_i_id, ol_supply_w_id, ol_quantity,     ol_amount, ol_dist_info) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?),(?, ?, ?, ?, ?, ?, ?, ?, ?) [arguments: (37386, 10, 680, 1, 3246, 680, 2, 87.54, \"CyxwBTzx7X8m15xVlQE1xxHU\", 37386, 10, 680, 2, 3479, 680, 5, 147.25, \"NFc9YzZSRjwJEFaJMAQPoEbH\", 37386, 10, 680, 3, 7791, 680, 1, 88.42, \"PSewsSb6xp1Bz4JER2MJIPW2\", 37386, 10, 680, 4, 15014, 680, 3, 269.54999999999995, \"RSTOwmApmqs3zR0SoAXckkvp\", 37386, 10, 680, 5, 17390, 680, 9, 609.1200000000001, \"P9r9QaUnKpkIlFrhAnBDpD7Y\", 37386, 10, 680, 6, 31927, 680, 8, 449.2, \"g3nK9RhRkIUgsR4dSYDSg73B\", 37386, 10, 680, 7, 45215, 680, 8, 92.08, \"CvlRKQFBYq8jQpRTwTojq26P\", 37386, 10, 680, 8, 58399, 680, 7, 258.3, \"xgqt6YQvGmO71DdtRiDRhgTK\", 37386, 10, 680, 9, 58535, 680, 3, 25.259999999999998, \"OXHaHvSxrkm59OSFRNimkiFA\", 37386, 10, 680, 10, 63671, 680, 3, 160.68, \"jaJqq3WzotDUXMFSaGn8cHKS\", 37386, 10, 680, 11, 74167, 680, 6, 109.38, \"comajakMboclVOfAv628Zqgp\", 37386, 10, 680, 12, 82998, 680, 8, 757.92, \"IHXydqw9SV93CSsVlFQ5YOOD\", 37386, 10, 680, 13, 95349, 680, 8, 487.6, \"LHqPxyuLDSpfmbKR2NuvF4IB\")]: [global:2]execution result undetermined"]

4. What is your TiDB version? (Required)

v5.3.1

cfzjywxk commented 2 years ago

Does it happen every time you run the TPC-C test on v5.3.1?

dbsid commented 2 years ago

The problem cluster and workload have been kept for debugging; I will share them with you when we're back at work.

cfzjywxk commented 2 years ago

The root cause is that tikv-server was repeatedly OOM-killed in the test environment while the batch client kept creating stream connections. As for the error log above: if an RPC error occurs while a transaction is being committed in async mode, the user client may receive an undetermined error from tidb-server, and the user connection is closed.
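Because an undetermined result means the transaction may or may not have committed, a client cannot safely retry it blindly. A minimal sketch of one way an application could react is shown here. The marker string is copied from the log above; `was_committed` and `retry` are hypothetical callbacks supplied by the application, and the read-back check assumes the rows written by the transaction can be identified uniquely.

```python
# Marker text copied from the error message in the log above.
UNDETERMINED_MARKER = "execution result undetermined"


def handle_commit_failure(message, was_committed, retry):
    """Decide what to do after a commit error.

    was_committed: hypothetical callback that opens a NEW connection (the old
        one has been closed by tidb-server) and checks whether the rows
        written by the transaction exist.
    retry: hypothetical callback that re-runs the whole transaction.
    """
    if UNDETERMINED_MARKER not in message:
        # Other errors are outside the scope of this sketch.
        raise RuntimeError(message)
    if was_committed():
        # The commit did go through; retrying would duplicate the work.
        return "committed"
    retry()
    return "retried"


# Usage with stand-in callbacks:
attempts = []
outcome = handle_commit_failure(
    "[global:2]execution result undetermined",
    was_committed=lambda: False,
    retry=lambda: attempts.append(1),
)
print(outcome, len(attempts))  # retried 1
```

Checking before retrying keeps the retry from inserting the same order lines twice when the first commit did succeed.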