zettadb / kunlun

KunlunBase is a distributed relational database management system (RDBMS) with complete NewSQL capabilities and robust ACID transaction guarantees, and it is compatible with standard SQL. Applications that use PostgreSQL or MySQL can work with KunlunBase as-is, without any code change or rebuild, because KunlunBase supports both the PostgreSQL and MySQL connection protocols and DML SQL grammars. MySQL DBAs can quickly work on a KunlunBase cluster because KunlunBase uses MySQL as its storage nodes. KunlunBase can elastically scale out as needed and guarantees transaction ACID properties under error conditions. It fully passes the TPC-C, TPC-H and TPC-DS test suites, so it supports not only OLTP workloads but also OLAP workloads. Application developers can use KunlunBase to build IT systems that handle terabytes of data without any effort on their part to implement data sharding, distributed transaction processing, distributed query processing, crash safety, high availability, strong consistency, or horizontal scalability; all of these features are provided by KunlunBase. KunlunBase also supports powerful and user-friendly cluster management, monitoring and provisioning features, and can be readily used as a DBaaS.
http://www.kunlunbase.com
Apache License 2.0

duplicate cached shard_node_t object of the same shard node #507

Open jd-zhang opened 3 years ago

jd-zhang commented 3 years ago

Issue migrated from trac ticket #160

component: computing nodes | priority: major

2021-09-15 16:29:02: @david-zhao created the issue


This assertion failure occurs occasionally in the global deadlock detector process during TPC-C test runs.

0 0x00007fb0678cf37f in raise () from /lib64/libc.so.6

1 0x00007fb0678b9db5 in abort () from /lib64/libc.so.6

2 0x0000000000a59cc4 in ExceptionalCondition (conditionName=0xb38e40 "!(ref->ptr == noderef->ptr)", errorType=0xb38d8e "FailedAssertion", fileName=0xb38d80 "pg_sharding.c", lineNumber=129) at assert.c:54

3 0x00000000005bac3a in AddShard_node_ref_t (pshard=0x7fb0698a89d8, noderef=0x1f46dc0) at pg_sharding.c:129

4 0x00000000005bb12b in LoadAllShards (init=false) at pg_sharding.c:261

5 0x00000000005bbf1e in startShardCacheSeq (seq_status=0x7ffc341f22a0) at pg_sharding.c:634

6 0x0000000000576f09 in build_wait_for_graph () at remote_xact.c:1069

7 0x000000000057818e in perform_deadlock_detect () at remote_xact.c:1519

8 0x00007fb058372708 in global_deadlock_detector_main (main_arg=0) at global_deadlock_detector.c:177

9 0x000000000082ee75 in StartBackgroundWorker () at bgworker.c:829

10 0x0000000000842476 in do_start_bgworker (rw=0x1e8be70) at postmaster.c:5767

11 0x0000000000842810 in maybe_start_bgworkers () at postmaster.c:5981

12 0x000000000083eebc in reaper (postgres_signal_arg=17) at postmaster.c:2898

13

14 0x00007fb06798c25b in select () from /lib64/libc.so.6

15 0x000000000083cea1 in ServerLoop () at postmaster.c:1673

16 0x000000000083c875 in PostmasterMain (argc=3, argv=0x1e64d10) at postmaster.c:1382

17 0x000000000075f179 in main (argc=3, argv=0x1e64d10) at main.c:233

jd-zhang commented 3 years ago

2021-09-22 11:19:22: @david-zhao commented


Fix: reference the newly created Shard_node_t object after releasing the old one.
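The idea behind this fix can be illustrated with a minimal, self-contained sketch. This is not the actual code in pg_sharding.c: the types `Node`, `NodeRef` and `Shard`, the reference-counting helpers, and the function `shard_add_node_ref` are all hypothetical stand-ins, and the real `Shard_node_t` bookkeeping may differ. The sketch only shows the pattern: when a cached reference with the same node id points at a different object, release the old object and then reference the new one, rather than asserting that the pointers are equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached shard node object. */
typedef struct Node {
    unsigned id;
    int refcount;
} Node;

/* Hypothetical stand-in for a shard's reference to a node. */
typedef struct NodeRef {
    unsigned id;
    Node *ptr;
} NodeRef;

#define MAX_NODES 8

typedef struct Shard {
    size_t nrefs;
    NodeRef refs[MAX_NODES];
} Shard;

static Node *node_create(unsigned id)
{
    Node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->id = id;
    return n;
}

static void node_acquire(Node *n)
{
    n->refcount++;
}

static void node_release(Node *n)
{
    assert(n->refcount > 0);
    if (--n->refcount == 0)
        free(n);
}

/*
 * Add a reference to `fresh`, or refresh an existing reference that has
 * the same node id. Returns 0 on success, -1 if the shard is full.
 */
static int shard_add_node_ref(Shard *s, Node *fresh)
{
    for (size_t i = 0; i < s->nrefs; i++) {
        NodeRef *ref = &s->refs[i];

        if (ref->id != fresh->id)
            continue;

        if (ref->ptr != fresh) {
            /* Stale cached object: release it, then reference the new one. */
            node_release(ref->ptr);
            node_acquire(fresh);
            ref->ptr = fresh;
        }
        return 0;
    }

    if (s->nrefs == MAX_NODES)
        return -1;

    node_acquire(fresh);
    s->refs[s->nrefs].id = fresh->id;
    s->refs[s->nrefs].ptr = fresh;
    s->nrefs++;
    return 0;
}
```

In this sketch, calling `shard_add_node_ref` twice with two distinct objects that share an id leaves the shard holding a single reference to the newer object, which is the situation that previously tripped the assertion.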

jd-zhang commented 3 years ago

2021-09-22 11:19:22: @david-zhao changed status from assigned to accepted