matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: incrservice may fill duplicate numbers in multi-CN #16493

Open jensenojs opened 4 months ago

jensenojs commented 4 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

latest, or before ed279e605

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

drop table if exists t0;
create table t0(a int auto_increment, b int, primary key(a));
insert into t0(b) values (1), (2);
-- @session:id=1{
begin;
use insert_auto_pk;
insert into t0(a) values (3);

delete from t0 where a=2;
insert into t0(b) values (1), (2);

commit;
-- @session}

insert into t0(b) values (1), (2); -- Duplicate entry '3' for key 'a'
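One way the duplicate above could arise can be sketched as a toy simulation. Everything here is an assumption made for illustration, not the actual incrservice code: `SharedCounter`, `CNAllocator`, the step of 10, and the idea that an explicit insert on one CN does not invalidate the range cached on another CN are all hypothetical.

```python
class SharedCounter:
    """Hypothetical stand-in for the globally persisted auto-increment counter."""

    def __init__(self, start=1):
        self.next_value = start

    def allocate(self, count):
        """Hand out the half-open range [first, first + count)."""
        first = self.next_value
        self.next_value += count
        return first, first + count


class CNAllocator:
    """Hypothetical per-CN cache holding a pre-allocated range."""

    def __init__(self, counter, step=10):
        self.counter = counter
        self.step = step
        self.cur, self.end = 0, 0  # empty cache until the first request

    def next(self):
        if self.cur >= self.end:  # cache exhausted: fetch a new range
            self.cur, self.end = self.counter.allocate(self.step)
        value = self.cur
        self.cur += 1
        return value


def reproduce():
    counter = SharedCounter()
    rows = {}  # primary key a -> column b
    cn0, cn1 = CNAllocator(counter), CNAllocator(counter)

    # CN0: insert into t0(b) values (1), (2);  -> a = 1, 2
    for b in (1, 2):
        rows[cn0.next()] = b

    # Session 1 on CN1: explicit a = 3, delete a = 2, then two auto values.
    rows[3] = None
    del rows[2]
    for b in (1, 2):
        rows[cn1.next()] = b  # CN1 uses its own range: a = 11, 12

    # Back on CN0: the cached range still says the next value is 3.
    candidate = cn0.next()
    return candidate, candidate in rows
```

In this model `reproduce()` returns `(3, True)`: CN0 hands out 3 from its cached range even though a row with `a = 3` already exists, which matches the `Duplicate entry '3'` error in the case above.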

Expected Behavior

No response

Steps to Reproduce

https://github.com/matrixorigin/ci-test/actions/runs/9284561402

Delete the other BVT tests and run the case through ci-test; running only the specific BVT cases takes about 4s.

Additional information

No response

jensenojs commented 4 months ago

The commit below is the earliest one where this problem appears; we may need to ask Xu to help take a look.

Related link: https://github.com/matrixorigin/ci-test/actions/runs/9296838465/job/25586288265

commit 9f73da6152b77d8f7f060f77e0e0b20e7aa1cade
Author: fagongzi <zhangxu19830126@gmail.com>
Date:   Fri May 10 21:32:31 2024 +0800

    improve incr service pre allocate (#15995)

    improve incr service pre allocate.

    Approved by: @m-schen
jensenojs commented 4 months ago
image

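This is the current diagnosis. The preliminary fix is: if the statement is an insert, and that insert writes a value into the auto-increment primary key, then the transaction is allowed to retry.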

jensenojs commented 3 months ago

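After further discussion, the main problem with the approach above is that it relies on deduplication in the TN. In a follow-up meeting with Xu, we concluded that incrservice can be combined with the API below to take a different approach, one that avoids deduplication on the CN without sacrificing correctness.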

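For example:

create table t(a int auto_increment primary key, b int);
insert into t(b) values (1);

For SQL of this kind, once the preinsert operator has generated the data, the API above can be used to check whether any value in the CN's incr-cache range has been modified.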
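The check described above might look roughly like the following toy model. The real API is not shown in this thread, so `range_changed`, `CachedRange`, `preinsert`, and the logical clock are hypothetical names and assumptions made only to illustrate the idea: after generating values from the cached range, ask whether anyone else has written into that range since it was last validated, and if so drop the cache and take a fresh range.

```python
class Table:
    """Toy table recording the logical timestamp of each primary-key write."""

    def __init__(self):
        self.clock = 0
        self.written_at = {}  # primary key -> logical timestamp of the write

    def write(self, pk):
        self.clock += 1
        self.written_at[pk] = self.clock

    def range_changed(self, lo, hi, since):
        """Hypothetical stand-in for the API: was any key in [lo, hi) written after `since`?"""
        return any(lo <= pk < hi and ts > since
                   for pk, ts in self.written_at.items())


class CachedRange:
    """Hypothetical CN-side incr-cache: remaining range plus last validation time."""

    def __init__(self, lo, hi, fetched_at):
        self.cur, self.end, self.checked_at = lo, hi, fetched_at


def preinsert(table, cache, count, refill):
    """Generate `count` keys (assumes count fits in the range), refilling if the range was touched."""
    values = list(range(cache.cur, cache.cur + count))
    if table.range_changed(cache.cur, cache.end, cache.checked_at):
        # Another writer modified keys inside our cached range: discard it.
        cache.cur, cache.end = refill()
        values = list(range(cache.cur, cache.cur + count))
    for v in values:
        table.write(v)
    cache.cur += count
    cache.checked_at = table.clock  # our own writes are now accounted for
    return values


def demo():
    table = Table()
    cache = CachedRange(1, 11, fetched_at=table.clock)
    first = preinsert(table, cache, 2, refill=lambda: (11, 21))
    table.write(3)  # another CN inserts an explicit a = 3
    second = preinsert(table, cache, 2, refill=lambda: (11, 21))
    return first, second
```

Here `demo()` returns `([1, 2], [11, 12])`: the second batch skips the stale range instead of reusing 3. The point of the design is that the CN only asks whether its range was touched, rather than deduplicating every generated key.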

jensenojs commented 2 months ago

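@aressu1985 The related PR and BVT cases have been merged into main. When you have time, please check whether the related cases need to be integrated further.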

jensenojs commented 2 months ago
2024-07-16 18:29:31 Start to load data from file tpcc_10/cust-hist.csv into table tpcc_10.bmsql_history,please wait.....
2024-07-16 18:29:31 load data url s3option {'endpoint'='cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='tpcc_10/cust-hist.csv', 'compression'=''} into table tpcc_10.bmsql_history fields terminated by ',' lines terminated by '\n' parallel 'true';
2024-07-16 18:29:34 ERROR 20101 (HY000) at line 1: internal error: txn workspace is nil
2024-07-16 18:29:34 The data for table tpcc_10.bmsql_history has failed to be loaded.
2024-07-16 18:29:34 This test for [s3_bmsql_history_tpcc_10w] has been executed failed, more info, please see the log

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%225RM%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-checkin-regression-17381%5C%22%7D%20%7C%3D%20%60txn%20workspace%20is%20nil%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221721154571000%22,%22to%22:%221721154600000%22%7D%7D%7D&schemaVersion=1&orgId=1

jensenojs commented 2 months ago

Waiting until this issue is fixed.

jensenojs commented 2 months ago

https://github.com/matrixorigin/matrixone/issues/17581#issuecomment-2244822456

jensenojs commented 6 days ago

I'll confirm with Xu tomorrow.