Open Ariznawlll opened 1 week ago
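The cause has been roughly located.
Today's restore hit the same problem; the steps were essentially the same as those described in the issue.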
mysql> select git_version();
+---------------+
| git_version() |
+---------------+
| 29a0c5d |
+---------------+
1 row in set (0.00 sec)
A snapshot read can still read the data:
PR is on the way!
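The PR may address only one of the causes of this problem. If the failure still occurs with high probability, there may be other causes, and we will need to add logging and reproduce it again.
This afternoon, restoring 2 TB of data from PITR reported the same error.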
commit: 8d7e7b8
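SQL executed for the restore: restore from pitr p01 "2024-10-22 03:24:18.547701"
Improved the logging again; reproduction is running both online and offline. Some other cause is probably behind this problem.
Steps to reproduce: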
Modify the following configuration:
[tn.Ckp]
flush-interval = "5s"
min-count = 1
scan-interval = "5s"
incremental-interval = "10s"
global-min-count = 3
Modify the program:
gcPartitionStateTicker = 5 * time.Second
gcPartitionStateTimer = 90 * time.Second
Run mo-service.
Run the SQL: 1> create table tpcc_1000.bmsql_order_line
2>load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='AKIDUtG3skpK1hK7BSoClmsDVegirATitKiD','secret_access_key'='pXGubPAxolknvyzsqEoRBteLzmbSH3pb','bucket'='mo-load-guangzhou-1308875761','filepath'='tpcc_1000/order-line.csv', 'compression'=''} into table tpcc_1000.bmsql_order_line fields terminated by ',' lines terminated by '\n' parallel 'true';
3> create snapshot 1;
4> drop database tpcc_1000;
5> restore account sys from snapshot sp01;
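The cause has been located and is awaiting a fix. A "txn is stale" error is what led to the "table not found" error. The "txn is stale" error itself is caused by inconsistent minTs, start, and end values in the partition state.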
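Improved the logging again; reproduction is running both online and offline. Some other cause is probably behind this problem.
After reproducing offline, the cause turned out to be the one the first PR targeted; that fix simply did not work.
The "table not found" problem caused by "txn is stale" should now be fixed; it has been tested offline many times. @Ariznawlll please test.
Waiting for the PR to be merged.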
Is there an existing issue for the same bug?
Branch Name
main
Commit ID
cf5296b
Other Environment Information
Actual Behavior
After restoring for about 5 hours, the error "table does not exist" was reported.
Logs: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%223wf%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20241016%5C%22%7D%20%7C%3D%20%60sp01%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221729137600000%22,%22to%22:%221729155600000%22%7D%7D%7D&schemaVersion=1&orgId=1
A snapshot read can still read the data:
Expected Behavior
No response
Steps to Reproduce
Additional information
No response