ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0
4.02k stars 588 forks source link

Try to retry by ActorUnknown reason #10892

Open azevaykin opened 1 month ago

azevaykin commented 1 month ago

Fails only under sanitize=address

Test run https://storage.yandexcloud.net/ydb-gh-logs/ydb-platform/ydb/PR-check/11515160536/ya-x86-64-asan/try_1/ya-test.html#FAIL

Test ydb/core/sys_view/ut_kqp/SystemView.TopPartitionsFollowers Падает при чтении из таблицы тут https://github.com/ydb-platform/ydb/blob/bb2d8c411d97f1c14e207e5a9d1e92a8e19436f0/ydb/core/sys_view/ut_kqp.cpp#L1425

Провел еще эксперимент. Если тут заменить WHERE Key = 1 на WHERE Key <= 1, то стабильно падает https://github.com/ydb-platform/ydb/blob/bb2d8c411d97f1c14e207e5a9d1e92a8e19436f0/ydb/core/sys_view/ut_kqp.cpp#L1412

stderr https://storage.yandexcloud.net/ydb-gh-logs/ydb-platform/ydb/PR-check/11515160536/ya-x86-64-asan/try_1/artifacts/logs/ydb/core/sys_view/ut_kqp/test-results/unittest/testing_out_stuff/SystemView.TopPartitionsFollowers.err

2024-10-25T09:58:57.687094Z node 6 :KQP_EXECUTER ERROR: TxId: 281474976715665. Ctx: { TraceId: 01jb1htqn69hrby154g1bcgxc0, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=6&id=ZjhiZTkzNzUtNDg0ZGY1YjUtMThkOTMzOTUtNDBhZGJlMjM=, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Retry failed by retries limit, requestId: 0
2024-10-25T09:58:57.687164Z node 6 :KQP_EXECUTER ERROR: TxId: 281474976715665. Ctx: { TraceId: 01jb1htqn69hrby154g1bcgxc0, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=6&id=ZjhiZTkzNzUtNDg0ZGY1YjUtMThkOTMzOTUtNDBhZGJlMjM=, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Retry failed because all nodes are busy, requestId: 0
2024-10-25T09:58:57.687261Z node 6 :KQP_EXECUTER ERROR: ActorId: [6:7429650619097076413:4284] TxId: 281474976715665. Ctx: { TraceId: 01jb1htqn69hrby154g1bcgxc0, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=6&id=ZjhiZTkzNzUtNDg0ZGY1YjUtMThkOTMzOTUtNDBhZGJlMjM=, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. { <main>: Error: TEvKqpNode::TEvStartKqpTasksRequest lost: ActorUnknown }
2024-10-25T09:58:57.728278Z node 6 :KQP_COMPUTE ERROR: SelfId: [6:7429650619097076418:4284], TxId: 281474976715665, task: 2. Ctx: { SessionId : ydb://session/3?node_id=6&id=ZjhiZTkzNzUtNDg0ZGY1YjUtMThkOTMzOTUtNDBhZGJlMjM=.  TraceId : 01jb1htqn69hrby154g1bcgxc0.  CustomerSuppliedId : .  CurrentExecutionId : .  DatabaseId : /Root.  PoolId : default.  Database : . }. InternalError: INTERNAL_ERROR DEFAULT_ERROR: { <main>: Error: Terminate execution }.

Если включить TRACE, видно, что ошибка начинается с момента 2024-10-25T13:20:58.353671Z

... SELECT from follower
2024-10-25T13:20:57.720994Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253773:4119], Recipient [5:7429702662203253786:2045]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:57.827801Z node 4 :TX_DATASHARD TRACE: StateWork, received event# 2146435079, Sender [0:0:0], Recipient [4:7429702658074694292:18]: NKikimr::NDataShard::TDataShard::TEvPrivate::TEvPeriodicWakeup
2024-10-25T13:20:57.828025Z node 4 :TX_DATASHARD DEBUG: UpdateTableStats at datashard 72075186224037894
2024-10-25T13:20:57.845287Z node 4 :TX_DATASHARD DEBUG: SendPeriodicTableStats at datashard 72075186224037894, for tableId 2, but no stats yet
2024-10-25T13:20:57.846834Z node 4 :TX_DATASHARD DEBUG: BuildStats result at datashard 72075186224037894, for tableId 2: RowCount 0, DataSize 0, IndexSize 0, PartCount 0, LoadedSize 0, Spent{0.000s wa 0.000s cnt 0}, HistogramKeys 0
2024-10-25T13:20:57.847373Z node 4 :TX_DATASHARD TRACE: StateWork, received event# 2146435080, Sender [4:7429702679549530991:1], Recipient [4:7429702658074694292:18]: NKikimr::NDataShard::TDataShard::TEvPrivate::TEvAsyncTableStats
2024-10-25T13:20:57.847515Z node 4 :TX_DATASHARD DEBUG: BuildStats result received at datashard 72075186224037894, for tableId 2
2024-10-25T13:20:57.847646Z node 4 :TX_DATASHARD DEBUG: SendPeriodicTableStats register new pipe at datashard 72075186224037894 FollowerId 0, TableInfos size = 1
2024-10-25T13:20:57.847932Z node 4 :TX_DATASHARD TRACE: TEvPeriodicTableStats from datashard 72075186224037894, FollowerId 0, tableId 2
2024-10-25T13:20:57.849090Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 269877761, Sender [4:7429702679549530993:8268], Recipient [4:7429702653779726914:8]: NKikimr::TEvTabletPipe::TEvServerConnected
2024-10-25T13:20:57.849190Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvTabletPipe::TEvServerConnected
2024-10-25T13:20:57.849252Z node 4 :FLAT_TX_SCHEMESHARD TRACE: Pipe server connected, at tablet: 72075186224037888
2024-10-25T13:20:57.851075Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 269553162, Sender [4:7429702658074694292:18], Recipient [4:7429702653779726914:8]: NKikimrTxDataShard.TEvPeriodicTableStats DatashardId: 72075186224037894 TableLocalId: 2 Generation: 1 Round: 0 TableStats { DataSize: 656 RowCount: 3 IndexSize: 0 InMemSize: 656 LastAccessTime: 1729862457701 LastUpdateTime: 1729862457130 ImmediateTxCompleted: 1 PlannedTxCompleted: 1 TxRejectedByOverload: 0 TxRejectedBySpace: 0 TxCompleteLagMsec: 0 InFlightTxCount: 0 RowUpdates: 3 RowDeletes: 0 RowReads: 1 RangeReads: 0 PartCount: 0 RangeReadRows: 0 SearchHeight: 1 LastFullCompactionTs: 0 HasLoanedParts: false ByKeyFilterSize: 0 } TabletMetrics { Memory: 82288 } ShardState: 2 UserTablePartOwners: 72075186224037894 NodeId: 4 StartTime: 1729862452031 TableOwnerId: 72075186224037888 FollowerId: 0
2024-10-25T13:20:57.851184Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvDataShard::TEvPeriodicTableStats
2024-10-25T13:20:57.851401Z node 4 :FLAT_TX_SCHEMESHARD INFO: Got periodic table stats at tablet 72075186224037888 from shard 72075186224037894 followerId 0 pathId [OwnerId: 72075186224037888, LocalPathId: 2] state 'Ready' dataSize 656 rowCount 3 cpuUsage 0
2024-10-25T13:20:57.852338Z node 4 :FLAT_TX_SCHEMESHARD TRACE: Got periodic table stats at tablet 72075186224037888 from shard 72075186224037894 followerId 0 pathId [OwnerId: 72075186224037888, LocalPathId: 2] raw table stats:
DataSize: 656 RowCount: 3 IndexSize: 0 InMemSize: 656 LastAccessTime: 1729862457701 LastUpdateTime: 1729862457130 ImmediateTxCompleted: 1 PlannedTxCompleted: 1 TxRejectedByOverload: 0 TxRejectedBySpace: 0 TxCompleteLagMsec: 0 InFlightTxCount: 0 RowUpdates: 3 RowDeletes: 0 RowReads: 1 RangeReads: 0 PartCount: 0 RangeReadRows: 0 SearchHeight: 1 LastFullCompactionTs: 0 HasLoanedParts: false ByKeyFilterSize: 0
2024-10-25T13:20:57.852543Z node 4 :FLAT_TX_SCHEMESHARD TRACE: Will delay TTxStoreTableStats on# 0.099957s, queue# 1
2024-10-25T13:20:57.852881Z node 4 :TX_DATASHARD TRACE: StateWork, received event# 269877760, Sender [4:7429702679549530992:4475], Recipient [4:7429702658074694292:18]: NKikimr::TEvTabletPipe::TEvClientConnected
2024-10-25T13:20:57.852961Z node 4 :TX_DATASHARD TRACE: StateWork, processing event TEvTabletPipe::TEvClientConnected
2024-10-25T13:20:57.888994Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [4:7429702658074694366:4116], Recipient [4:7429702658074694391:2043]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:57.890082Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253770:4118], Recipient [5:7429702662203253788:18]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:57.890727Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253773:4119], Recipient [5:7429702662203253786:2045]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:57.952992Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 2146435088, Sender [0:0:0], Recipient [4:7429702653779726914:8]: NKikimr::NSchemeShard::TEvPrivate::TEvPersistTableStats
2024-10-25T13:20:57.953103Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvPrivate::TEvPersistTableStats
2024-10-25T13:20:57.953172Z node 4 :FLAT_TX_SCHEMESHARD INFO: Started TEvPersistStats at tablet 72075186224037888, queue size# 1
2024-10-25T13:20:57.953338Z node 4 :FLAT_TX_SCHEMESHARD TRACE: Will execute TTxStoreStats, queue# 1
2024-10-25T13:20:57.953425Z node 4 :FLAT_TX_SCHEMESHARD TRACE: Will delay TTxStoreTableStats on# 0.000000s, queue# 1
2024-10-25T13:20:57.953670Z node 4 :FLAT_TX_SCHEMESHARD DEBUG: PersistSingleStats for pathId 2 shard idx 72075186224037888:7 data size 656 row count 3
2024-10-25T13:20:57.953943Z node 4 :FLAT_TX_SCHEMESHARD DEBUG: TTxStoreTableStats.PersistSingleStats: main stats from datashardId(TabletID)=72075186224037894 maps to shardIdx: 72075186224037888:7 followerId=0, pathId: [OwnerId: 72075186224037888, LocalPathId: 2], pathId map=Table1, is column=0, is olap=0
2024-10-25T13:20:57.954073Z node 4 :FLAT_TX_SCHEMESHARD INFO: Add stats from shard with datashardId(TabletID)=72075186224037894 followerId=0, pathId 2: RowCount 3, DataSize 656
2024-10-25T13:20:57.954143Z node 4 :FLAT_TX_SCHEMESHARD TRACE: BuildStatsForCollector: datashardId 72075186224037894, followerId 0
2024-10-25T13:20:57.954449Z node 4 :FLAT_TX_SCHEMESHARD TRACE: background compaction enqueue/update shard# 72075186224037888:7 with partCount# 0, rowCount# 3, searchHeight# 1, lastFullCompaction# 1970-01-01T00:00:00.000000Z at schemeshard 72075186224037888
2024-10-25T13:20:57.954927Z node 4 :FLAT_TX_SCHEMESHARD TRACE: TSideEffects ApplyOnComplete at tablet# 72075186224037888
2024-10-25T13:20:57.955400Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 2146435088, Sender [0:0:0], Recipient [4:7429702653779726914:8]: NKikimr::NSchemeShard::TEvPrivate::TEvPersistTableStats
2024-10-25T13:20:57.955480Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvPrivate::TEvPersistTableStats
2024-10-25T13:20:57.955532Z node 4 :FLAT_TX_SCHEMESHARD INFO: Started TEvPersistStats at tablet 72075186224037888, queue size# 0
2024-10-25T13:20:57.956485Z node 4 :SYSTEM_VIEWS TRACE: TEvSysView::TEvSendPartitionStats: domainKey [OwnerId: 72057594046644480, LocalPathId: 2] pathId [OwnerId: 72075186224037888, LocalPathId: 2] shardIdx 72075186224037888 7 followerId 0 stats DataSize: 656 RowCount: 3 IndexSize: 0 CPUCores: 0 TabletId: 72075186224037894 NodeId: 4 StartTime: 1729862452031 AccessTime: 1729862457701 UpdateTime: 1729862457130 InFlightTxCount: 0 RowUpdates: 3 RowDeletes: 0 RowReads: 1 RangeReads: 0 RangeReadRows: 0 ImmediateTxCompleted: 1 PlannedTxCompleted: 1 TxRejectedByOverload: 0 TxRejectedBySpace: 0 ByKeyFilterSize: 0 FollowerId: 0
2024-10-25T13:20:57.984933Z node 1 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 271125000, Sender [0:0:0], Recipient [1:7429702599886806964:12326]: NKikimr::NSchemeShard::TEvSchemeShard::TEvWakeupToMeasureSelfResponseTime
2024-10-25T13:20:57.985090Z node 1 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvSchemeShard::TEvWakeupToMeasureSelfResponseTime
2024-10-25T13:20:57.985541Z node 1 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 271124999, Sender [1:7429702599886806964:12326], Recipient [1:7429702599886806964:12326]: NKikimr::NSchemeShard::TEvSchemeShard::TEvMeasureSelfResponseTime
2024-10-25T13:20:57.985676Z node 1 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvSchemeShard::TEvMeasureSelfResponseTime
2024-10-25T13:20:58.005747Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] TTxCollect::Execute
2024-10-25T13:20:58.006040Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistPartitionTopResults: table id# 17, partition interval end# 2024-10-25T13:20:58.000000Z, partition count# 0
2024-10-25T13:20:58.006249Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistPartitionTopResults: table id# 18, partition interval end# 2024-10-25T14:00:00.000000Z, partition count# 0
2024-10-25T13:20:58.007033Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] Reset: interval end# 2024-10-25T13:20:58.000000Z
2024-10-25T13:20:58.019655Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] TTxCollect::Complete
2024-10-25T13:20:58.056264Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [4:7429702658074694366:4116], Recipient [4:7429702658074694391:2043]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.057180Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253770:4118], Recipient [5:7429702662203253788:18]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.057794Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253773:4119], Recipient [5:7429702662203253786:2045]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.070492Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 2146435079, Sender [0:0:0], Recipient [5:7429702662203253786:2045]: NKikimr::NDataShard::TDataShard::TEvPrivate::TEvPeriodicWakeup
2024-10-25T13:20:58.070667Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, processing event TEvPrivate::TEvPeriodicWakeup
2024-10-25T13:20:58.070832Z node 5 :TX_DATASHARD DEBUG: UpdateTableStats at datashard 72075186224037894
2024-10-25T13:20:58.078078Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 2146435079, Sender [0:0:0], Recipient [5:7429702662203253788:18]: NKikimr::NDataShard::TDataShard::TEvPrivate::TEvPeriodicWakeup
2024-10-25T13:20:58.078244Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, processing event TEvPrivate::TEvPeriodicWakeup
2024-10-25T13:20:58.078390Z node 5 :TX_DATASHARD DEBUG: UpdateTableStats at datashard 72075186224037894
2024-10-25T13:20:58.089919Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 2146435079, Sender [0:0:0], Recipient [4:7429702658074694391:2043]: NKikimr::NDataShard::TDataShard::TEvPrivate::TEvPeriodicWakeup
2024-10-25T13:20:58.090084Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, processing event TEvPrivate::TEvPeriodicWakeup
2024-10-25T13:20:58.090223Z node 4 :TX_DATASHARD DEBUG: UpdateTableStats at datashard 72075186224037894
2024-10-25T13:20:58.128265Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 271125000, Sender [0:0:0], Recipient [4:7429702653779726914:8]: NKikimr::NSchemeShard::TEvSchemeShard::TEvWakeupToMeasureSelfResponseTime
2024-10-25T13:20:58.128430Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvSchemeShard::TEvWakeupToMeasureSelfResponseTime
2024-10-25T13:20:58.128870Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, received event# 271124999, Sender [4:7429702653779726914:8], Recipient [4:7429702653779726914:8]: NKikimr::NSchemeShard::TEvSchemeShard::TEvMeasureSelfResponseTime
2024-10-25T13:20:58.128984Z node 4 :FLAT_TX_SCHEMESHARD TRACE: StateWork, processing event TEvSchemeShard::TEvMeasureSelfResponseTime
2024-10-25T13:20:58.134651Z node 4 :SYSTEM_VIEWS DEBUG: NSysView::TPartitionStatsCollector: TEvProcessOverloaded top size# 1, time# 2024-10-25T13:20:58.134397Z
2024-10-25T13:20:58.135900Z node 4 :SYSTEM_VIEWS TRACE: TEvSysView::TEvSendTopPartitions:  record Partitions { TabletId: 72075186224037894 Path: "/Root/Tenant1/Table1" PeakTimeUs: 1729862458134397 CPUCores: 0 NodeId: 4 DataSize: 656 RowCount: 3 IndexSize: 0 InFlightTxCount: 0 FollowerId: 0 } TimeUs: 1729862458134397
2024-10-25T13:20:58.136134Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] TTxTopPartitions::Execute: partition count# 1
2024-10-25T13:20:58.147004Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] TTxTopPartitions::Complete
2024-10-25T13:20:58.223611Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [4:7429702658074694366:4116], Recipient [4:7429702658074694391:2043]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.224474Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253770:4118], Recipient [5:7429702662203253788:18]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.225089Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253773:4119], Recipient [5:7429702662203253786:2045]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.323042Z node 1 :KQP_EXECUTER TRACE: ActorId: [1:7429702685786154068:4432] TxId: 0. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Bootstrap done, become ReadyState
2024-10-25T13:20:58.323245Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Executing physical tx, type: 2, stages: 2
2024-10-25T13:20:58.323439Z node 1 :KQP_EXECUTER TRACE: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Got request, become WaitResolveState
2024-10-25T13:20:58.323641Z node 1 :KQP_EXECUTER DEBUG: StageInfo: StageId #[0,0], InputsCount: 1, OutputsCount: 1
2024-10-25T13:20:58.323759Z node 1 :KQP_EXECUTER DEBUG: StageInfo: StageId #[0,1], InputsCount: 1, OutputsCount: 1
2024-10-25T13:20:58.324357Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Resolved key sets: 1
2024-10-25T13:20:58.325170Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Resolved key: { TableId: [OwnerId: 72075186224037888, LocalPathId: 2] Access: 1 SyncVersion: false Status: OkData Kind: KindRegularTable PartitionsCount: 1 DomainInfo { DomainKey: [OwnerId: 72057594046644480, LocalPathId: 2] ResourcesDomainKey: [OwnerId: 72057594046644480, LocalPathId: 2] Params { Version: 2 PlanResolution: 50 Coordinators: 72075186224037889 Coordinators: 72075186224037890 TimeCastBucketsPerMediator: 2 Mediators: 72075186224037891 Mediators: 72075186224037892 SchemeShard: 72075186224037888 SysViewProcessor: 72075186224037893 } ServerlessComputeResourcesMode: (empty maybe) } From: (Uint64 : NULL) IncFrom: 1 To: () IncTo: 0 }
2024-10-25T13:20:58.325484Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Stage [0,0] AST: (
(return (lambda '($1) (FromFlow (ExpandMap (Take (ToFlow $1) (Uint64 '"1001")) (lambda '($2) (Member $2 '"Key"))))))
)
2024-10-25T13:20:58.326381Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Stage [0,1] AST: (
(return (lambda '($1) (FromFlow (NarrowMap (Take (ToFlow $1) (Uint64 '"1001")) (lambda '($2) (AsStruct '('"Key" $2)))))))
)
2024-10-25T13:20:58.326736Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Stage [0,1] create compute task: 2
2024-10-25T13:20:58.326944Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Stage [0,1] create channelId: 1 from task: 1 to task: 2 of type UnionAll/Map without spilling
2024-10-25T13:20:58.327073Z node 1 :KQP_EXECUTER DEBUG: Create result channelId: 2 from task: 2 with index: 0
2024-10-25T13:20:58.327358Z node 1 :KQP_EXECUTER TRACE: [ShardsResolver] TxId: 281474976710665. Send request about tabletId: 72075186224037894
2024-10-25T13:20:58.329699Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 269877761, Sender [4:7429702683844498310:8238], Recipient [4:7429702658074694391:2043]: NKikimr::TEvTabletPipe::TEvServerConnected
2024-10-25T13:20:58.329903Z node 1 :KQP_EXECUTER TRACE: [ShardsResolver] TxId: 281474976710665. Got resolve event for tabletId: 72075186224037894, nodeId: 4
2024-10-25T13:20:58.329983Z node 1 :KQP_EXECUTER DEBUG: [ShardsResolver] TxId: 281474976710665. Shard resolve complete, resolved shards: 1
2024-10-25T13:20:58.329868Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, processing event TEvTabletPipe::TEvServerConnected
2024-10-25T13:20:58.330210Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Shards nodes resolved, success: 1, failed: 0
2024-10-25T13:20:58.330256Z node 4 :TX_DATASHARD DEBUG: Server connected at follower 3 tablet# 72075186224037894, clientId# [1:7429702685786154072:8491], serverId# [4:7429702683844498310:8238], sessionId# [0:0:0]
2024-10-25T13:20:58.330546Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Shards on nodes:
  node 4: [72075186224037894]
2024-10-25T13:20:58.330815Z node 1 :KQP_EXECUTER ERROR: TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Database not set, use /Root
2024-10-25T13:20:58.331136Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Total tasks: 1, readonly: true, 1 scan tasks on 1 nodes, pool: Data, localComputeTasks: 0, snapshot: {0, 0}
2024-10-25T13:20:58.333862Z node 1 :KQP_EXECUTER INFO: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Total tasks: 2, readonly: 1, datashardTxs: 0, evWriteTxs: 0, topicTxs: 0, volatile: 0, immediate: 1, pending compute tasks1, useFollowers: 1
2024-10-25T13:20:58.334082Z node 1 :KQP_EXECUTER TRACE: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Updating channels after the creation of compute actors
2024-10-25T13:20:58.334351Z node 1 :KQP_EXECUTER TRACE: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Collect channels updates for task: 2 at actor [1:7429702685786154073:4432]
2024-10-25T13:20:58.334673Z node 1 :KQP_EXECUTER TRACE: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Task: 2, input channelId: 1, src task: 1, at actor [0:0:0]
2024-10-25T13:20:58.335050Z node 1 :KQP_EXECUTER TRACE: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Sending channels info to compute actor: [1:7429702685786154073:4432], channels: 1
2024-10-25T13:20:58.335414Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Waiting for: CT 1, CA [1:7429702685786154073:4432],
2024-10-25T13:20:58.335776Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. ActorState: WaitResolveState, waiting for 1 compute actor(s) and 0 datashard(s): CA [1:7429702685786154073:4432],
2024-10-25T13:20:58.336043Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. ActorState: WaitResolveState, immediate tx, become ExecuteState
2024-10-25T13:20:58.341784Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Schedule a retry by ActorUnknown reason, nodeId:4 requestId: 0
2024-10-25T13:20:58.342359Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. ActorState: ExecuteState, got execution state from compute actor: [1:7429702685786154073:4432], task: 2, state: COMPUTE_STATE_EXECUTING, stats: {  }
2024-10-25T13:20:58.342603Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Waiting for: CT 1, CA [1:7429702685786154073:4432],
2024-10-25T13:20:58.342833Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. ActorState: ExecuteState, waiting for 1 compute actor(s) and 0 datashard(s): CA [1:7429702685786154073:4432],
2024-10-25T13:20:58.353671Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Try to retry by ActorUnknown reason, nodeId: 4, requestId: 0
2024-10-25T13:20:58.355743Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Schedule a retry by ActorUnknown reason, nodeId:4 requestId: 0
2024-10-25T13:20:58.376762Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Try to retry by ActorUnknown reason, nodeId: 4, requestId: 0
2024-10-25T13:20:58.378666Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Schedule a retry by ActorUnknown reason, nodeId:4 requestId: 0
2024-10-25T13:20:58.392825Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [4:7429702658074694366:4116], Recipient [4:7429702658074694391:2043]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.393677Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253770:4118], Recipient [5:7429702662203253788:18]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.394261Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253773:4119], Recipient [5:7429702662203253786:2045]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.419859Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Try to retry by ActorUnknown reason, nodeId: 4, requestId: 0
2024-10-25T13:20:58.421499Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Schedule a retry by ActorUnknown reason, nodeId:4 requestId: 0
2024-10-25T13:20:58.505056Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Try to retry by ActorUnknown reason, nodeId: 4, requestId: 0
2024-10-25T13:20:58.506992Z node 1 :KQP_EXECUTER DEBUG: ActorId: [1:7429702685786154068:4432] TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Schedule a retry by ActorUnknown reason, nodeId:4 requestId: 0
2024-10-25T13:20:58.530364Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] TTxAggregate::Execute
2024-10-25T13:20:58.530519Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryResults: interval end# 2024-10-25T13:20:58.000000Z, query count# 0
2024-10-25T13:20:58.530623Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 8, interval end# 2024-10-25T13:20:58.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.530720Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 10, interval end# 2024-10-25T13:20:58.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.530809Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 12, interval end# 2024-10-25T13:20:58.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.530898Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 14, interval end# 2024-10-25T13:20:58.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.530990Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 9, interval end# 2024-10-25T14:00:00.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.531084Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 11, interval end# 2024-10-25T14:00:00.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.531174Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 13, interval end# 2024-10-25T14:00:00.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.531283Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] PersistQueryTopResults: table id# 15, interval end# 2024-10-25T14:00:00.000000Z, query count# 0, persisted# 0
2024-10-25T13:20:58.541098Z node 4 :SYSTEM_VIEWS DEBUG: [72075186224037893] TTxAggregate::Complete
2024-10-25T13:20:58.555924Z node 4 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [4:7429702658074694366:4116], Recipient [4:7429702658074694391:2043]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.556882Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253770:4118], Recipient [5:7429702662203253788:18]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.557554Z node 5 :TX_DATASHARD TRACE: StateWorkAsFollower, received event# 268828680, Sender [5:7429702662203253773:4119], Recipient [5:7429702662203253786:2045]: NKikimr::TEvTablet::TEvFUpdate
2024-10-25T13:20:58.692741Z node 1 :KQP_EXECUTER DEBUG: TxId: 281474976710665. Ctx: { TraceId: 01jb1xcpcw8hvyaq7g9wkt7028, Database: , DatabaseId: /Root, SessionId: ydb://session/3?node_id=1&id=ZjBiMTFkYS0xNmI1MDkyMC01MDcxYzE2NS1iM2VjMzBhOA==, CurrentExecutionId: , CustomerSuppliedId: , PoolId: default}. Try to retry by ActorUnknown reason, nodeId: 4, requestId: 0

Полный TRACE log https://gist.github.com/azevaykin/ade36ff2e2705c726ae5f19ff2cf7bb4

azevaykin commented 1 month ago

Исходный тест починился с помощью https://github.com/ydb-platform/ydb/pull/10974.

Но проблема осталась: Если тут заменить WHERE Key = 1 на WHERE Key <= 1, то стабильно падает https://github.com/ydb-platform/ydb/blob/bb2d8c411d97f1c14e207e5a9d1e92a8e19436f0/ydb/core/sys_view/ut_kqp.cpp#L1412