hzxa21 opened 5 months ago
@yezizp2012 @StrikeW We may need to check whether this is a bug in cdc source or in meta. If there is a race in actor assignment in meta, it may affect other use cases as well.
I hit the same issue after rebooting the RisingWave cluster. This error log was found in the compactor pod:
```
thread 'rw-compaction' panicked at /risingwave/src/storage/hummock_sdk/src/key.rs:1066:21:
key UserKey { 246, TableKey { 00000001312d30000000000003 } } epoch EpochWithGap(6317061581963264) >= prev epoch EpochWithGap(6317061581963264)

2024-04-20T15:34:02.883393802Z INFO risingwave_storage::hummock::compactor::compactor_runner: Ready to handle compaction group 2 task: 147635 compact_task_statistics CompactTaskStatistics { total_file_count: 44, total_key_count: 46, total_file_size: 25004, total_uncompressed_file_size: 24590 } target_level 0 compression_algorithm 0 table_ids [76, 86, 96, 106, 111, 131, 141, 196, 201, 226, 231, 246, 256] parallelism 1
```
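For context, here is a minimal sketch (not RisingWave's actual code; `FullKey`, `user_key`, and `is_strictly_after` are simplified stand-ins) of the ordering invariant the panic reports being violated: within an iteration/compaction stream, full keys must be strictly increasing, so for equal user keys the epoch must strictly decrease. The log above shows two entries with the same user key and the same `EpochWithGap`, which breaks exactly this check.

```rust
// Sketch of the LSM full-key ordering invariant, under the assumption that
// keys sort by (user_key ascending, epoch descending). Hypothetical types,
// not the real hummock_sdk definitions.

#[derive(Debug, Clone, PartialEq, Eq)]
struct FullKey {
    user_key: Vec<u8>, // table id + table key, flattened for simplicity
    epoch: u64,        // stands in for EpochWithGap
}

/// Returns true if `curr` may legally follow `prev` in iteration order.
fn is_strictly_after(prev: &FullKey, curr: &FullKey) -> bool {
    match curr.user_key.cmp(&prev.user_key) {
        std::cmp::Ordering::Greater => true,
        // Same user key: epoch must strictly decrease (newer versions first).
        std::cmp::Ordering::Equal => curr.epoch < prev.epoch,
        std::cmp::Ordering::Less => false,
    }
}

fn main() {
    let prev = FullKey { user_key: vec![0x01], epoch: 6317061581963264 };
    // Same user key with the *same* epoch, as in the reported panic:
    let dup = FullKey { user_key: vec![0x01], epoch: 6317061581963264 };
    assert!(!is_strictly_after(&prev, &dup)); // violation -> compactor panics
    let older = FullKey { user_key: vec![0x01], epoch: 6317061581963263 };
    assert!(is_strictly_after(&prev, &older)); // older epoch after newer: ok
    println!("ordering check ok");
}
```

A duplicate `(user_key, epoch)` pair should be impossible if each epoch is sealed exactly once, which is why the reporters suspect a race in actor assignment rather than a compactor bug.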
Another occurrence that is probably related to this issue: https://risingwave-community.slack.com/archives/C03BW71523T/p1719592780560509
Another occurrence in v1.10.0-rc3. But because the cluster had already been reset, we don't know which table was problematic.
Describe the bug
Recently two users reported the following assertion being triggered during compaction, both on data related to the cdc source state table: https://github.com/risingwavelabs/risingwave/blob/9f2ac7e06d82f03c94dcb3db549cd1c8a9ccdb8d/src/storage/hummock_sdk/src/key.rs#L1069
Here is some info from the two user reports:

- v1.7.2: brand new cluster; the compactor panics after table creation. No more information provided.
- v1.8.0: brand new cluster; the compactor panics after table creation (using a dedicated cdc source). Excerpt from the reported data, showing two rows with the same epoch:

```
...
2 | 0 | 6258387423461376 | 23250 | 788
2 | 0 | 6258387423461376 | 23240 | 788
...
```
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
No response