Closed: isaac-io closed this 2 years ago.
To introduce CF drops, use the flag --clear_column_family_one_in. Note that this only works with: test_batches_snapshots=false, expected_values_dir=None, backup_one_in=0, checkpoint_one_in=0.
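The compatibility rule in the note above can be written as a small check. This is a minimal sketch under stated assumptions: the `CrashTestFlags` struct and the `CfDropConflicts` helper are hypothetical names for illustration, not part of db_stress or db_crashtest.py, and an empty string stands in for `expected_values_dir=None`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the crash-test flags named in the note above.
// Illustrative only; db_stress and db_crashtest.py do not define this.
struct CrashTestFlags {
  int clear_column_family_one_in = 0;
  bool test_batches_snapshots = false;
  std::string expected_values_dir;  // empty string stands in for None / ""
  int backup_one_in = 0;
  int checkpoint_one_in = 0;
};

// Lists the settings that conflict with enabling CF drops, per the note.
// An empty result means the combination is one the note says works.
std::vector<std::string> CfDropConflicts(const CrashTestFlags& f) {
  std::vector<std::string> conflicts;
  if (f.clear_column_family_one_in == 0) {
    return conflicts;  // CF drops are not requested, so nothing conflicts
  }
  if (f.test_batches_snapshots) conflicts.push_back("test_batches_snapshots");
  if (!f.expected_values_dir.empty()) conflicts.push_back("expected_values_dir");
  if (f.backup_one_in != 0) conflicts.push_back("backup_one_in");
  if (f.checkpoint_one_in != 0) conflicts.push_back("checkpoint_one_in");
  return conflicts;
}
```

With the values from the note (`test_batches_snapshots=false`, an empty `expected_values_dir`, and both `*_one_in` counters at 0), the helper reports no conflicts; turning on, say, `backup_one_in` makes it report that flag by name.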
The whitebox crash test is failing:
Verification failed for column family 0 key 0000000000038053000000000000004F000000000000003B0000000000000009000000000000004F000000000000003E787878787878 (1638432): Unexpected value found
No writes or ops?
Verification failed :(
My command: time make db_stress; time CRASH_TEST_EXT_ARGS="--clear_column_family_one_in=5 --test_batches_snapshots=0 --expected_values_dir="" --backup_one_in=0 --checkpoint_one_in=0" make whitebox_crash_test; echo $?
Thanks! Can you also post the db_stress command that the crash test executed when this failed?
Sure. BTW, the blackbox crash test failed as well, for the same reason:
Whitebox
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_concurrent_memtable_write=1 --async_io=1 --avoid_flush_during_recovery=1 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=4096 --bloom_bits=10 --bottommost_compression_type=lz4hc --cache_index_and_filter_blocks=0 --cache_size=8388608 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=4 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_max_dict_buffer_bytes=67108863 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=xpress --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --customopspercent=0 --data_block_hash_table_util_ratio=0.63 --data_block_index_type=1 --db=/dev/shm/rocksdb.j1e0/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --delpercent=1 --delrangepercent=5 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --expected_values_dir= --experimental_mempurge_threshold=6.023655548032458 --fail_if_options_file_error=0 --file_checksum_impl=big --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=13 --index_type=2 --iterpercent=19 --key_len_percent_dist=14,8,6,1,8,27,11,11,8,6 --kill_exclude_prefixes=WritableFileWriter::Append,WritableFileWriter::WriteBuffered --kill_random_test=88889 --level_compaction_dynamic_level_bytes=False --long_running_snapshots=1 --mark_for_compaction_one_file_in=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=10485760 --max_key_len=10 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=3 
--max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False --nooverwritepercent=50 --num_iterations=91 --open_files=500000 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=32 --open_write_fault_one_in=16 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=23 --recycle_log_file_num=1 --reopen=20 --reserve_table_reader_memory=1 --ribbon_starting_level=3 --secondary_cache_fault_one_in=32 --seed=3062778522 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --subcompactions=1 --sync=False --sync_fault_injection=False --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --unpartitioned_pinning=2 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=True --use_merge=0 --use_multiget=1 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=True --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=0 --writepercent=52
Blackbox
./db_stress --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --allow_setting_blob_options_dynamically=1 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --blob_compaction_readahead_size=4194304 --blob_compression_type=lz4 --blob_file_size=268435456 --blob_garbage_collection_age_cutoff=0.5 --blob_garbage_collection_force_threshold=0.75 --block_size=16384 --bloom_bits=13 --bottommost_compression_type=none --cache_index_and_filter_blocks=1 --cache_size=8388608 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=5 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=0 --compaction_ttl=1 --compare_full_db_state_snapshot=0 --compression_max_dict_buffer_bytes=7 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=xpress --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --customopspercent=0 --data_block_hash_table_util_ratio=0.73 --data_block_index_type=1 --db=/dev/shm/rocksdb.qGR3/rocksdb_crashtest_blackbox --db_write_buffer_size=134217728 --delpercent=9 --delrangepercent=4 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_blob_files=1 --enable_blob_garbage_collection=1 --enable_compaction_filter=1 --enable_pipelined_write=0 --expected_values_dir= --experimental_mempurge_threshold=0.3731507821551916 --fail_if_options_file_error=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=8 --index_type=0 --iterpercent=0 --key_len_percent_dist=2,16,10,3,40,29 --level_compaction_dynamic_level_bytes=False --long_running_snapshots=1 --mark_for_compaction_one_file_in=10 --max_background_compactions=1 
--max_bytes_for_level_base=67108864 --max_key=1048576 --max_key_len=6 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_prefix_bloom_size_ratio=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_blob_size=8 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=0 --num_iterations=77 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=39 --recycle_log_file_num=1 --reopen=0 --reserve_table_reader_memory=0 --ribbon_starting_level=6 --secondary_cache_fault_one_in=32 --seed=3069121425 --set_options_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=3 --sync=False --sync_fault_injection=False --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=3 --unpartitioned_pinning=3 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=False --use_merge=1 --use_multiget=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=True --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=0 --writepercent=48
Pretty sure that it's unrelated to the change in this issue, because you have --clear_column_family_one_in=4 with --column_families=1. In that case no drop is actually performed (the default column family can't be dropped, so there is no other CF to drop), and the code changes of this issue aren't reached. I'll try to reproduce locally on this branch and on main.
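The reasoning above can be illustrated with a hypothetical helper. It assumes, as the comment says, that a drop only ever targets a non-default column family; the function name and the way the random value is split between "should we drop" and "which CF" are illustrative and are not db_stress's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sketch of the gating described above. The default column
// family (index 0) can't be dropped, so a candidate is only picked from
// indexes 1..column_families-1. Not db_stress's real implementation.
std::optional<int> PickColumnFamilyToDrop(int column_families,
                                          int clear_column_family_one_in,
                                          uint64_t rand_value) {
  assert(column_families >= 1);
  if (clear_column_family_one_in <= 0 || column_families <= 1) {
    return std::nullopt;  // drops disabled, or only the default CF exists
  }
  // Fire roughly once every clear_column_family_one_in operations.
  if (rand_value % clear_column_family_one_in != 0) {
    return std::nullopt;
  }
  // Skip index 0 (the default CF): map into [1, column_families - 1].
  return static_cast<int>((rand_value / clear_column_family_one_in) %
                          (column_families - 1)) + 1;
}
```

Both failing runs use `--column_families=1`, so under this model the helper never returns a column family, whatever `--clear_column_family_one_in` is set to.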
@Yuval-Ariel, please run a full cycle, as it seems to be unrelated to this change.
QA passed on c6567a63f624c7d0a00e8461f031ef80528b8f6e
When installing memtable flush results, if the flush was unsuccessful (due to an error during the flush or during the manifest write) or if the CF was dropped, the memtable is rolled back into a flushable state. In the CF-drop case this rollback is unnecessary: it re-activates the pending flush flag on the CF even though no flush can take place after the CF has been dropped. As in #126, it also causes a problem for the upcoming improvements to the WriteBufferManager's immutable memory tracking (#113): the rollback briefly marks the memtable's memory as ready for flush again, before that memory is finally freed when the memtable is dropped during the destruction of the dropped CF. This might upset the WBM's behaviour, because in the future we plan to trigger flushes based on the amount of memory that isn't already being flushed, rather than on the amount of mutable memory alone as is done today.
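To make the accounting concern concrete, here is a simplified model of the two trigger inputs mentioned above. It is a sketch under assumptions: `MemtableModel`, `MemoryNotBeingFlushed` and `MutableMemory` are hypothetical names, not the WriteBufferManager API, and real memtable state has many more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of one memtable for WBM accounting.
struct MemtableModel {
  size_t bytes = 0;
  bool immutable = false;          // switched out of the active slot
  bool flush_in_progress = false;  // picked up by a flush job
};

// Planned trigger input: memory that is not already being flushed, i.e.
// the active memtable plus immutable memtables still waiting for a flush.
size_t MemoryNotBeingFlushed(const std::vector<MemtableModel>& mems) {
  size_t total = 0;
  for (const auto& m : mems) {
    if (!m.flush_in_progress) total += m.bytes;
  }
  return total;
}

// Today's trigger input, for comparison: mutable memory only.
size_t MutableMemory(const std::vector<MemtableModel>& mems) {
  size_t total = 0;
  for (const auto& m : mems) {
    if (!m.immutable) total += m.bytes;
  }
  return total;
}
```

With a 64-byte active memtable and a 32-byte immutable memtable that is being flushed, both measures start at 64. A rollback that clears the in-progress mark raises the planned measure to 96 until the dropped CF's memtables are freed, while the mutable-only measure stays at 64, which is why the rollback matters only once flushes are triggered by the planned measure.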
This is a continuation of #126 (I simply missed this rollback). Since the rollback only marks the memtable as ready for flush again and clears the related flush state, there's no harm in skipping it when the CF was dropped and simply letting the memtable be dropped in its current state.
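The decision described above can be sketched as follows. This is a hypothetical model, not the speedb/RocksDB `MemTableList` code: the types, the `InstallFlushResult` name and the choice to check for a dropped CF before looking at the flush status are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified outcome of installing a memtable flush result.
enum class FlushOutcome { kInstalled, kRolledBack, kLeftForDrop };

struct MemtableState {
  bool flush_in_progress = true;  // set when the flush job picked it up
  bool flush_completed = false;
};

struct CfState {
  bool dropped = false;
  bool imm_flush_needed = false;  // the CF's pending flush flag
};

FlushOutcome InstallFlushResult(bool flush_ok, CfState& cf,
                                MemtableState& mem) {
  if (cf.dropped) {
    // No flush can run after a drop: skip the rollback and leave the
    // memtable as-is so it is freed together with the dropped CF.
    return FlushOutcome::kLeftForDrop;
  }
  if (!flush_ok) {
    // Live CF with a failed flush: make the memtable flushable again
    // and re-arm the pending flush flag so a later flush retries it.
    mem.flush_in_progress = false;
    mem.flush_completed = false;
    cf.imm_flush_needed = true;
    return FlushOutcome::kRolledBack;
  }
  mem.flush_in_progress = false;
  mem.flush_completed = true;
  return FlushOutcome::kInstalled;
}
```

Checking the drop first means the pending flush flag is never re-armed for a CF that can no longer flush, and the memtable's memory keeps its "being flushed" mark until it is freed.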