Closed ofriedma closed 9 months ago
1) run with the flag kill_random_test=0 this will prevent db_stress from killing the test and it will failed constantly 2) i used ./ldb scan --from="0x0000000000001080" --to="0x0000000000001100" --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno/ --hex --no_value to get a full list of all the keys in the range ... the expected key is in the database
the relevant dump is: "0x0000000000001089 0x000000000000108A 0x000000000000108B 0x000000000000108C 0x000000000000108D 0x000000000000108F 0x0000000000001092 0x0000000000001093 0x0000000000001094 0x0000000000001097 0x0000000000001098 0x00000000000010A3 0x00000000000010A4 0x00000000000010A6"
based on the database dump, SFP 00000000000010A0 should return 0000000000001098 and now "NNPPPNNPPPP" (N=next P = Prev) should return 0000000000001093 (4 next and 7 prev = 3 prev ). need to check my logic and if items were added after the iterator started. @Yuval-Ariel where is the patch that enable output of a trace file ...
Opening the database again pass the verification phase. hence the problem is probably on deciding which entries are part of the iterator.
@hilikspdb 6ba24411c5e530b4f9346f7fd913aad9e972efc3 its on by default and the trace files are stored in the expected_values_dir
thanks , anyway the second test fail even when it starts on an empty database .
I have simplified the test: rm -rf /tmp/rocksdb/; ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --expected_values_dir=/tmp/rocksdb_crashtest_expectedb1cqsg8 --destroy_db_initially=1 --key_len_percent_dist=100 --kill_random_test=0 --max_key=102400 --max_key_len=1 --memtablerep=skip_list --seed=2468986424 --verify_iterator_with_expected_state_one_in=5 --write_buffer_size=33554432 --iterpercent=3 --readpercent=30 --writepercent=35 --delpercent=28 --delrangepercent=4 --prefixpercent=0
this test pass ok with memtable=skip-list and in 2.6 while crushes in main with memtablerep=hash_spdb ... therefore we left we two options : bug in translation or a change in the behaviour of db_stress
here is even simpler version: rm -rf /tmp/rocksdb/; ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --expected_values_dir=/tmp/rocksdb_crashtest_expectedb1cqsg8 --destroy_db_initially=1 --key_len_percent_dist=100 --kill_random_test=0 --max_key=102400 --max_key_len=1 --memtablerep=hash_spdb --enable_speedb_features=false --seed=2468986424 --verify_iterator_with_expected_state_one_in=5 --write_buffer_size=33554432 --iterpercent=3 --readpercent=30 --writepercent=67 --delpercent=0 --delrangepercent=0 --prefixpercent=0
This commit : 6213b6342948f18e7a17201e858bdcf9cccf1490 seems to introduce the issue
This commit : 6213b63 seems to introduce the issue
@ofriedma - that is a commit that was made before version 2.3.0?
this commit looks a force push commit that was done to the plugin which was remove later on. any way it looks that changes were done to my commit which i didnt do and forced push. also there is an issue with db stress and enable speedb features as i mentioned in the past since the shared options doesnt support hardening. currently there is a wrong way how the db stress handles it. i will attach a branch that handles both cases and you can proceed from there
For 2.7 the error does not appear, so it has been revealed somewhere between 2.7 and main.
@ofriedma - Are you referring to the rebased branch or to the official 2.7? And please clarify: You have said that you have identified the commit that is causing this, but this commit was made before version 2.3. I am confused.
official 2.7, It appears that it happened probably after the rebase, before v2.7 no errors(and the commit was there).
@ayulas fixs are at branch /ayula/fix_bug @ofriedma @hilikspdb pls review them
@erez-speedb This issue is not related to db_stress enable_speedb_features becuase we can reproduce it without this feature entirely.
Here is a quick description of the bug and what can be done:
On Thu, Jan 18, 2024 at 11:05 AM hilikspdb @.***> wrote:
Here is a quick description of the bug and what can be done:
- write starts at time t and finishes at t+n
- iterators starts at time t+k where k < n
- seek is done at time t+x
- db_stress internal locking mechanism prevents the SEEK to start before the end of the write.
- The skip list iterator uses the mutable copy and the iterator sequence number (so data that were added to the memtable after the new iterator is in)
vector and hash-spdb are using an immutable copy (so data is not added) NOTE: this is not a bug in the memtable the definition of what to do with data in flight is not clear . There are two options to fix it:
- fix db_bench so new iterator will also use lock of the region that it is doing seek on
- mimic skip-list behaviour add the last vector in the time of seek and use the sequence number to decide if data is in or not.
— Reply to this email directly, view it on GitHub https://github.com/speedb-io/speedb/issues/802#issuecomment-1898068645, or unsubscribe https://github.com/notifications/unsubscribe-auth/AZKU2DG7H3LWDLBSI3T3L73YPDQUDAVCNFSM6AAAAABBM5KDQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYGA3DQNRUGU . You are receiving this because you commented.Message ID: @.***>
the issue is a result of the Refresh iterator that by definition is not suppose to be supported in the immutable iterator. (hash_spdb, vectors memtablerep) i added the supoorted flag to the memtablerep and if it is false we avoid doing refresh in no_batched_ops_stress test @hilikspdb @ofriedma pls review ayula/802-refresh-iter
@ofriedma as we discussed i expose the allow refresh pls review cc249df6f20ef48b3aaae6d668e9d8fbb5a7c424
How to reproduce:
mkdir /tmp/rocksdb_crashtest_expectedb1cqsg8 run this command: ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_bits=5.680994457101304 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=lz4 --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=0 --delpercent=17 --delrangepercent=5 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=1 --expected_values_dir=/tmp/rocksdb_crashtest_expectedb1cqsg8 --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=none --flush_one_in=1000000 --format_version=6 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=5 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=0 --initiate_wbm_flushes=1 --iterpercent=17 --key_len_percent_dist=100 --kill_random_test=888887 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=1000000 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=4 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=1000 --memtable_prefix_bloom_size_ratio=0.1 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=0 --num_iterations=60 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=scoped --prefix_size=1 --prefixpercent=6 --prepopulate_block_cache=0 --preserve_internal_time_seconds=36000 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=20 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=70 --stats_dump_period_sec=10 --subcompactions=2 --sync=0 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --total_ram_size=1073741824 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35
run this command:
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_protection_bytes_per_key=1 --block_size=16384 --bloom_bits=2 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=2097151 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=1048576 --delpercent=28 --delrangepercent=4 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=0 --expected_values_dir=/tmp/rocksdb_crashtest_expectedb1cqsg8 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=3 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --initiate_wbm_flushes=0 --iterpercent=3 --key_len_percent_dist=100 --kill_exclude_prefixes=WritableFileWriter::Append,WritableFileWriter::WriteBuffered --kill_random_test=88889 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=8 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=2 --num_iterations=19 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=default --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=30 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=0 --stats_dump_period_sec=600 --subcompactions=3 --sync=1 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --total_ram_size=536870912 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35
You should get the error similar to this:
2023-12-28T14:37:27.4899757Z Expected state has key 0000000000005982, iterator is at key 0000000000005966 2023-12-28T14:37:27.4901103Z Column family: default, op_logs: S 000000000000597D N SFP 000000000000598F P S 0000000000005986 NPNPP 2023-12-28T14:37:27.4902028Z Verification failed :(