aravind-wdc closed this issue 2 years ago
@aravind-wdc : could you add steps for how to reproduce this issue?
The scenario described above causes application hangs/corruption: the zenfs utility manifests it as a hang, whereas in MyRocks it manifests as corruption. It can be reproduced as follows: running a sysbench prepare on MyRocks with the configuration below (note target_file_size_base=32m), creating a database with 500 million rows per table and 16 tables, triggers the issue. The required .cnf file is:
[mysqld]
rocksdb_max_row_locks=1000M
plugin-load-add=rocksdb=ha_rocksdb.so
skip-innodb
rocksdb
default-storage-engine=rocksdb
default-tmp-storage-engine=MyISAM
ssl=0
skip-log-bin
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
# files
max_prepared_stmt_count=1000000
rocksdb_max_open_files=-1
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16384
rocksdb_table_cache_numshardbits=6
# rate limiter
rocksdb_bytes_per_sync=16777216
rocksdb_wal_bytes_per_sync=4194304
#rocksdb_rate_limiter_bytes_per_sec=104857600 #100MB/s
#
# # triggering compaction if there are many sequential deletes
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000
rocksdb_default_cf_options="write_buffer_size=512m;target_file_size_base=32m;max_bytes_for_level_base=1024m;max_write_buffer_number=4;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=30;max_write_buffer_number=4;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio;compression=kLZ4Compression;bottommost_compression=kLZ4Compression;compression_opts=-14:4:0"
rocksdb_max_subcompactions=4
rocksdb_compaction_readahead_size=16m
rocksdb_use_direct_reads=ON
rocksdb_use_direct_io_for_flush_and_compaction=ON
[mysqld_safe]
thp-setting=never
If this issue is hit, mysql fails to start with a corruption error message:
[ERROR] [MY-000000] [Server] Plugin rocksdb reported: 'Error opening instance, Status Code: 2, Status: Corruption: file is too short (27556940 bytes) to be an sstable./.rocksdb/031176.sst'
But the zenfs list command shows it as a valid file:
$ sudo ./zenfs list --zbd=nvme0n1 --path=./.rocksdb | grep 031176
27556940 Sep 23 2021 03:06:23 031176.sst
When we dump the metadata, the resulting JSON file contains this entry:
{"id":91857,"filename":"./.rocksdb/031176.sst","size":27556940,"hint":0,"extents":[ ]}
Here the file 031176.sst has a recorded size of 27556940 bytes, but no extents associated with it.
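A minimal sketch (in Python, not part of zenfs) of how this inconsistency can be detected from such a metadata dump. The record is copied from the entry above; the "length" field on extent records is an assumption for illustration, since the dump here has an empty extent list:

```python
import json

# Example record copied from the metadata dump above.
record = json.loads(
    '{"id":91857,"filename":"./.rocksdb/031176.sst",'
    '"size":27556940,"hint":0,"extents":[ ]}'
)

def is_inconsistent(rec):
    """A file whose recorded size exceeds the bytes covered by its
    extents was never fully persisted. For 031176.sst the extent list
    is empty, so zero bytes are actually recoverable."""
    covered = sum(e.get("length", 0) for e in rec["extents"])
    return rec["size"] > covered

print(is_inconsistent(record))  # -> True for 031176.sst
```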
Even the zenfs backup command hangs on this file:
$ time sudo ./zenfs backup --zbd=nvme0n1 --path=/tmp/ --backup_path=./.rocksdb/031176.sst
./.rocksdb/031176.sst
^C
real 0m13.051s
user 0m13.007s
sys 0m0.019s
@aravind-wdc : What I think is going on here is that the file size metadata got persisted in a snapshot, but the file had not been synced by the user before the crash (thus no extents were stored). This is an inconsistency we could resolve by summing up the extents after mount and reducing the file size to the sum of the stored extents (0 in this case).
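The repair proposed above could look roughly like this; a hedged Python sketch, not zenfs code, with illustrative names:

```python
def repair_file_size(file_size, extent_lengths):
    """After mount, clamp the persisted file size to the number of
    bytes actually covered by stored extents. For 031176.sst the
    extent list is empty, so the size is reduced to 0."""
    covered = sum(extent_lengths)
    return min(file_size, covered)

print(repair_file_size(27556940, []))          # -> 0 (no extents stored)
print(repair_file_size(27556940, [27556940]))  # -> 27556940 (fully covered)
```

With the size clamped to the covered bytes, a subsequent backup or open would no longer try to read data that was never written.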
I'm planning to add active-extent tracking in the metadata, which would enable us to recover data up to the zone write pointer of the active extent's zone, but that will go into metadata format v2.
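Conceptually, the metadata-v2 idea would extend the recoverable range beyond the stored extents by the bytes between the active extent's start and the zone write pointer. The sketch below is purely illustrative; the actual v2 format and field names are not specified in this thread:

```python
def recoverable_size(stored_size, extent_lengths,
                     active_extent_start, zone_write_pointer):
    """With active-extent tracking, data written to the active
    extent's zone up to the write pointer could also be recovered,
    capped by the file size recorded in the snapshot."""
    covered = sum(extent_lengths)
    active_bytes = max(0, zone_write_pointer - active_extent_start)
    return min(stored_size, covered + active_bytes)

# Without an active extent this degrades to the v1 clamp-to-extents fix:
print(recoverable_size(27556940, [], 0, 0))         # -> 0
# If the zone write pointer advanced past the whole file, everything
# up to the recorded size is recoverable:
print(recoverable_size(27556940, [], 0, 27556940))  # -> 27556940
```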
As we can't reproduce this, let's detect and fix up file system inconsistencies when we can.
Created https://github.com/westerndigitalcorporation/zenfs/issues/95 to solve the file size inconsistency
The "zenfs list" command shows that a certain file is present in the file system, but the backup command hangs on that file. On dumping the metadata, it was seen that the file has an entry in the file list but no valid extents associated with it. This issue can manifest as hangs or corruption in applications. Creating this GitHub issue to keep track of it.