tikv / rust-rocksdb

rust wrapper for rocksdb
Apache License 2.0

CompressionType #152

Open zengqingfu1442 opened 6 years ago

zengqingfu1442 commented 6 years ago

How can I add a new compression type, such as gzip or openssl, to rust-rocksdb? And how do I then build it for the TiKV storage engine? Thanks.

zengqingfu1442 commented 6 years ago

I'm not sure whether the disk has any errors. I didn't add a new TiKV instance.

zengqingfu1442 commented 6 years ago

I just started the cluster, and soon afterwards one of the TiKV instances went down.

zengqingfu1442 commented 6 years ago

What can result in TiKV panic?

siddontang commented 6 years ago

```
failed to apply snap: Other(StringError("Corruption: external file have corrupted keys"))
```

The panic log clearly shows that the SST file has a corrupted key. The SST file is generated by the RocksDB SSTWriter, and we check the SST file's CRC32 before ingesting it, so I guess the key was already corrupted when the SST file was generated. Since you use your own compression library, I suspect something goes wrong when that library is used to generate the SST.

Maybe you can add a simple test like https://github.com/pingcap/rust-rocksdb/blob/master/tests/test_ingest_external_file.rs#L97 with your own compress lib to verify it.
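For illustration, the integrity check described above boils down to recomputing a checksum over the file contents and comparing it to the stored one. Below is a minimal self-contained sketch using the classic CRC-32/IEEE polynomial; note this is only an illustration of the idea, not RocksDB's actual implementation (RocksDB uses CRC32C, a different polynomial, often hardware-accelerated):

```rust
// Sketch: a bitwise CRC-32 (IEEE polynomial, reflected form). A corrupted
// key flips bits in the payload, which changes the checksum, so a
// verify-before-ingest step catches corruption introduced during SST
// generation (e.g. by a buggy compression library).
fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is 0xFFFFFFFF when the low bit is set, 0 otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn main() {
    // Standard check value for CRC-32/IEEE.
    assert_eq!(crc32_ieee(b"123456789"), 0xCBF4_3926);
    // A single corrupted byte changes the checksum.
    assert_ne!(crc32_ieee(b"123456789"), crc32_ieee(b"023456789"));
    println!("crc ok");
}
```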

zengqingfu1442 commented 6 years ago

I tried putting TiKV on other disks and got the same errors, so it is not a disk error.

zengqingfu1442 commented 6 years ago

But I also tried the default compression [no,no,lz4,lz4,lz4,zstd,zstd] and got the same error.

zengqingfu1442 commented 6 years ago

I ran `dmesg | grep -i oom` and there is no OOM information.

siddontang commented 6 years ago

You can use the RocksDB tool sst_dump to inspect the bad SST and show which compression it uses. This is not related to your current default compression configuration, but to the compression type that was used to generate the SST.

/cc @huachaohuang

zengqingfu1442 commented 6 years ago

Thanks, I will try it.

zengqingfu1442 commented 6 years ago

After the TiDB cluster started up, and before I ran the sysbench benchmark, one of the TiKV instances went down. It seems that the two SST files are OK:

```
[tcn@sfx-008 rocksdb-master]$ ./sst_dump --file=/mnt/sfx-card-root/tikv1_3/data/db --command=check
from [] to []
Process /mnt/sfx-card-root/tikv1_3/data/db/000012.sst
Sst file format: block-based
Process /mnt/sfx-card-root/tikv1_3/data/db/000014.sst
Sst file format: block-based
```

```
[tcn@sfx-008 rocksdb-master]$ ./sst_dump --file=/mnt/sfx-card-root/tikv1_3/data/db --show_compression_sizes --command=check --verify_checksum
from [] to []
Process /mnt/sfx-card-root/tikv1_3/data/db/000012.sst
Sst file format: block-based
Block Size: 16384
Compression: kNoCompression Size: 852
Compression: kSnappyCompression Size: 845
Compression: kZlibCompression Size: 827
Compression: kCSSZlibCompression Size: 846
Unsupported compression type: kBZip2Compression.
Unsupported compression type: kLZ4Compression.
Unsupported compression type: kLZ4HCCompression.
Unsupported compression type: kXpressCompression.
Unsupported compression type: kZSTD.
```

siddontang commented 6 years ago

Can you use sst_dump to dump all KVs?

zengqingfu1442 commented 6 years ago

Yes:

```
[tcn@sfx-008 rocksdb-master]$ ./sst_dump --file=/mnt/sfx-card-root/tikv1_3/data/db --command=scan --output_hex
from [] to []
Process /mnt/sfx-card-root/tikv1_3/data/db/000012.sst
Sst file format: block-based
'0101' seq:1, type:1 => 08F3A4F6879DBFC68E5A1001
'0102' seq:5, type:0 =>
Process /mnt/sfx-card-root/tikv1_3/data/db/000014.sst
Sst file format: block-based
'0102000000000000000403' seq:7, type:0 =>
'0102000000000000000503' seq:10, type:1 => 08AF01120508AF011006
'0102000000000000000504' seq:9, type:1 => 0A070806100018AF0110AF01
'0103000000000000000401' seq:6, type:0 =>
'0103000000000000000501' seq:8, type:1 => 08011218080512001A002204080210012A04080810022A04080A1001
```

siddontang commented 6 years ago

@zengqingfu1442

/mnt/sfx-card-root/tikv1_3/data/db only contains the valid SST files; you should check /mnt/sfx-card-root/tikv1_3/data/snap instead.

zengqingfu1442 commented 6 years ago

Now, the compression-per-level is [no,no,lz4,lz4,lz4,zstd,zstd]:

```
[tcn@sfx-008 sfx-card-root]$ ./sst_dump --file=/mnt/sfx-card-root/tikv1_3/data/snap --command=scan --output_hex
from [] to []
Process /mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_write.sst
Sst file format: block-based
/mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_write.sst: Corruption: LZ4 not supported or corrupted LZ4 compressed block contents
Process /mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_default.sst
Sst file format: block-based
/mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_default.sst: Corruption: LZ4 not supported or corrupted LZ4 compressed block contents
Process /mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_lock.sst
/mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_lock.sst: Corruption: file is too short (1 bytes) to be an sstable: /mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_lock.sst
```

zengqingfu1442 commented 6 years ago

But after deployment and startup, the tikv-server binary on the machine that hosts 3 TiKV instances is linked to liblz4.so:

```
[tcn@sfx-008 bin]$ ls
node_exporter  tikv-server
[tcn@sfx-008 bin]$ ldd tikv-server
    linux-vdso.so.1 => (0x00007fff7f2f5000)
    librocksdb.so.5.7 => /lib64/librocksdb.so.5.7 (0x00007f1b96db2000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1b96a36000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f1b96832000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f1b9662a000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1b9640d000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1b961f7000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f1b95e36000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1b98586000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f1b95b33000)
    libsnappy.so.1 => /lib64/libsnappy.so.1 (0x00007f1b9592d000)
    libgflags.so.2.1 => /lib64/libgflags.so.2.1 (0x00007f1b9570c000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f1b954f5000)
    libcssz.so => /lib64/libcssz.so (0x00007f1b9524d000)
    libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f1b9503d000)
    liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f1b94e29000)
    libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f1b94bbe000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x00007f1b949b2000)
    libtbb.so.2 => /lib64/libtbb.so.2 (0x00007f1b9477c000)
```

siddontang commented 6 years ago

```
/mnt/sfx-card-root/tikv1_3/data/snap/rev_5_6_175_write.sst: Corruption: LZ4 not supported or corrupted LZ4 compressed block contents
```

It seems that the SST file is corrupted. I still suspect it is caused by your compression library, because we have used LZ4 and zstd for a long time and they both work well.

What was the original compression configuration in TiKV? Do you use the same name, like lz4, but actually use your own compression library underneath?

zengqingfu1442 commented 6 years ago

  1. The CSSZlib compression has been used in Hadoop, HBase, Spark, MySQL, MongoDB, and so on; it works well.
  2. I changed the default compression-per-level from [no,no,lz4,lz4,lz4,zstd,zstd] to [no,no,csszlib,csszlib,csszlib,csszlib,csszlib] in tikv/src/config.rs.
  3. CSSZlib is completely different from LZ4; the CSSZlib compression library is similar to Zlib.

zengqingfu1442 commented 6 years ago

I found that the modified RocksDB used in TiKV doesn't support LZ4 compression:

```
[dzeng@dzeng rocksdb-master]$ ./db_sanity_test /mnt/sfx-card-root/rocksdb570/ create
Creating...
Basic -- OK
SpecialComparator -- OK
ZlibCompression -- OK
ZlibCompressionVersion2 -- OK
CSSZlibCompression -- OK
CSSZlibCompressionVersion2 -- OK
LZ4Compression -- Corruption: LZ4 not supported or corrupted LZ4 compressed block contents FAIL
LZ4HCCompression -- Corruption: LZ4HC not supported or corrupted LZ4HC compressed block contents FAIL
ZSTDCompression -- OK
PlainTable -- OK
BloomFilter -- OK
```

zengqingfu1442 commented 6 years ago

But the RocksDB log in /mnt/sfx-card-root/tikv1_3/data/db/LOG shows that TiKV supports it:

```
...
2017/11/29-15:57:02.651205 7f76e2b62e40 Compression algorithms supported:
2017/11/29-15:57:02.651206 7f76e2b62e40 Snappy supported: 1
2017/11/29-15:57:02.651207 7f76e2b62e40 Zlib supported: 1
2017/11/29-15:57:02.651208 7f76e2b62e40 Bzip supported: 1
2017/11/29-15:57:02.651210 7f76e2b62e40 LZ4 supported: 1
2017/11/29-15:57:02.651216 7f76e2b62e40 ZSTD supported: 1
2017/11/29-15:57:02.651220 7f76e2b62e40 Fast CRC32 supported: 0
...
```

siddontang commented 6 years ago

@zengqingfu1442

How did you build LZ4?

I think "LZ4 supported: 1" only means that RocksDB has linked LZ4 and can call into it; it doesn't tell us whether the library actually works correctly.
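A cheap way to separate "linked" from "working" is a compress/decompress roundtrip self-test. The sketch below illustrates the idea with a trivial run-length codec standing in for the real LZ4/zstd/CSSZlib bindings (which are not assumed here); in practice you would swap the two functions for calls into the library under test:

```rust
// Sketch: verify that decompress(compress(x)) == x. A codec that merely
// links but is miscompiled or ABI-mismatched fails this immediately.
// RLE is only a placeholder for the real compression library.
fn rle_compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        let mut run: u8 = 1;
        while i + (run as usize) < data.len() && data[i + run as usize] == b && run < 255 {
            run += 1;
        }
        out.push(run); // (count, byte) pairs
        out.push(b);
        i += run as usize;
    }
    out
}

fn rle_decompress(data: &[u8]) -> Vec<u8> {
    data.chunks(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

fn roundtrip_ok(input: &[u8]) -> bool {
    rle_decompress(&rle_compress(input)) == input
}

fn main() {
    assert!(roundtrip_ok(b"aaaabbbcccccccc"));
    assert!(roundtrip_ok(b""));
    println!("roundtrip ok");
}
```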

BusyJay commented 6 years ago

Did you build db_sanity_test or sst_dump with lz4 enabled?

zengqingfu1442 commented 6 years ago

The RocksDB 5.7.0 I used for TiKV didn't support LZ4. In the local Beijing TiDB cluster it works well even though it doesn't support LZ4, because I don't use LZ4 there.

```
[dzeng@dzeng rocksdb-master]$ ./db_sanity_test /mnt/sfx-card-root/rocksdb570/ create
Creating...
Basic -- OK
SpecialComparator -- OK
ZlibCompression -- OK
ZlibCompressionVersion2 -- OK
CSSZlibCompression -- OK
CSSZlibCompressionVersion2 -- OK
LZ4Compression -- Corruption: LZ4 not supported or corrupted LZ4 compressed block contents FAIL
LZ4HCCompression -- Corruption: LZ4HC not supported or corrupted LZ4HC compressed block contents FAIL
ZSTDCompression -- OK
PlainTable -- OK
BloomFilter -- OK
```

```
[dzeng@centos7-2-cdh rocksdb-master]$ ./sst_dump --file=/mnt/sfx-card-root/tikv1_1/data/raft --command=check --show_compression_sizes
from [] to []
Process /mnt/sfx-card-root/tikv1_1/data/raft/001682.sst
Sst file format: block-based
Block Size: 16384
Compression: kNoCompression Size: 26299543
Compression: kSnappyCompression Size: 5771933
Compression: kZlibCompression Size: 3617874
mypid = 19772, dev_name = /dev/sfx0, dev_num = 0
Compression: kCSSZlibCompression Size: 4536131
Compression: kBZip2Compression Size: 2808291
Unsupported compression type: kLZ4Compression.
Unsupported compression type: kLZ4HCCompression.
Unsupported compression type: kXpressCompression.
Unsupported compression type: kZSTD.
```

zengqingfu1442 commented 6 years ago

I just ran `yum install lz4-devel` before running `make release` for RocksDB.

siddontang commented 6 years ago

@zengqingfu1442

Because we haven't used your compression lib before, we can't know whether this is caused by your compiled RocksDB. I think we can start a new TiKV cluster that disables all compression at first and then check whether it panics.

If not, you can use LZ4 for all levels instead and check again. Last, you can use your own compression library.

Every check should clean up all data and use a new cluster.

zengqingfu1442 commented 6 years ago

I destroyed the cluster, set the compression-per-level to [no,no,no,no,no,no,no], and started it again. After a while, one of the TiKV instances still went down, and tikv.log shows the same error.

```
[tcn@sfx-008 rocksdb-master]$ ./sst_dump --file=/mnt/sfx-card-root/tikv1_3/data/snap --command=scan --output_hex
from [] to []
Process /mnt/sfx-card-root/tikv1_3/data/snap/rev_3_6_200_default.sst
Sst file format: block-based
/mnt/sfx-card-root/tikv1_3/data/snap/rev_3_6_200_default.sst: Corruption: LZ4 not supported or corrupted LZ4 compressed block contents
Process /mnt/sfx-card-root/tikv1_3/data/snap/rev_3_6_200_lock.sst
/mnt/sfx-card-root/tikv1_3/data/snap/rev_3_6_200_lock.sst: Corruption: file is too short (1 bytes) to be an sstable: /mnt/sfx-card-root/tikv1_3/data/snap/rev_3_6_200_lock.sst
Process /mnt/sfx-card-root/tikv1_3/data/snap/rev_3_6_200_write.sst
Sst file format: block-based
/mnt/sfx-card-root/tikv1_3/data/snap/rev_3_6_200_write.sst: Corruption: LZ4 not supported or corrupted LZ4 compressed block contents
```

```
[tcn@sfx-008 rocksdb-master]$ ./db_sanity_test /mnt/sfx-card-root/rocksdb570/ create
Creating...
Basic -- OK
SpecialComparator -- OK
ZlibCompression -- OK
ZlibCompressionVersion2 -- OK
CSSZlibCompression -- OK
CSSZlibCompressionVersion2 -- OK
LZ4Compression -- Corruption: LZ4 not supported or corrupted LZ4 compressed block contents FAIL
LZ4HCCompression -- Corruption: LZ4HC not supported or corrupted LZ4HC compressed block contents FAIL
ZSTDCompression -- OK
PlainTable -- OK
BloomFilter -- OK
```

siddontang commented 6 years ago

@huachaohuang

Why do we still use LZ4 here?

zengqingfu1442 commented 6 years ago

No, I set the compression-per-level to [no,no,no,no,no,no,no]. But because the RocksDB I use doesn't support LZ4, sst_dump still raises this error when analyzing the SST files.

siddontang commented 6 years ago

@zengqingfu1442

Can you run cargo test directly to check whether your RocksDB can work well?

zengqingfu1442 commented 6 years ago

I deployed the original TiDB cluster successfully and ran the sysbench benchmark. Then I replaced the tidb-ansible/resources/bin/tikv- with my own tikv- binary and ran rolling_update, but TiKV can't start up; tikv.log says "2017/11/30 15:30:44.985 tikv-server.rs:230: [ERROR] failed to start node: RaftServer(RocksDb(\"Corruption: LZ4 not supported or corrupted LZ4 compressed block contents\"))". But I didn't use LZ4 compression in tikv.yml. So it seems that when TiKV starts, it checks whether RocksDB supports LZ4 compression, and if not, TiKV won't start up. BTW, my TiKV links to librocksdb.so dynamically.

zengqingfu1442 commented 6 years ago

`cargo test` in rust-rocksdb?

```
[dzeng@dzeng rust-rocksdb]$ make test
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running target/debug/deps/rocksdb-7707f1ef9082226d

running 18 tests
test rocksdb::test::snapshot_test ... ok
No 0 Snappy 1 Zlib 2 CSSZlib 10 ZstdNotFinal 64 Lz4hc 5 Bz2 3 Lz4 4 Zstd 7
test rocksdb::test::test_supported_compression ... ok
retrieved utf8 value: abcdefgh
test merge_operator::test::mergetest ... ok
Hello k1: v1111
Hello k2: v2222
Hello k3: v3333
test rocksdb::test::log_dir_test ... ok
test rocksdb::test::errors_do_stuff ... ok
test rocksdb::test::external ... ok
test rocksdb::test::iterator_test ... ok
test rocksdb::test::test_get_approximate_memtable_stats ... ok
test rocksdb::test::single_delete_test ... ok
test rocksdb::test::writebatch_works ... ok
test rocksdb::test::test_pause_bg_work ... ok
test rocksdb::test::list_column_families_test ... ok
test rocksdb::test::test_get_all_key_versions ... ok
test rocksdb::test::block_cache_usage ... ok
test rocksdb::test::backup_db_test ... ok
test rocksdb::test::approximate_size_test ... ok
test rocksdb::test::flush_cf ... ok
test rocksdb::test::property_test ... ok

test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/rocksdb-d816a6ab89f53acf

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/test-c3d4eb64fe9a3ee5

running 82 tests
test test_delete_range::test_delete_range ... ok cf1 created successfully successfully opened db with column family test test_delete_range::test_delete_range_ingest_file ... ok test test_delete_range::test_delete_range_case_2 ... ok test test_delete_range::test_delete_range_prefix_bloom_case_2 ... ok successfully opened db with column family test test_compaction_filter::test_compaction_filter ... ok cf1 successfully dropped. test test_column_family::test_column_family ... ok test test_delete_range::test_delete_range_case_1 ... ok test test_delete_range::test_delete_range_prefix_bloom_case_1 ... ok test test_delete_range::test_delete_range_case_4 ... ok test test_event_listener::test_event_listener_ingestion ... ok test test_delete_range::test_delete_range_case_6 ... ok test test_delete_range::test_delete_range_compact ... ok test test_delete_range::test_delete_range_case_5 ... ok test test_ingest_external_file::test_ingest_external_file ... ok test test_delete_range::test_delete_range_case_3 ... ok test test_ingest_external_file::test_ingest_external_file_new ... ok test test_iterator::read_with_upper_bound ... ok test test_compact_range::test_compact_range ... ok test test_ingest_external_file::test_ingest_external_file_new_cf ... ok test test_iterator::test_iterator ... ok test test_iterator::test_seek_for_prev ... ok test test_rate_limiter::test_rate_limiter ... ok test test_rate_limiter::test_rate_limiter_sendable ... ok test test_delete_range::test_delete_range_prefix_bloom_case_3 ... ok test test_read_only::test_open_for_read_only ... ok test test_delete_range::test_delete_range_prefix_bloom_case_4 ... ok test test_rocksdb_options::test_allow_concurrent_memtable_write ... ok test test_multithreaded::test_multithreaded ... ok test test_delete_range::test_delete_range_prefix_bloom_case_5 ... ok test test_rocksdb_options::test_clone_options ... ok test test_rocksdb_options::test_auto_roll_max_size_info_log ...
ok test test_rocksdb_options::test_bottommost_compression ... ok test test_read_only::test_open_cf_for_read_only ... ok test test_delete_range::test_delete_range_prefix_bloom_case_6 ... ok test test_rocksdb_options::test_compaction_readahead_size ... ok test test_rocksdb_options::test_db_paths ... ok test test_rocksdb_options::test_enable_statistics ... ok test test_rocksdb_options::test_direct_read_write ... ok test test_rocksdb_options::test_enable_pipelined_write ... ok test test_rocksdb_options::test_fifo_compaction_options ... ok test test_rocksdb_options::test_get_compression ... ok test test_rocksdb_options::test_get_compression_per_level ... ok test test_rocksdb_options::test_flush_wal ... ok test test_delete_range::test_delete_range_prefix_bloom_compact_case ... ok test test_rocksdb_options::test_log_file_opt ... ok test test_rocksdb_options::test_manual_wal_flush ... ok test test_delete_files_in_range::test_delete_files_in_range_with_snap ... ok test test_rocksdb_options::test_memtable_insert_hint_prefix_extractor ... ok test test_iterator::test_fixed_suffix_seek ... ok test test_rocksdb_options::test_pending_compaction_bytes_limit ... ok test test_rocksdb_options::test_set_bytes_per_sync ... ok test test_rocksdb_options::test_read_options ... ok test test_rocksdb_options::test_set_cache_index_and_filter_blocks_with_high_priority ... ok test test_delete_files_in_range::test_delete_files_in_range_with_iter ... ok test test_rocksdb_options::test_set_max_manifest_file_size ... ok test test_rocksdb_options::test_set_compaction_pri ... ok test test_rocksdb_options::test_set_delayed_write_rate ... ok test test_rocksdb_options::test_set_level_compaction_dynamic_level_bytes ... ok test test_rocksdb_options::test_set_lru_cache ... ok test test_rocksdb_options::test_set_max_background_jobs ... ok test test_rocksdb_options::test_set_max_subcompactions ... ok test test_rocksdb_options::test_set_num_levels ...
ok test test_rocksdb_options::test_set_optimize_filters_for_hits ... ok test test_rocksdb_options::test_set_pin_l0_filter_and_index_blocks_in_cache ... ok test test_rocksdb_options::test_set_ratelimiter ... ok test test_rocksdb_options::test_set_wal_opt ... ok test test_iterator::test_send_iterator ... ok test test_rocksdb_options::test_sync_wal ... ok test test_rocksdb_options::test_writable_file_max_buffer_size ... ok test test_slice_transform::test_slice_transform ... ok test test_rocksdb_options::test_block_based_options ... ok test test_rocksdb_options::test_compact_options ... ok test test_delete_range::test_delete_range_sst_files ... ok test test_rocksdb_options::test_get_block_cache_usage ... ok test test_event_listener::test_event_listener_basic ... ok test test_iterator::test_total_order_seek ... ok test test_prefix_extractor::test_prefix_extractor_compatibility ... ok test test_rocksdb_options::test_write_options ... ok test test_statistics::test_db_statistics ... ok test test_ingest_external_file::test_ingest_simulate_real_world ... ok test test_table_properties::test_table_properties_collector_factory ... ok test test_rocksdb_options::test_create_info_log ... ok

test result: ok. 82 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests rocksdb

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

siddontang commented 6 years ago

PTAL @huachaohuang

huachaohuang commented 6 years ago

@zengqingfu1442 Hi, I see a long conversation here and am trying to figure out what happened. Can you summarize the situation for me? I also see that you are using a modified version of tikv/rust-rocksdb/rocksdb; it would be more efficient if you could show us your changes so we can review them easily. Thanks.

zengqingfu1442 commented 6 years ago

I integrated a new compression type named CSSZlib into RocksDB and TiKV, and built TiKV dynamically linked to librocksdb.so. But that RocksDB doesn't support LZ4 compression, so when I replaced the official tikv binary in tidb-ansible/resources/bin with my own tikv binary and ran rolling_update, the TiKV instances can't start up; tikv.log says "2017/11/30 18:52:47.982 tikv-server.rs:230: [ERROR] failed to start node: RaftServer(RocksDb(\"Corruption: LZ4 not supported or corrupted LZ4 compressed block contents\"))". BTW, there aren't any messages in tikv_stderr.log. Thanks.

huachaohuang commented 6 years ago

You mean your modified RocksDB doesn't support LZ4, right? When you first bootstrapped the cluster with the official tikv, did that tikv use LZ4 compression?

I guess this can happen if:

  1. Bootstrap a new cluster with the official tikv using LZ4 compression.
  2. The official tikv writes some data with LZ4 compression.
  3. Replace the official tikv with the modified tikv and restart.
  4. The modified tikv tries to read the LZ4-compressed data written by the official tikv, but fails since the modified tikv doesn't support LZ4 compression.

Am I missing something?

zengqingfu1442 commented 6 years ago

Yes, you're right. However, I tried unsafe_cleanup on the official TiDB cluster and then replaced the tikv binary, deployed, and started; the TiKV instance still went down soon after.

Part of the tikv.log:

```
....
....
2017/11/30 20:24:29.714 tikv-server.rs:122: [INFO] start prometheus client
2017/11/30 20:24:29.714 mod.rs:209: [INFO] starting working thread: split check worker
2017/11/30 20:24:29.715 tikv-server.rs:234: [INFO] start storage
2017/11/30 20:24:29.716 mod.rs:209: [INFO] starting working thread: snapshot worker
2017/11/30 20:24:29.717 mod.rs:209: [INFO] starting working thread: raft gc worker
2017/11/30 20:24:29.718 mod.rs:209: [INFO] starting working thread: compact worker
2017/11/30 20:24:29.724 future.rs:115: [INFO] starting working thread: pd worker
2017/11/30 20:24:29.724 region.rs:142: [INFO] [region 4] begin apply snap data
2017/11/30 20:24:29.724 mod.rs:209: [INFO] starting working thread: consistency check worker
2017/11/30 20:24:29.726 mod.rs:209: [INFO] starting working thread: apply worker
2017/11/30 20:24:29.739 region.rs:237: [ERROR] failed to apply snap: Other(StringError("Corruption: external file have corrupted keys"))!!!
2017/11/30 20:24:29.755 mod.rs:209: [INFO] starting working thread: end-point-worker
2017/11/30 20:24:29.757 mod.rs:209: [INFO] starting working thread: snap-handler
2017/11/30 20:24:29.762 server.rs:155: [INFO] TiKV is ready to serve
2017/11/30 20:24:30.936 panic_hook.rs:99: [ERROR] thread 'raftstore-2' panicked '[region 4] 8 applying snapshot failed' at "src/raftstore/store/peer_storage.rs:1009"
stack backtrace:
   0: 0x7f5df786f95e - backtrace::backtrace::libunwind::trace at /home/dzeng/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.2.3/src/backtrace/libunwind.rs:54
```

zengqingfu1442 commented 6 years ago

sst_dump messages:

```
[tcn@sfx-008 rocksdb-master]$ ./sst_dump --file=/mnt/sfx-card-root/tikv1_1/data/snap --show_compression_sizes
from [] to []
Process /mnt/sfx-card-root/tikv1_1/data/snap/rev_4_6_141_write.sst
Sst file format: block-based
Block Size: 16384
Compression: kNoCompression Size: 788
Compression: kSnappyCompression Size: 781
Compression: kZlibCompression Size: 779
Compression: kCSSZlibCompression Size: 782
Compression: kBZip2Compression Size: 780
Error in `./sst_dump': free(): invalid next size (fast): 0x00000000023632b0
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7c503)[0x7f3c686a7503]
/usr/lib64/libstdc++.so.6(_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_mutateEmmPKcm+0xed)[0x7f3c6901c6ad]
/usr/lib64/libstdc++.so.6(_ZNSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEE14_M_replace_auxEmmmc+0xc8)[0x7f3c6901ce08]
./sst_dump[0x645bd6]
./sst_dump[0x646897]
./sst_dump[0x64846d]
./sst_dump[0x78a34e]
./sst_dump[0x78b141]
./sst_dump[0x78e5db]
./sst_dump[0x40a636]
/usr/lib64/libc.so.6(libc_start_main+0xf5)[0x7f3c6864cb35]
./sst_dump[0x4d102e]
======= Memory map: ========
00400000-0083a000 r-xp 00000000 103:00 10225499 /nvme0n1/RocksDB-5.7.0/rocksdb-master/sst_dump 00a3a000-00a3b000 r--p 0043a000 103:00 10225499 /nvme0n1/RocksDB-5.7.0/rocksdb-master/sst_dump 00a3b000-00a3c000 rw-p 0043b000 103:00 10225499 /nvme0n1/RocksDB-5.7.0/rocksdb-master/sst_dump 00a3c000-00a4c000 rw-p 00000000 00:00 0 02158000-0239e000 rw-p 00000000 00:00 0 [heap] 7f3c64000000-7f3c64021000 rw-p 00000000 00:00 0 7f3c64021000-7f3c68000000 ---p 00000000 00:00 0 7f3c6862b000-7f3c687e1000 r-xp 00000000 fd:00 201330474 /usr/lib64/libc-2.17.so 7f3c687e1000-7f3c689e1000 ---p 001b6000 fd:00 201330474 /usr/lib64/libc-2.17.so 7f3c689e1000-7f3c689e5000 r--p 001b6000 fd:00 201330474 /usr/lib64/libc-2.17.so 7f3c689e5000-7f3c689e7000 rw-p 001ba000 fd:00 201330474 /usr/lib64/libc-2.17.so 7f3c689e7000-7f3c689ec000 rw-p 00000000 00:00 0 7f3c689ec000-7f3c68a01000 r-xp 00000000 fd:00 207220294
/usr/lib64/libgcc_s-4.8.5-20150702.so.1 7f3c68a01000-7f3c68c00000 ---p 00015000 fd:00 207220294 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7f3c68c00000-7f3c68c01000 r--p 00014000 fd:00 207220294 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7f3c68c01000-7f3c68c02000 rw-p 00015000 fd:00 207220294 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7f3c68c02000-7f3c68d02000 r-xp 00000000 fd:00 201330482 /usr/lib64/libm-2.17.so 7f3c68d02000-7f3c68f02000 ---p 00100000 fd:00 201330482 /usr/lib64/libm-2.17.so 7f3c68f02000-7f3c68f03000 r--p 00100000 fd:00 201330482 /usr/lib64/libm-2.17.so 7f3c68f03000-7f3c68f04000 rw-p 00101000 fd:00 201330482 /usr/lib64/libm-2.17.so 7f3c68f04000-7f3c6906f000 r-xp 00000000 fd:00 220075613 /usr/lib64/libstdc++.so.6.0.21 7f3c6906f000-7f3c6926f000 ---p 0016b000 fd:00 220075613 /usr/lib64/libstdc++.so.6.0.21 7f3c6926f000-7f3c69279000 r--p 0016b000 fd:00 220075613 /usr/lib64/libstdc++.so.6.0.21 7f3c69279000-7f3c6927b000 rw-p 00175000 fd:00 220075613 /usr/lib64/libstdc++.so.6.0.21 7f3c6927b000-7f3c6927f000 rw-p 00000000 00:00 0 7f3c6927f000-7f3c692e9000 r-xp 00000000 fd:00 214257086 /usr/lib64/libzstd.so.1.3.2 7f3c692e9000-7f3c694e8000 ---p 0006a000 fd:00 214257086 /usr/lib64/libzstd.so.1.3.2 7f3c694e8000-7f3c694e9000 r--p 00069000 fd:00 214257086 /usr/lib64/libzstd.so.1.3.2 7f3c694e9000-7f3c694ea000 rw-p 0006a000 fd:00 214257086 /usr/lib64/libzstd.so.1.3.2 7f3c694ea000-7f3c694fc000 r-xp 00000000 fd:00 214257074 /usr/lib64/liblz4.so.1.7.3 7f3c694fc000-7f3c696fb000 ---p 00012000 fd:00 214257074 /usr/lib64/liblz4.so.1.7.3 7f3c696fb000-7f3c696fc000 r--p 00011000 fd:00 214257074 /usr/lib64/liblz4.so.1.7.3 7f3c696fc000-7f3c696fd000 rw-p 00012000 fd:00 214257074 /usr/lib64/liblz4.so.1.7.3 7f3c696fd000-7f3c6970c000 r-xp 00000000 fd:00 201330716 /usr/lib64/libbz2.so.1.0.6 7f3c6970c000-7f3c6990b000 ---p 0000f000 fd:00 201330716 /usr/lib64/libbz2.so.1.0.6 7f3c6990b000-7f3c6990c000 r--p 0000e000 fd:00 201330716 /usr/lib64/libbz2.so.1.0.6 7f3c6990c000-7f3c6990d000 rw-p
0000f000 fd:00 201330716 /usr/lib64/libbz2.so.1.0.6 7f3c6990d000-7f3c6995d000 r-xp 00000000 fd:00 202956129 /usr/lib64/libcssz.so 7f3c6995d000-7f3c69b5c000 ---p 00050000 fd:00 202956129 /usr/lib64/libcssz.so 7f3c69b5c000-7f3c69b5d000 r--p 0004f000 fd:00 202956129 /usr/lib64/libcssz.so 7f3c69b5d000-7f3c69b5f000 rw-p 00050000 fd:00 202956129 /usr/lib64/libcssz.so 7f3c69b5f000-7f3c69bb5000 rw-p 00000000 00:00 0 7f3c69bb5000-7f3c69bca000 r-xp 00000000 fd:00 201330632 /usr/lib64/libz.so.1.2.7 7f3c69bca000-7f3c69dc9000 ---p 00015000 fd:00 201330632 /usr/lib64/libz.so.1.2.7 7f3c69dc9000-7f3c69dca000 r--p 00014000 fd:00 201330632 /usr/lib64/libz.so.1.2.7 7f3c69dca000-7f3c69dcb000 rw-p 00015000 fd:00 201330632 /usr/lib64/libz.so.1.2.7 7f3c69dcb000-7f3c69deb000 r-xp 00000000 fd:00 217465304 /usr/lib64/libgflags.so.2.1 7f3c69deb000-7f3c69fea000 ---p 00020000 fd:00 217465304 /usr/lib64/libgflags.so.2.1 7f3c69fea000-7f3c69feb000 r--p 0001f000 fd:00 217465304 /usr/lib64/libgflags.so.2.1 7f3c69feb000-7f3c69fec000 rw-p 00020000 fd:00 217465304 /usr/lib64/libgflags.so.2.1 7f3c69fec000-7f3c69ff1000 r-xp 00000000 fd:00 201750197 /usr/lib64/libsnappy.so.1.1.4 7f3c69ff1000-7f3c6a1f0000 ---p 00005000 fd:00 201750197 /usr/lib64/libsnappy.so.1.1.4 7f3c6a1f0000-7f3c6a1f1000 r--p 00004000 fd:00 201750197 /usr/lib64/libsnappy.so.1.1.4 7f3c6a1f1000-7f3c6a1f2000 rw-p 00005000 fd:00 201750197 /usr/lib64/libsnappy.so.1.1.4 7f3c6a1f2000-7f3c6a1f9000 r-xp 00000000 fd:00 201330504 /usr/lib64/librt-2.17.so 7f3c6a1f9000-7f3c6a3f8000 ---p 00007000 fd:00 201330504 /usr/lib64/librt-2.17.so 7f3c6a3f8000-7f3c6a3f9000 r--p 00006000 fd:00 201330504 /usr/lib64/librt-2.17.so 7f3c6a3f9000-7f3c6a3fa000 rw-p 00007000 fd:00 201330504 /usr/lib64/librt-2.17.so 7f3c6a3fa000-7f3c6a411000 r-xp 00000000 fd:00 201330500 /usr/lib64/libpthread-2.17.so 7f3c6a411000-7f3c6a610000 ---p 00017000 fd:00 201330500 /usr/lib64/libpthread-2.17.so 7f3c6a610000-7f3c6a611000 r--p 00016000 fd:00 201330500 /usr/lib64/libpthread-2.17.so
7f3c6a611000-7f3c6a612000 rw-p 00017000 fd:00 201330500 /usr/lib64/libpthread-2.17.so 7f3c6a612000-7f3c6a616000 rw-p 00000000 00:00 0 7f3c6a616000-7f3c6a636000 r-xp 00000000 fd:00 201330467 /usr/lib64/ld-2.17.so 7f3c6a828000-7f3c6a835000 rw-p 00000000 00:00 0 7f3c6a835000-7f3c6a836000 r--p 0001f000 fd:00 201330467 /usr/lib64/ld-2.17.so 7f3c6a836000-7f3c6a837000 rw-p 00020000 fd:00 201330467 /usr/lib64/ld-2.17.so 7f3c6a837000-7f3c6a838000 rw-p 00000000 00:00 0 7ffe8001f000-7ffe80040000 rw-p 00000000 00:00 0 [stack] 7ffe80170000-7ffe80172000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
```

zengqingfu1442 commented 6 years ago

```
[tcn@sfx-008 rocksdb-master]$ ldd sst_dump
    linux-vdso.so.1 => (0x00007ffed15f1000)
    libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f4ce3313000)
    librt.so.1 => /usr/lib64/librt.so.1 (0x00007f4ce310a000)
    libsnappy.so.1 => /usr/lib64/libsnappy.so.1 (0x00007f4ce2f04000)
    libgflags.so.2.1 => /usr/lib64/libgflags.so.2.1 (0x00007f4ce2ce3000)
    libz.so.1 => /usr/lib64/libz.so.1 (0x00007f4ce2acc000)
    libcssz.so => /usr/lib64/libcssz.so (0x00007f4ce2824000)
    libbz2.so.1 => /usr/lib64/libbz2.so.1 (0x00007f4ce2614000)
    liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x00007f4ce2400000)
    libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x00007f4ce2195000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f4ce1e1a000)
    libm.so.6 => /usr/lib64/libm.so.6 (0x00007f4ce1b17000)
    libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f4ce1901000)
    libc.so.6 => /usr/lib64/libc.so.6 (0x00007f4ce1540000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f4ce3530000)
```

siddontang commented 6 years ago

I guess this may be caused by https://github.com/pingcap/tikv/blob/master/src/raftstore/store/snap.rs#L514

/cc @huachaohuang

huachaohuang commented 6 years ago

Are your enum values for kCSSZlibCompression (and the others) consistent between rocksdb and the modified rust-rocksdb? What is your DBCompressionType here? Maybe you can replace them with only DBCompressionType::CSSZlib in your test.
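To illustrate the consistency requirement being asked about: the numeric values the Rust wrapper passes over FFI must match the C++ CompressionType enum, or RocksDB will try to decode blocks with the wrong codec. The following is a hypothetical, self-contained sketch; the numeric values mirror the ones printed by `test_supported_compression` earlier in this thread (with CSSZlib = 10 being the custom addition), and the `CPP_*` constants stand in for the values in the patched C++ header:

```rust
// Hypothetical sketch of an enum-consistency check between the Rust-side
// DBCompressionType and the C++ CompressionType values. A mismatch here
// is one way ingested SSTs end up "corrupted".
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u8)]
enum DBCompressionType {
    No = 0,
    Snappy = 1,
    Zlib = 2,
    Bz2 = 3,
    Lz4 = 4,
    Lz4hc = 5,
    Zstd = 7,
    CSSZlib = 10,      // the custom addition in this thread
    ZstdNotFinal = 64,
}

// Values assumed to come from the patched rocksdb/options.h.
const CPP_LZ4: u8 = 4;
const CPP_CSSZLIB: u8 = 10;

fn main() {
    assert_eq!(DBCompressionType::Lz4 as u8, CPP_LZ4);
    assert_eq!(DBCompressionType::CSSZlib as u8, CPP_CSSZLIB);
    println!("enum values consistent");
}
```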

zengqingfu1442 commented 6 years ago

Yes, I have changed it and it's consistent between rocksdb and the modified rust-rocksdb. It's:

```rust
const COMPRESSION_PRIORITY: [DBCompressionType; 4] = [
    DBCompressionType::Lz4,
    DBCompressionType::Snappy,
    DBCompressionType::Zstd,
    DBCompressionType::CSSZlib,
];
```

You mean just leave only CSSZlib, like this?

```rust
const COMPRESSION_PRIORITY: [DBCompressionType; 1] = [
    // DBCompressionType::Lz4,
    // DBCompressionType::Snappy,
    // DBCompressionType::Zstd,
    DBCompressionType::CSSZlib,
];
```

Thanks.

huachaohuang commented 6 years ago

Yep, give it a try. If it crashes again, something is probably wrong with the modified rocksdb; you can run sst_dump under gdb to see what's wrong.

siddontang commented 6 years ago

@huachaohuang

Maybe this is a bug on our side. If the user doesn't want to use compression, we shouldn't use it for the snapshot SST either.

zengqingfu1442 commented 6 years ago

Thanks,

  1. I changed the COMPRESSION_PRIORITY in tikv/src/util/rocksdb/mod.rs:

     // Zlib and bzip2 are too slow.
     const COMPRESSION_PRIORITY: [DBCompressionType; 3] = [
         // DBCompressionType::Lz4,
         DBCompressionType::Snappy,
         DBCompressionType::Zstd,
         DBCompressionType::CSSZlib,
     ];
  2. I changed the default compression-per-level for rocksdb-defaultcf, rocksdb-writecf, and raftrocksdb-defaultcf from [no,no,lz4,lz4,lz4,zstd,zstd] to [no,no,csszlib,csszlib,csszlib,zstd,zstd] in tikv/src/config.rs:

     compression_per_level: [
         DBCompressionType::No,
         DBCompressionType::No,
         DBCompressionType::CSSZlib,
         DBCompressionType::CSSZlib,
         DBCompressionType::CSSZlib,
         DBCompressionType::Zstd,
         DBCompressionType::Zstd,
     ],

Then I re-built tikv. Now I have set up the tidb cluster with my tikv binary that supports csszlib. The tikv instances don't crash this time, and I'm running prepare.sh in tidb-bench/sysbench/ to generate test data; it appears to work now.

zengqingfu1442 commented 6 years ago

RocksDB's default is max_bytes_for_level_multiplier = 10; what value of max_bytes_for_level_multiplier does tikv use for its rocksdb?

huachaohuang commented 6 years ago

We use the same default as rocksdb for tikv.
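
For reference, with rocksdb's defaults (max_bytes_for_level_base = 256 MiB, max_bytes_for_level_multiplier = 10) each level's target size grows by 10x. A minimal sketch of that arithmetic, ignoring level_compaction_dynamic_level_bytes:

```rust
// Sketch: level target sizes under rocksdb's defaults
// max_bytes_for_level_base = 256 MiB, max_bytes_for_level_multiplier = 10.
fn level_target_mib(base_mib: u64, multiplier: u64, level: u32) -> u64 {
    base_mib * multiplier.pow(level - 1)
}

fn main() {
    for level in 1..=6 {
        println!("L{}: {} MiB", level, level_target_mib(256, 10, level));
    }
}
```

So with the defaults, L1 targets 256 MiB, L2 2560 MiB, and so on; this is why the bottommost levels hold most of the data and their compression type matters most for disk usage.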

zengqingfu1442 commented 6 years ago

Can you please tell me how you run the sysbench benchmark?

  1. ./prepare.sh to generate data first
  2. ./oltp.sh
  3. ./select.sh
  4. ./insert.sh

Is this the order in which you ran sysbench? Thanks; now I'm running sysbench to evaluate the performance of CSSZlib as well as our CSS card.

huachaohuang commented 6 years ago

Sorry, I don't know exactly how that benchmark runs; feel free to open an issue there, and I think you will get an answer. By the way, since the compression problem is fixed now, can we close this issue?

zengqingfu1442 commented 6 years ago

Can you please tell me how to check that the data is really compressed by lz4 or zstd or any other compression library in tidb?

zhangjinpeng87 commented 6 years ago

You can check it in rocksdb's LOG file.
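
A way to automate that check is to scan the LOG for the per-level `Options.compression` entries that rocksdb prints at DB open. A minimal stdlib-only sketch (the sample lines mimic the shape rocksdb writes):

```rust
// Sketch: pull the per-level compression settings out of a rocksdb LOG.
fn compression_options(log: &str) -> Vec<&str> {
    log.lines()
        .filter(|line| line.contains("Options.compression"))
        .map(str::trim)
        .collect()
}

fn main() {
    // Lines in the shape rocksdb writes to its LOG at DB open.
    let sample = "\
2017/12/13-11:21:38.938273 7f4f220c0e40 Options.compression[0]: NoCompression
2017/12/13-11:21:38.938277 7f4f220c0e40 Options.compression[2]: CSSZlib
2017/12/13-11:21:38.938285 7f4f220c0e40 Options.bottommost_compression: Disabled";
    for line in compression_options(sample) {
        println!("{}", line);
    }
}
```

In a real check you would read the LOG file under the store's data directory instead of the inline sample.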

zengqingfu1442 commented 6 years ago

In the log of rocksdb used in tikv, there are messages like the following:

2017/12/13-11:21:38.935983 7f4f220c0e40 RocksDB version: 5.7.0
2017/12/13-11:21:38.936317 7f4f220c0e40 Git sha rocksdb_build_git_sha:9f6165a2eb71fc32f74518a2884fb5811fbea693
2017/12/13-11:21:38.936324 7f4f220c0e40 Compile date Dec 1 2017
2017/12/13-11:21:38.936331 7f4f220c0e40 DB SUMMARY
2017/12/13-11:21:38.936391 7f4f220c0e40 SST files in /mnt/sfx-card-root/tikv3_1/data/db dir, Total Num: 0, files:
...
2017/12/13-11:21:38.936573 7f4f220c0e40 Compression algorithms supported:
2017/12/13-11:21:38.936578 7f4f220c0e40 	Snappy supported: 1
2017/12/13-11:21:38.936581 7f4f220c0e40 	Zlib supported: 1
2017/12/13-11:21:38.936583 7f4f220c0e40 	Bzip supported: 1
2017/12/13-11:21:38.936585 7f4f220c0e40 	LZ4 supported: 1
2017/12/13-11:21:38.936595 7f4f220c0e40 	ZSTD supported: 1
2017/12/13-11:21:38.936599 7f4f220c0e40 Fast CRC32 supported: 0
2017/12/13-11:21:38.936683 7f4f220c0e40 [db/db_impl_open.cc:216] Creating manifest 1
...
2017/12/13-11:21:38.938269 7f4f220c0e40          Options.write_buffer_size: 134217728
2017/12/13-11:21:38.938271 7f4f220c0e40    Options.max_write_buffer_number: 5
2017/12/13-11:21:38.938273 7f4f220c0e40            Options.compression[0]: NoCompression
2017/12/13-11:21:38.938275 7f4f220c0e40            Options.compression[1]: NoCompression
2017/12/13-11:21:38.938277 7f4f220c0e40            Options.compression[2]: CSSZlib
2017/12/13-11:21:38.938279 7f4f220c0e40            Options.compression[3]: CSSZlib
2017/12/13-11:21:38.938280 7f4f220c0e40            Options.compression[4]: CSSZlib
2017/12/13-11:21:38.938282 7f4f220c0e40            Options.compression[5]: CSSZlib
2017/12/13-11:21:38.938283 7f4f220c0e40            Options.compression[6]: CSSZlib
2017/12/13-11:21:38.938285 7f4f220c0e40      Options.bottommost_compression: Disabled

So can I make sure that my compression has been integrated into tikv?