rockset / rocksdb-cloud

A library that provides an embeddable, persistent key-value store for fast storage optimized for AWS
http://rocksdb.org
GNU General Public License v2.0
733 stars 117 forks source link

Silent crash of db_bench on Ubuntu 20.04 #98

Open naivewong opened 4 years ago

naivewong commented 4 years ago

Problem Description

I run a simplified version of db_bench (only contains insert/read) with keep_local_sst_files=false and after around five minutes, it crashed silently. From the log, I can see it happened during the compaction between L0 to L1. I suspect it may be related to fetching from and writing to S3 at the same time which causes error in libcurl.

Bench Settings

op=randomfill
key_size=16
value_size=5000
num_insertion=1000000
keep_local_sst_files=false

// Here I set smaller values to reproduce the error more quickly.
options.write_buffer_size = 8 << 20
options.max_bytes_for_level_base = 32 * 1048576
options.target_file_size_base = 16 * 1024 * 1024

Platform

Ubuntu 20.04LTS. (P.S. The problem also appears in my Ubuntu 18.04 EC2 instance)

GDB Backtrace

Thread 5 "rocksdb:high0" received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7ffff5326700 (LWP 234292)]
__libc_write (nbytes=31, buf=0x7fffe8037423, fd=10) at ../sysdeps/unix/sysv/linux/write.c:26
26  ../sysdeps/unix/sysv/linux/write.c: No such file or directory.
(gdb) bt
#0  __libc_write (nbytes=31, buf=0x7fffe8037423, fd=10) at ../sysdeps/unix/sysv/linux/write.c:26
#1  __libc_write (fd=10, buf=0x7fffe8037423, nbytes=31) at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x00007ffff7198359 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#3  0x00007ffff719353e in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#4  0x00007ffff7192554 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#5  0x00007ffff7192a17 in BIO_write () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#6  0x00007ffff6f4cdde in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#7  0x00007ffff6f4dcd9 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#8  0x00007ffff6f5786e in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#9  0x00007ffff6f55a65 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007ffff6f60ea3 in SSL_shutdown () from /lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007ffff7426235 in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#12 0x00007ffff74262a5 in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#13 0x00007ffff73e481e in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#14 0x00007ffff73e70e5 in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#15 0x00007ffff73f9a5f in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#16 0x00007ffff73fa8c1 in curl_multi_perform () from /lib/x86_64-linux-gnu/libcurl.so.4
#17 0x00007ffff73f0d3b in curl_easy_perform () from /lib/x86_64-linux-gnu/libcurl.so.4
#18 0x00007ffff7b9bdf0 in Aws::Http::CurlHttpClient::MakeRequest(std::shared_ptr<Aws::Http::HttpRequest> const&, Aws::Utils::RateLimits::RateLimiterInterface*, Aws::Utils::RateLimits::RateLimiterInterface*) const ()
   from /usr/local/lib/libaws-cpp-sdk-core.so
#19 0x00007ffff7bd9459 in Aws::Client::AWSClient::AttemptOneRequest(std::shared_ptr<Aws::Http::HttpRequest> const&, Aws::AmazonWebServiceRequest const&, char const*, char const*) const ()
   from /usr/local/lib/libaws-cpp-sdk-core.so
#20 0x00007ffff7bf780b in Aws::Client::AWSClient::AttemptExhaustively(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*) const ()
   from /usr/local/lib/libaws-cpp-sdk-core.so
#21 0x00007ffff7bf9abd in Aws::Client::AWSXMLClient::MakeRequest(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*) const ()
   from /usr/local/lib/libaws-cpp-sdk-core.so
#22 0x00007ffff7e6e373 in Aws::S3::S3Client::PutObject(Aws::S3::Model::PutObjectRequest const&) const ()
   from /usr/local/lib/libaws-cpp-sdk-s3.so
#23 0x000055555589623d in rocksdb::AwsS3ClientWrapper::PutCloudObject (size_hint=4485000, request=..., 
    this=0x555555a41190) at /usr/include/c++/9/bits/shared_ptr_base.h:1020
#24 rocksdb::S3StorageProvider::DoPutCloudObject (this=0x555555a1bc20, 
    local_file="/tmp/bench_cloud/000084.sst-6d650822daa1fee9", 
    bucket_name="rockset.cloud-db-examples.alec", 
    object_path="/tmp/bench_cloud/000084.sst-6d650822daa1fee9", file_size=4485000)
    at cloud/aws/aws_s3.cc:889
#25 0x00005555558a7229 in rocksdb::CloudStorageProviderImpl::PutCloudObject (this=0x555555a1bc20, 
    local_file="/tmp/bench_cloud/000084.sst-6d650822daa1fee9", 
    bucket_name="rockset.cloud-db-examples.alec", 
    object_path="/tmp/bench_cloud/000084.sst-6d650822daa1fee9") at cloud/cloud_storage_provider.cc:361
#26 0x00005555556dd16b in rocksdb::CloudEnvImpl::CopyLocalFileToDest (this=0x555555a1c640, 
    local_name="/tmp/bench_cloud/000084.sst-6d650822daa1fee9", 
    dest_name="/tmp/bench_cloud/000084.sst-6d650822daa1fee9") at /usr/include/c++/9/bits/basic_string.h:936
#27 0x00005555558a5949 in rocksdb::CloudStorageWritableFileImpl::Close (this=0x7fffe808a430)
    at cloud/cloud_storage_provider.cc:175
#28 0x00005555555f8b55 in rocksdb::LegacyWritableFileWrapper::Close (this=<optimized out>)
    at /usr/include/c++/9/bits/unique_ptr.h:360
#29 0x000055555561a1ea in rocksdb::WritableFileWriter::Close (this=this@entry=0x555555b1c460)
    at ./include/rocksdb/status.h:348
#30 0x00005555558ad2b0 in rocksdb::BuildTable (dbname=..., env=<optimized out>, fs=<optimized out>, 
--Type <RET> for more, q to quit, c to continue without paging--
    ioptions=..., mutable_cf_options=..., file_options=..., table_cache=<optimized out>, 
    iter=<optimized out>, range_del_iters=..., meta=<optimized out>, internal_comparator=..., 
    int_tbl_prop_collector_factories=<optimized out>, column_family_id=<optimized out>, 
    column_family_name=..., snapshots=..., earliest_write_conflict_snapshot=<optimized out>, 
    snapshot_checker=<optimized out>, compression=<optimized out>, sample_for_compression=<optimized out>, 
    compression_opts=..., paranoid_file_checks=<optimized out>, internal_stats=<optimized out>, 
    reason=<optimized out>, event_logger=<optimized out>, job_id=<optimized out>, 
    io_priority=<optimized out>, table_properties=<optimized out>, level=<optimized out>, 
    creation_time=<optimized out>, oldest_key_time=<optimized out>, write_hint=<optimized out>, 
    file_creation_time=<optimized out>) at db/builder.cc:215
#31 0x00005555557a49c7 in rocksdb::FlushJob::WriteLevel0Table (this=0x7ffff5324140)
    at ./db/column_family.h:328
#32 0x00005555557a6520 in rocksdb::FlushJob::Run (this=this@entry=0x7ffff5324140, 
    prep_tracker=prep_tracker@entry=0x555555b250f8, file_meta=file_meta@entry=0x7ffff5323c50)
    at db/flush_job.cc:226
#33 0x0000555555744451 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=0x555555b23df0, 
    cfd=0x555555b31430, mutable_cf_options=..., made_progress=0x7ffff5324d03, job_context=0x7ffff5324d40, 
    superversion_context=0x7fffe8000de0, snapshot_seqs=std::vector of length 0, capacity 0, 
    earliest_write_conflict_snapshot=72057594037927935, snapshot_checker=0x0, log_buffer=0x7ffff5324f10, 
    thread_pri=rocksdb::Env::HIGH) at db/db_impl/db_impl_compaction_flush.cc:191
#34 0x0000555555747bba in rocksdb::DBImpl::FlushMemTablesToOutputFiles (this=0x555555b23df0, 
    bg_flush_args=..., made_progress=0x7ffff5324d03, job_context=0x7ffff5324d40, 
    log_buffer=0x7ffff5324f10, thread_pri=rocksdb::Env::HIGH) at db/db_impl/db_impl_compaction_flush.cc:256
#35 0x000055555574905c in rocksdb::DBImpl::BackgroundFlush (this=<optimized out>, 
    made_progress=0x7ffff5324d03, job_context=0x7ffff5324d40, log_buffer=0x7ffff5324f10, 
    reason=0x7ffff5324d04, thread_pri=rocksdb::Env::HIGH) at db/db_impl/db_impl_compaction_flush.cc:2200
#36 0x0000555555752b5c in rocksdb::DBImpl::BackgroundCallFlush (this=0x555555b23df0, 
    thread_pri=rocksdb::Env::HIGH) at db/db_impl/db_impl_compaction_flush.cc:2237
#37 0x00005555556b636a in std::function<void ()>::operator()() const (this=0x7ffff5325880)
    at /usr/include/c++/9/bits/std_function.h:683
#38 rocksdb::ThreadPoolImpl::Impl::BGThread (this=0x555555a1b430, thread_id=0)
    at util/threadpool_imp.cc:265
#39 0x00005555556b64fe in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper (arg=0x555555b20f30)
    at util/threadpool_imp.cc:306
#40 0x00007ffff7883cb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#41 0x00007ffff7ad0609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#42 0x00007ffff7573103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Contents under the db path

drwxr-xr-x  2 alec alec    4096 Sept   1 12:55 ./
drwxrwxrwt 34 root root   12288 Sept   1 15:16 ../
-rw-r--r--  1 alec alec 8089317 Sept   1 12:55 000080.log
-rw-r--r--  1 alec alec 1406175 Sept   1 12:55 000083.log
-rw-r--r--  1 alec alec 4485000 Sept   1 12:55 000084.sst-6d650822daa1fee9
-rw-r--r--  1 alec alec      34 Sept   1 12:44 CLOUDMANIFEST
-rw-r--r--  1 alec alec      16 Sept   1 12:44 CURRENT
-rw-r--r--  1 alec alec      37 Sept   1 12:44 IDENTITY
-rw-rw-r--  1 alec alec  173288 Sept   1 12:55 LOG
-rw-r--r--  1 alec alec    4131 Sept   1 12:55 MANIFEST-6d650822daa1fee9
-rw-r--r--  1 alec alec    5270 Sept   1 12:44 OPTIONS-000005

LOG (containing the last compaction)

2020/09/01-12:46:28.320495 7ffff5326700 [ush_job.cc:347] [default] [JOB 39] Level-0 flush table #81: started
2020/09/01-12:46:30.883438 7ffff5326700 [s3] PutCloudObject rockset.cloud-db-examples.alec//tmp/bench_cloud/000081.sst-6d650822daa1fee9, size 4493335, OK
2020/09/01-12:46:31.557440 7ffff5326700 EVENT_LOG_v1 {"time_micros": 1598935591557385, "cf_name": "default", "job": 39, "event": "table_file_creation", "file_number": 81, "file_size": 4493335, "table_properties": {"data_size": 4467761, "index_size": 44425, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 2117, "raw_key_size": 38496, "raw_average_key_size": 24, "raw_value_size": 8020000, "raw_average_value_size": 5000, "num_data_blocks": 1604, "num_entries": 1604, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "Snappy", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1598935585, "oldest_key_time": 1598935585, "file_creation_time": 1598935588}}
2020/09/01-12:46:31.557477 7ffff5326700 [ush_job.cc:394] [default] [JOB 39] Level-0 flush table #81: 4493335 bytes OK
2020/09/01-12:46:31.758168 7ffff5326700 [s3] PutCloudObject rockset.cloud-db-examples.alec//tmp/bench_cloud/MANIFEST-6d650822daa1fee9, size 3935, OK
2020/09/01-12:46:31.758266 7ffff5326700 [WARN] [lumn_family.cc:845] [default] Stopping writes because we have 37 level-0 files
2020/09/01-12:46:31.822738 7ffff5326700 (Original Log Time 2020/09/01-12:46:31.575978) [mtable_list.cc:445] [default] Level-0 commit table #81 started
2020/09/01-12:46:31.822750 7ffff5326700 (Original Log Time 2020/09/01-12:46:31.758230) [mtable_list.cc:501] [default] Level-0 commit table #81: memtable #1 done
2020/09/01-12:46:31.822754 7ffff5326700 (Original Log Time 2020/09/01-12:46:31.758258) EVENT_LOG_v1 {"time_micros": 1598935591758242, "job": 39, "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [37, 0, 0, 0, 0, 0, 0], "immutable_memtables": 0}
2020/09/01-12:46:31.822759 7ffff5326700 (Original Log Time 2020/09/01-12:46:31.758292) [_impl/db_impl_compaction_flush.cc:203] [default] Level summary: files[37 0 0 0 0 0 0] max score 8.25
2020/09/01-12:46:31.822779 7ffff5326700 [_impl/db_impl_files.cc:348] [JOB 39] Try to delete WAL files size 8076360, prev total WAL file size 8076360, number of live WAL files 2.
2020/09/01-12:54:15.305865 7fffdc7f8700 [_impl/db_impl.cc:849] ------- DUMPING STATS -------
2020/09/01-12:54:15.305988 7fffdc7f8700 [_impl/db_impl.cc:851] 
** DB Stats **
Uptime(secs): 600.8 total, 600.0 interval
Cumulative writes: 59K writes, 59K keys, 59K commit groups, 1.0 writes per commit group, ingest: 0.28 GB, 0.47 MB/s
Cumulative WAL: 59K writes, 0 syncs, 59385.00 writes per sync, written: 0.28 GB, 0.47 MB/s
Cumulative stall: 00:02:9.577 H:M:S, 21.6 percent
Interval writes: 59K writes, 59K keys, 59K commit groups, 1.0 writes per commit group, ingest: 284.98 MB, 0.47 MB/s
Interval WAL: 59K writes, 0 syncs, 59385.00 writes per sync, written: 0.28 MB, 0.47 MB/s
Interval stall: 00:02:9.577 H:M:S, 21.6 percent

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     37/4   158.55 MB   8.2      0.0     0.0      0.0       0.2      0.2       0.0   1.0      0.0      1.3    123.63              0.00        37    3.341       0      0
 Sum     37/4   158.55 MB   0.0      0.0     0.0      0.0       0.2      0.2       0.0   1.0      0.0      1.3    123.63              0.00        37    3.341       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.2      0.2       0.0   1.0      0.0      1.3    123.63              0.00        37    3.341       0      0

** Compaction Stats [default] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
High      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.2      0.2       0.0   0.0      0.0      1.3    123.63              0.00        37    3.341       0      0
Uptime(secs): 600.8 total, 600.0 interval
Flush(GB): cumulative 0.155, interval 0.155
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.15 GB write, 0.26 MB/s write, 0.00 GB read, 0.00 MB/s read, 123.6 seconds
Interval compaction: 0.15 GB write, 0.26 MB/s write, 0.00 GB read, 0.00 MB/s read, 123.6 seconds
Stalls(count): 16 level0_slowdown, 16 level0_slowdown_with_compaction, 2 level0_numfiles, 2 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 36 memtable_compaction, 0 memtable_slowdown, interval 54 total count

** File Read Latency Histogram By Level [default] **

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     37/4   158.55 MB   8.2      0.0     0.0      0.0       0.2      0.2       0.0   1.0      0.0      1.3    123.63              0.00        37    3.341       0      0
 Sum     37/4   158.55 MB   0.0      0.0     0.0      0.0       0.2      0.2       0.0   1.0      0.0      1.3    123.63              0.00        37    3.341       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [default] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
High      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.2      0.2       0.0   0.0      0.0      1.3    123.63              0.00        37    3.341       0      0
Uptime(secs): 600.8 total, 0.0 interval
Flush(GB): cumulative 0.155, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.15 GB write, 0.26 MB/s write, 0.00 GB read, 0.00 MB/s read, 123.6 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 16 level0_slowdown, 16 level0_slowdown_with_compaction, 2 level0_numfiles, 2 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 36 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [default] **
2020/09/01-12:54:45.349226 7ffff5b4a700 [s3] PutCloudObject rockset.cloud-db-examples.alec//tmp/bench_cloud/000018.sst-6d650822daa1fee9, size 16879170, OK
2020/09/01-12:54:45.354373 7ffff5b4a700 [mpaction/compaction_job.cc:1344] [default] [JOB 6] Generated table #18: 6025 keys, 16879170 bytes
2020/09/01-12:54:45.354447 7ffff5b4a700 EVENT_LOG_v1 {"time_micros": 1598936085354403, "cf_name": "default", "job": 6, "event": "table_file_creation", "file_number": 18, "file_size": 16879170, "table_properties": {"data_size": 16782104, "index_size": 170989, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 7621, "raw_key_size": 144600, "raw_average_key_size": 24, "raw_value_size": 30125000, "raw_average_value_size": 5000, "num_data_blocks": 6025, "num_entries": 6025, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "Snappy", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1598935455, "oldest_key_time": 0, "file_creation_time": 1598935484}}
2020/09/01-12:55:22.509997 7ffff5b4a700 [s3] PutCloudObject rockset.cloud-db-examples.alec//tmp/bench_cloud/000082.sst-6d650822daa1fee9, size 1051429, OK
2020/09/01-12:55:22.510547 7ffff5b4a700 [mpaction/compaction_job.cc:1344] [default] [JOB 6] Generated table #82: 375 keys, 1051429 bytes
2020/09/01-12:55:22.510615 7ffff5b4a700 EVENT_LOG_v1 {"time_micros": 1598936122510574, "cf_name": "default", "job": 6, "event": "table_file_creation", "file_number": 82, "file_size": 1051429, "table_properties": {"data_size": 1044539, "index_size": 10335, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 581, "raw_key_size": 9000, "raw_average_key_size": 24, "raw_value_size": 1875000, "raw_average_value_size": 5000, "num_data_blocks": 375, "num_entries": 375, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "Snappy", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1598935455, "oldest_key_time": 0, "file_creation_time": 1598936085}}
2020/09/01-12:55:23.528381 7ffff5b4a700 [mpaction/compaction_job.cc:1412] [default] [JOB 6] Compacted 4@0 files to L1 => 17930599 bytes
2020/09/01-12:55:23.710201 7ffff5b4a700 [s3] PutCloudObject rockset.cloud-db-examples.alec//tmp/bench_cloud/MANIFEST-6d650822daa1fee9, size 4131, OK
2020/09/01-12:55:23.710313 7ffff5b4a700 [WARN] [lumn_family.cc:888] [default] Stalling writes because we have 33 level-0 files rate 16777216
2020/09/01-12:55:23.710376 7ffff5b4a700 (Original Log Time 2020/09/01-12:55:23.710269) [mpaction/compaction_job.cc:743] [default] compacted to: files[33 2 0 0 0 0 0] max score 8.25, MB/sec: 0.0 rd, 0.0 wr, level 1, files in(4, 0) out(2) MB in(17.1, 0.0) out(17.1), read-write-amplify(2.0) write-amplify(1.0) OK, records in: 6414, records dropped: 14 output_compression: Snappy
2020/09/01-12:55:23.710386 7ffff5b4a700 (Original Log Time 2020/09/01-12:55:23.710304) EVENT_LOG_v1 {"time_micros": 1598936123710288, "job": 6, "event": "compaction_finished", "compaction_time_micros": 638116028, "compaction_time_cpu_micros": 0, "output_level": 1, "num_output_files": 2, "total_output_size": 17930599, "num_input_records": 6414, "num_output_records": 6400, "num_subcompactions": 1, "output_compression": "Snappy", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [33, 2, 0, 0, 0, 0, 0]}
2020/09/01-12:55:23.710497 7ffff5b4a700 EVENT_LOG_v1 {"time_micros": 1598936123710494, "job": 6, "event": "table_file_deletion", "file_number": 14}
2020/09/01-12:55:23.710554 7ffff5b4a700 EVENT_LOG_v1 {"time_micros": 1598936123710552, "job": 6, "event": "table_file_deletion", "file_number": 12}
2020/09/01-12:55:23.710585 7ffff5b4a700 EVENT_LOG_v1 {"time_micros": 1598936123710583, "job": 6, "event": "table_file_deletion", "file_number": 10}
2020/09/01-12:55:23.710615 7ffff5b4a700 EVENT_LOG_v1 {"time_micros": 1598936123710613, "job": 6, "event": "table_file_deletion", "file_number": 7}
dhruba commented 4 years ago

This is something that we should fix, @mrambacher?

Curious to know if you see a similar problem when keep_local_sst_files=true?

naivewong commented 4 years ago

Thanks for replying. It's fine with keep_local_sst_files=true, maybe that's because it will prefetch the files and the time spent on reading S3 is much much shorter. However, this problem also occurs when performing a large number of Get operations. BTW, can you kindly point out some directions that I can try to figure it out myself which may also help you? Or that's the problem related to the lib version or the OS? Thanks.