ruihong123 / dLSM

dLSM: An LSM-Based Index for RDMA-Enabled Memory Disaggregation
BSD 3-Clause "New" or "Revised" License

Stuck in the near data compaction #8

Open Zivvv opened 3 months ago

Zivvv commented 3 months ago

Hello!

I ran dLSM following the README, but db_bench seems to get stuck in the near-data compaction. Here are the logs from the Server and the compute node. I have tested RDMA with perftest, and the connection works. Did I miss something? Thank you!

root@node005:~/dLSM/build$ ./Server searching for IB devices in host found 1 device(s) device not specified, using first one found: mlx5_0 New MR was registered with addr=0x7f060fb95010, lkey=0xa362, rkey=0xa362, flags=0x7, size=10240000, total registered size is 0 New MR was registered with addr=0x7f060f1d0010, lkey=0x6e2d, rkey=0x6e2d, flags=0x7, size=10240000, total registered size is 10240000 SST buffer, send&receive buffer were registered with a maximum outstanding wr number is32768 maximum query pair number is131072 maximum completion queue number is16777216 maximum memory region number is16777216 maximum memory region size is18446744073709551615 checkpoint0connection built up from192.168.6.744687 connection family is 2 A new shared memory thread start checkpoint1checkpoint1QP was created, QP number=0x37 checkpoint2 Local LID = 0x0 total bytes: 23read byte: 23Remote QP number = 0x38 Remote LID = 0x0 Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22 QP 0x7f0608002278 state was change to RTS The connected compute node's id is 1 Polling sync option handlerory 87 GB
total bytes: 1read byte: 1Option sync finished Polling sync option handler Option sync finished compute node sync number is 1Register memory for computing node create query pair command receive for Remote QP number=0x39 Remote LID = 0x0 QP was created, QP number=0x38 Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22 QP 0x7f060801d678 state was change to RTS create query pair command receive for Remote QP number=0x3a Remote LID = 0x0 QP was created, QP number=0x39 Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22 QP 0x7f0608025da8 state was change to RTS create query pair command receive for Remote QP number=0x3b Remote LID = 0x0 QP was created, QP number=0x3a Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22 QP 0x7f060802e638 state was change to RTS create query pair command receive for Remote QP number=0x3c Remote LID = 0x0 QP was created, QP number=0x3b Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22 QP 0x7f0608036d88 state was change to RTS near data compaction Register memory for computing node number 0 got bad completion with status: 0xc, vendor syndrome: 0x81 RDMA Write Failed q id is QP number=0x37 Register memory for computing node number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 RDMA Write Failed q id is QP number=0x37 corrupt message from client. 0ent Polling Remote Compaction content
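The server log reports one `bad completion with status: 0xc` followed by `status: 0x5` on the same QP (0x37). As a reading aid, this Python sketch maps those numeric work-completion statuses to names. The table is a hand-copied subset of the `enum ibv_wc_status` ordering in libibverbs (an assumption of this sketch, not something taken from dLSM); in C, `ibv_wc_status_str()` from `<infiniband/verbs.h>` returns the official description. The vendor syndromes (0x81, 0xf9) are device-specific and are not decoded here.

```python
# Hand-copied subset of libibverbs' `enum ibv_wc_status` (illustration only;
# C code should call ibv_wc_status_str() from <infiniband/verbs.h>).
IBV_WC_STATUS = {
    0x0: "IBV_WC_SUCCESS",
    0x1: "IBV_WC_LOC_LEN_ERR",
    0x2: "IBV_WC_LOC_QP_OP_ERR",
    0x3: "IBV_WC_LOC_EEC_OP_ERR",
    0x4: "IBV_WC_LOC_PROT_ERR",
    0x5: "IBV_WC_WR_FLUSH_ERR",
    0x6: "IBV_WC_MW_BIND_ERR",
    0x7: "IBV_WC_BAD_RESP_ERR",
    0x8: "IBV_WC_LOC_ACCESS_ERR",
    0x9: "IBV_WC_REM_INV_REQ_ERR",
    0xA: "IBV_WC_REM_ACCESS_ERR",
    0xB: "IBV_WC_REM_OP_ERR",
    0xC: "IBV_WC_RETRY_EXC_ERR",
    0xD: "IBV_WC_RNR_RETRY_EXC_ERR",
}


def decode_wc_status(code: int) -> str:
    """Return the enum name for a work-completion status code."""
    return IBV_WC_STATUS.get(code, f"unknown (0x{code:x})")


# The two status codes that appear in the logs.
for code in (0xC, 0x5):
    print(f"status 0x{code:x} -> {decode_wc_status(code)}")
```

Read this way, 0xc is a transport retry-counter-exceeded error (the remote side never acknowledged the request), and 0x5 marks work requests that were flushed because the QP had already moved into the error state, so the single 0xc entry is likely the one worth investigating.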

root@node007:~/dLSM/build# ./db_bench --benchmarks=fillrandom --threads=1 --value_size=400 --num=50000000 --bloom_bits=10 --readwritepercent=5 --compute_node_id=0 --fixed_compute_shards_num=0 Mark: valgrind socket info1 searching for IB devices in host found 1 device(s) device not specified, using first one found: mlx5_0 New MR was registered with addr=0x7fc2322dc010, lkey=0xb978, rkey=0xb978, flags=0x7, size=10240000, total registered size is 0 New MR was registered with addr=0x7fc231917010, lkey=0x7535, rkey=0x7535, flags=0x7, size=10240000, total registered size is 10240000 SST buffer, send&receive buffer were registered with a maximum outstanding wr number is32768 maximum query pair number is131072 maximum completion queue number is16777216 maximum memory region number is16777216 maximum memory region size is18446744073709551615 Success to connect to 192.168.6.5 TCP connection was established connect to node id 0QP was created, QP number=0x38

Local LID = 0x0 total bytes: 23read byte: 23Remote QP number = 0x37 Remote LID = 0x0 Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72 QP 0x7fc22c0022b8 state was change to RTS total bytes: 1read byte: 1Finish the connection with node 0 New MR was registered with addr=0x7fc1ebfff010, lkey=0x9d11, rkey=0x9d11, flags=0x7, size=1073741824, total registered size is 20480000 dLSM: version 1.22 Date: Wed Jul 10 06:23:09 2024 Start to sync options client handling thread CPU: 64 * Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz CPUCache:
Keys: 16 bytes each Values: 400 bytes each (200 bytes after compression) Entries: 50000000 RawSize: 19836.4 MB (estimated) FileSize: 10299.7 MB (estimated) WARNING: Snappy compression is not enabled

DBImpl start New MR was registered with addr=0x7fc1e9ffe010, lkey=0xcf8e, rkey=0xcf8e, flags=0x7, size=33554432, total registered size is 1094221824 Memory used up, Initially, allocate new one, memory pool is Version_edit, total memory this pool is 1 communication thread created DBImpl finished DBImpl deallocated Total number of entries within the cahce is 0DBImpl start communication thread created DBImpl finished validation write finished start front-end threads Wait for thread start total bytes: 1read byte: 1sync wait time is 384180Threads start to run New MR was registered with addr=0x7fc197fff010, lkey=0x9e12, rkey=0x9e12, flags=0x7, size=1073741824, total registered size is 1127776256 Memory used up, Initially, allocate new one, memory pool is FlushBuffer, total memory this pool is 1 New MR was registered with addr=0x7fc13ffff010, lkey=0xac13, rkey=0xac13, flags=0x7, size=1073741824, total registered size is 2201518080 Memory used up, Initially, allocate new one, memory pool is IndexChunk, total memory this pool is 1 New MR was registered with addr=0x7fc0effff010, lkey=0x14a14, rkey=0x14a14, flags=0x7, size=1073741824, total registered size is 3275259904 Memory used up, Initially, allocate new one, memory pool is FilterChunk, total memory this pool is 1 Remote memory registeration, size: 1073741824
polled reply bufferr QP was created, QP number=0x39

QP num to be sent = 0x39 Local LID = 0x0 QP was created, QP number=0x3a

QP num to be sent = 0x3a Local LID = 0x0 QP was created, QP number=0x3b Polling reply buffer QP num to be sent = 0x3b Local LID = 0x0uffer QP was created, QP number=0x3c Polling reply buffer QP num to be sent = 0x3c Local LID = 0x0uffer Remote QP number=0x38 Remote LID = 0x0ffer Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72 QP 0x7fc1d80099e8 state was change to RTS Remote QP number=0x39 Remote LID = 0x0ffer Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72 QP 0x7fc180041538 state was change to RTS Remote QP number=0x3a Remote LID = 0x0ffer Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72 QP 0x7fc138041538 state was change to RTS Remote QP number=0x3b Remote LID = 0x0 Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72 QP 0x7fc188041538 state was change to RTS number 0 got bad completion with status: 0xc, vendor syndrome: 0x81 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 5 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0xc, vendor syndrome: 0x81 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0xc, vendor syndrome: 0x81 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 5 got bad completion 
with status: 0x5, vendor syndrome: 0xf9 number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 5 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0xc, vendor syndrome: 0x81 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 5 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor 
syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074Remote memory registeration, size: 1073741824 polled reply bufferr number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with 
status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor 
syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9 BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074Remote memory registeration, size: 1073741824 Polling reply buffer ops
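Since the compute-node output interleaves many completion errors, a small filter can separate flushed work requests from the errors that started the failure. This is a minimal sketch, assuming the `got bad completion with status: ..., vendor syndrome: ...` format shown in the logs; the regular expression, the `triage` helper, and the short excerpt are illustrative and not part of dLSM.

```python
import re
from collections import Counter

# Short excerpt copied from the compute-node log; paste the full log to triage it.
LOG = (
    "number 0 got bad completion with status: 0xc, vendor syndrome: 0x81 "
    "number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9 "
    "number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9"
)

PATTERN = re.compile(
    r"got bad completion with status: (0x[0-9a-f]+), "
    r"vendor syndrome: (0x[0-9a-f]+)"
)

# IBV_WC_WR_FLUSH_ERR: the WR was flushed after the QP entered the error state.
WR_FLUSH_ERR = 0x5


def triage(log: str):
    """Split completion errors into non-flush (status, syndrome) counts and a flush count."""
    root, flushed = Counter(), 0
    for status, syndrome in PATTERN.findall(log):
        if int(status, 16) == WR_FLUSH_ERR:
            flushed += 1
        else:
            root[(status, syndrome)] += 1
    return root, flushed


root, flushed = triage(LOG)
print(root, flushed)
```

In the excerpts posted here, every entry that is not status 0x5 appears to be status 0xc with vendor syndrome 0x81, on both the server and the compute node.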

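The `RawSize` and `FileSize` figures in the compute-node db_bench header follow from the flags (`--value_size=400`, `--num=50000000`) and the 16-byte keys. This sketch assumes dLSM keeps LevelDB's db_bench formulas with the default compression ratio of 0.5 (hence "200 bytes after compression"); under that assumption it reproduces the logged values. Because the log warns that Snappy compression is not enabled, the FileSize figure assumes compression the run may not actually apply.

```python
BYTES_PER_MB = 1048576.0  # db_bench divides by 2**20 but labels the result "MB"


def db_bench_size_estimates(key_size, value_size, num, compression_ratio=0.5):
    """Estimate raw and on-disk data size the way LevelDB's db_bench header does."""
    raw_mb = (key_size + value_size) * num / BYTES_PER_MB
    file_mb = (key_size + value_size * compression_ratio) * num / BYTES_PER_MB
    return raw_mb, file_mb


raw_mb, file_mb = db_bench_size_estimates(16, 400, 50_000_000)
print(f"RawSize: {raw_mb:.1f} MB (estimated)")    # RawSize: 19836.4 MB (estimated)
print(f"FileSize: {file_mb:.1f} MB (estimated)")  # FileSize: 10299.7 MB (estimated)
```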
Thank you for your time; I appreciate your help!