schencoding / peahash

Code for SIGMOD'23 Pea Hash Paper
Mulan Permissive Software License, Version 2

uniform insert full #2

Open KINGFIOX opened 7 months ago

KINGFIOX commented 7 months ago

When inserting into Pea Hash (dram, unique edition) with the uniform distribution, the program blocked after reaching 230.1M entries. After waiting a long time, I interrupted it.

These are the logs:

No.1 insert pea 1threads test begin
Distribution = uniform
EPOCH registration in application level
Pea Comprehensive Benchmark
insertion start
Insert1
(the lines below are log printouts I added to the code)
Inserted 10000000 entries so far. Throughput: 5.31116e+06 entries per second.
Inserted 20000000 entries so far. Throughput: 3.26115e+06 entries per second.
Inserted 30000000 entries so far. Throughput: 5.72216e+06 entries per second.
Inserted 40000000 entries so far. Throughput: 4.639e+06 entries per second.
Inserted 50000000 entries so far. Throughput: 5.27367e+06 entries per second.
Inserted 60000000 entries so far. Throughput: 1.35598e+06 entries per second.
Inserted 70000000 entries so far. Throughput: 6.79158e+06 entries per second.
Inserted 80000000 entries so far. Throughput: 6.15792e+06 entries per second.
Inserted 90000000 entries so far. Throughput: 5.62764e+06 entries per second.
Inserted 100000000 entries so far. Throughput: 5.03531e+06 entries per second.
Inserted 110000000 entries so far. Throughput: 4.54232e+06 entries per second.
Inserted 120000000 entries so far. Throughput: 5.15846e+06 entries per second.
Inserted 130000000 entries so far. Throughput: 5.12154e+06 entries per second.
Inserted 140000000 entries so far. Throughput: 5.13705e+06 entries per second.
Inserted 150000000 entries so far. Throughput: 5.09772e+06 entries per second.
Inserted 160000000 entries so far. Throughput: 5.10169e+06 entries per second.
Inserted 170000000 entries so far. Throughput: 5.078e+06 entries per second.
Inserted 180000000 entries so far. Throughput: 4.84972e+06 entries per second.
Inserted 190000000 entries so far. Throughput: 4.68886e+06 entries per second.
Inserted 200000000 entries so far. Throughput: 4.59412e+06 entries per second.
Inserted 210000000 entries so far. Throughput: 4.47617e+06 entries per second.
Inserted 220000000 entries so far. Throughput: 4.28257e+06 entries per second.
Inserted 230000000 entries so far. Throughput: 1.11766e+06 entries per second.

KINGFIOX commented 7 months ago

Likewise, when I insert with the unique edition and skew=0, inserts stall once I reach about 270M entries. All other pre-load parameters are at their defaults.

KINGFIOX commented 7 months ago

Running args: ./test_pmem -distribution uniform -index pea -p 500000000 -t 1 -op insert

DrWereviruswolf commented 7 months ago

For bulkloading 270M key-value pairs, you should change the following code:

  1. Set the memory pool size to more than 16GB: https://github.com/schencoding/peahash/blob/a4895674ca749c07d7be0c52de4c0400a930d118/pea_dram/unique/test/test_pmem.cpp#L56 I added a memory-pool check in the latest commit.
  2. Quadruple the lock table size to 589824 at https://github.com/schencoding/peahash/blob/a4895674ca749c07d7be0c52de4c0400a930d118/pea_dram/unique/src/PeaHash/pea_hash.h#L22 The lock table here is implemented in a simple way; it should really be re-allocated as the directory doubles. If I have free time, I will modify the lock-table resizing code.
KINGFIOX commented 7 months ago

Strangely, though, with skew=0.8 (the default), I can indeed insert more than 500M entries.

DrWereviruswolf commented 7 months ago

The skew parameter has no effect in the unique version. In the duplicate version, when inserting 500M entries with skew=0.8, the actual space usage may well be sufficient, because the many values sharing the same key are compacted into the value segment (bucket).

DrWereviruswolf commented 5 months ago

Lock table resizing is implemented in the dev branch and has been merged into the main branch. pea_pmem has the same issue to fix: https://github.com/schencoding/peahash/blob/310620dab257ddd1f95353ded8d3dacee043eebc/pea_dram/unique/src/PeaHash/pea_hash.h#L880.