authwork closed this issue 2 years ago.
Hi,
You are not deallocating the memory you allocate with new. You should use std::chrono::high_resolution_clock to measure times. You should also try different configuration parameters of the DynamicPGM, such as the PGMType template argument and the base constructor argument, to find the best-performing configuration for your batch of operations; the same holds for the configuration parameters of the B+ tree.
Here's what I get with the edited code below, using gcc-10 and full optimisations on a Xeon Gold 5118 CPU. (Note that I replaced bpt->get_stats().memory() with 0 because it was undefined; I guess it was your own addition to the tlx library.)
PGM: insert done, size 134218824, time cost 113
PGM: success 16777216, size 134218824, time cost 49
B+ tree: insert done, size 0, time cost 116
B+ tree: success 16777216, size 0, time cost 90
Finally, given the other issues you have opened in the past, I have to ask you to please use issues here for actual bugs with the code in this repository and not your own code / experiments. Thanks.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>
#include <tlx/container/btree_map.hpp>
#include "pgm/pgm_index_dynamic.hpp"

// PGM-Index benchmark: bulk inserts followed by point lookups.
template<typename KeyType, typename ValueType>
void pgm_tests(KeyType *keys, ValueType *values, uint64_t number) {
    typedef pgm::DynamicPGMIndex<KeyType, ValueType, pgm::PGMIndex<KeyType, 32>> PGMFP;
    PGMFP pgm_index(2);

    auto t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        pgm_index.insert_or_assign(keys[i], values[i]);
    }
    auto t1 = std::chrono::high_resolution_clock::now();
    // Average cost of one operation, in nanoseconds.
    uint64_t time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    uint64_t size = pgm_index.size_in_bytes();
    printf("PGM: insert done, size %lu, time cost %lu\n", size, time_cost);

    t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        auto iter = pgm_index.find(keys[i]);
        if (iter->second != values[i]) {
            printf("error %lu\n", i);
            return;
        }
    }
    t1 = std::chrono::high_resolution_clock::now();
    time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    printf("PGM: success %lu, size %lu, time cost %lu\n", number, size, time_cost);
}

// B+ tree benchmark: the same workload on tlx::btree_map.
template<typename KeyType, typename ValueType>
void bptree_tests(KeyType *keys, ValueType *values, uint64_t number) {
    typedef tlx::btree_map<KeyType, ValueType> bptree;
    bptree bpt;

    auto t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        bpt.insert2(keys[i], values[i]);
    }
    auto t1 = std::chrono::high_resolution_clock::now();
    uint64_t time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    uint64_t size = 0; // bpt->get_stats().memory(); <- UNDEFINED
    printf("B+ tree: insert done, size %lu, time cost %lu\n", size, time_cost);

    t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        auto iter = bpt.find(keys[i]);
        if (iter->second != values[i]) {
            printf("error %lu\n", i);
            return;
        }
    }
    t1 = std::chrono::high_resolution_clock::now();
    time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    printf("B+ tree: success %lu, size %lu, time cost %lu\n", number, size, time_cost);
}

int main() {
    std::vector<uint32_t> keys(1 << 24);
    for (uint64_t i = 0; i < keys.size(); ++i)
        keys[i] = i;
    decltype(keys) values = keys;
    pgm_tests(keys.data(), values.data(), keys.size());
    bptree_tests(keys.data(), values.data(), keys.size());
    return 0;
}
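As a sketch of the configuration advice above, the epsilon template argument of PGMIndex and the base constructor argument of DynamicPGMIndex could be varied like this. The values 16, 128, 2 and 8 below are arbitrary examples chosen for illustration, not recommendations; the best choice depends on the workload.

```cpp
#include <cstdint>
#include "pgm/pgm_index_dynamic.hpp"

using Key = uint32_t;
using Value = uint32_t;

// Smaller epsilon: larger index, potentially faster lookups.
using TightPGM = pgm::DynamicPGMIndex<Key, Value, pgm::PGMIndex<Key, 16>>;
// Larger epsilon: smaller index, potentially slower lookups.
using LoosePGM = pgm::DynamicPGMIndex<Key, Value, pgm::PGMIndex<Key, 128>>;

int main() {
    TightPGM a(2);  // base = 2: more levels, cheaper individual merges
    LoosePGM b(8);  // base = 8: fewer levels, costlier individual merges
    a.insert_or_assign(1, 1);
    b.insert_or_assign(1, 1);
    return (a.find(1)->second == 1 && b.find(1)->second == 1) ? 0 : 1;
}
```

Each configuration would then be run through the same insert/lookup loops as in the benchmark above to compare size and latency.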
@gvinciguerra Thanks for your explanations. gettimeofday (microsecond resolution) and std::chrono::high_resolution_clock (nanosecond resolution) serve the same purpose, but nanosecond resolution should be preferred for measuring per key-value latency.
The latency results differ because I used a different platform (an i7 notebook); I will try it on other machines.
I am very sorry for the inconvenience. Perhaps it would be better to open GitHub Discussions for this project? Thanks.
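To make the timing point concrete, here is a minimal self-contained sketch of measuring average per-operation latency in nanoseconds with std::chrono::high_resolution_clock. std::map stands in for any index structure here; averaging over many operations is what makes the per-operation figure meaningful despite clock-read overhead.

```cpp
#include <chrono>
#include <cstdint>
#include <map>

// Insert n keys into a std::map and return the average cost of one
// insertion in nanoseconds. The container is only a placeholder for
// whatever index is being benchmarked.
uint64_t avg_insert_ns(uint64_t n) {
    std::map<uint64_t, uint64_t> m;
    auto t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < n; i++)
        m.emplace(i, i);
    auto t1 = std::chrono::high_resolution_clock::now();
    auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    return static_cast<uint64_t>(total) / n;
}
```

Note that on a laptop CPU with frequency scaling the returned figure will be noisy, which may explain part of the difference between the two platforms discussed above.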
Dear @gvinciguerra, we compared the PGM-Index with the tlx B+ tree implementation. We found that the index size is greatly reduced, but the latency is not as good as discussed in the paper. Are there any suggestions for our implementation?
We store 2^24 key-value pairs of three types (uint32_t, uint64_t, uint128_t), and the result is: