authwork closed this issue 2 years ago.
Hi,
You are not deallocating the memory you allocate with new. You should use std::chrono::high_resolution_clock to measure times. You should also try different configuration parameters of the DynamicPGM, such as the PGMType template argument and the base constructor argument, to find the best-performing configuration for your batch of operations; the same holds for the configuration parameters of the B+ tree.
Here's what I get with the edited code below, using gcc-10 and full optimisations on a Xeon Gold 5118 CPU. (Note that I replaced bpt->get_stats().memory() with 0 because it was undefined; I guess it was your own addition to the tlx library.)
PGM: insert done, size 134218824, time cost 113
PGM: success 16777216, size 134218824, time cost 49
B+ tree: insert done, size 0, time cost 116
B+ tree: success 16777216, size 0, time cost 90
Finally, given the other issues you have opened in the past, I have to ask you to please use issues here for actual bugs with the code in this repository and not your own code / experiments. Thanks.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>
#include <tlx/container/btree_map.hpp>
#include "pgm/pgm_index_dynamic.hpp"

// PGM-Index benchmark: bulk inserts followed by point lookups.
template<typename KeyType, typename ValueType>
void pgm_tests(KeyType *keys, ValueType *values, uint64_t number) {
    typedef pgm::DynamicPGMIndex<KeyType, ValueType, pgm::PGMIndex<KeyType, 32>> PGMFP;
    PGMFP pgm_index(2);

    auto t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        pgm_index.insert_or_assign(keys[i], values[i]);
    }
    auto t1 = std::chrono::high_resolution_clock::now();
    // Average cost of one operation, in nanoseconds.
    uint64_t time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    uint64_t size = pgm_index.size_in_bytes();
    printf("PGM: insert done, size %lu, time cost %lu\n", size, time_cost);

    t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        auto iter = pgm_index.find(keys[i]);
        if (iter->second != values[i]) {
            printf("error %lu\n", i);
            return;
        }
    }
    t1 = std::chrono::high_resolution_clock::now();
    time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    printf("PGM: success %lu, size %lu, time cost %lu\n", number, size, time_cost);
}

// B+ tree benchmark: the same workload on tlx::btree_map.
template<typename KeyType, typename ValueType>
void bptree_tests(KeyType *keys, ValueType *values, uint64_t number) {
    typedef tlx::btree_map<KeyType, ValueType> bptree;
    bptree bpt;

    auto t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        bpt.insert2(keys[i], values[i]);
    }
    auto t1 = std::chrono::high_resolution_clock::now();
    uint64_t time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    uint64_t size = 0; // bpt->get_stats().memory(); <- UNDEFINED
    printf("B+ tree: insert done, size %lu, time cost %lu\n", size, time_cost);

    t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < number; i++) {
        auto iter = bpt.find(keys[i]);
        if (iter->second != values[i]) {
            printf("error %lu\n", i);
            return;
        }
    }
    t1 = std::chrono::high_resolution_clock::now();
    time_cost = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() / number;
    printf("B+ tree: success %lu, size %lu, time cost %lu\n", number, size, time_cost);
}

int main() {
    std::vector<uint32_t> keys(1 << 24);
    for (uint64_t i = 0; i < keys.size(); ++i)
        keys[i] = i;
    decltype(keys) values = keys;
    pgm_tests(keys.data(), values.data(), keys.size());
    bptree_tests(keys.data(), values.data(), keys.size());
    return 0;
}
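As a sketch of the configuration advice above, the epsilon template argument of PGMIndex and the base constructor argument of DynamicPGMIndex could be varied like this. The values 16, 128, 2 and 8 below are arbitrary examples chosen for illustration, not recommendations; the best choice depends on the workload.

```cpp
#include <cstdint>
#include "pgm/pgm_index_dynamic.hpp"

using Key = uint32_t;
using Value = uint32_t;

// Smaller epsilon: larger index, potentially faster lookups.
using TightPGM = pgm::DynamicPGMIndex<Key, Value, pgm::PGMIndex<Key, 16>>;
// Larger epsilon: smaller index, potentially slower lookups.
using LoosePGM = pgm::DynamicPGMIndex<Key, Value, pgm::PGMIndex<Key, 128>>;

int main() {
    TightPGM a(2);  // base = 2: more levels, cheaper individual merges
    LoosePGM b(8);  // base = 8: fewer levels, costlier individual merges
    a.insert_or_assign(1, 1);
    b.insert_or_assign(1, 1);
    return (a.find(1)->second == 1 && b.find(1)->second == 1) ? 0 : 1;
}
```

Each configuration would then be run through the same insert/lookup loops as in the benchmark above to compare size and latency.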
@gvinciguerra Thanks for your explanations. gettimeofday (microsecond resolution) and std::chrono::high_resolution_clock (nanosecond resolution) serve the same purpose, but nanosecond resolution should be preferred for measuring per key-value latency.
The latency results differ because I used a different platform (an i7 notebook); I will try it on other machines.
I am very sorry for the inconvenience. Perhaps it would be better to open GitHub Discussions for this project? Thanks.
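To make the timing point concrete, here is a minimal self-contained sketch of measuring average per-operation latency in nanoseconds with std::chrono::high_resolution_clock. std::map stands in for any index structure here; averaging over many operations is what makes the per-operation figure meaningful despite clock-read overhead.

```cpp
#include <chrono>
#include <cstdint>
#include <map>

// Insert n keys into a std::map and return the average cost of one
// insertion in nanoseconds. The container is only a placeholder for
// whatever index is being benchmarked.
uint64_t avg_insert_ns(uint64_t n) {
    std::map<uint64_t, uint64_t> m;
    auto t0 = std::chrono::high_resolution_clock::now();
    for (uint64_t i = 0; i < n; i++)
        m.emplace(i, i);
    auto t1 = std::chrono::high_resolution_clock::now();
    auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    return static_cast<uint64_t>(total) / n;
}
```

Note that on a laptop CPU with frequency scaling the returned figure will be noisy, which may explain part of the difference between the two platforms discussed above.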
Dear @gvinciguerra, we compared the PGM-Index with the tlx B+ tree implementation. We found that the index size is greatly reduced, but the latency is not as good as discussed in the paper. Are there any suggestions for our implementation?
We store 2^24 key-value pairs of three types (uint32_t, uint64_t, uint128_t), and the result is: