min-guk closed this issue 1 year ago
From my experience running multiple benchmarks, issues seem to arise when the dataset is too easy or when the maximum error is extremely large. In both situations a single segment covers a large number of keys and its slope becomes very small, so the problems appear to come from floating-point rounding.
I also referred to https://github.com/gvinciguerra/PGM-index/issues/30, and it appears that using fesetround(FE_DOWNWARD); leads to cases where keys cannot be found at different epsilon values.
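To make the effect of the rounding mode concrete, here is a minimal sketch of how fesetround changes a slope computed in single precision. The helper name slope_with_rounding is made up for this illustration and is not part of PGM-index; the volatile variables are only there so the compiler does not fold the division or move it across the fesetround calls.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper (not part of PGM-index): divides dy by dx in single
// precision under the given rounding mode (FE_DOWNWARD, FE_UPWARD,
// FE_TONEAREST, ...) and restores the previous mode before returning.
float slope_with_rounding(float dy, float dx, int mode) {
    assert(dx != 0.0f);
    const int previous = std::fegetround();
    std::fesetround(mode);
    // volatile forces the division to happen at run time, between the two
    // fesetround calls, instead of being evaluated at compile time.
    volatile float num = dy;
    volatile float den = dx;
    volatile float result = num / den;
    std::fesetround(previous);
    return result;
}
```

For a slope such as 1/3 that has no exact float representation, FE_DOWNWARD yields a value one unit in the last place below what FE_UPWARD yields, so the same segment can predict different positions depending on the mode that was active when it was built.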
Is there a way to prevent this issue? Thank you.
There is just one segment out of 145 in the pgm::PGMIndex<uint64_t, 512, 4, float> for that eth dataset whose slope requires more precision than float. A quick fix here would be to change the 4th template argument to double, or simply to keep track of the increased error (just +2).
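As a rough illustration of why one long segment can need more precision than float, the sketch below stores a slope in either float or double and evaluates the prediction far from the segment start. The function names are hypothetical and the multiplication is done in double on purpose, so that only the rounding of the stored slope shows up; the library's own prediction code may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical illustration (not PGM-index code). `Floating` plays the role of
// the 4th template argument: the type in which the segment's slope is stored.
template <typename Floating>
double predict(uint64_t key_span, uint64_t position_span, uint64_t key_offset) {
    assert(key_span > 0);
    const Floating slope =
        static_cast<Floating>(position_span) / static_cast<Floating>(key_span);
    // Multiply in double so that only the stored slope's rounding error remains.
    return static_cast<double>(slope) * static_cast<double>(key_offset);
}

// Absolute difference between the float-slope and double-slope predictions.
double float_vs_double_gap(uint64_t key_span, uint64_t position_span,
                           uint64_t key_offset) {
    return std::fabs(predict<float>(key_span, position_span, key_offset) -
                     predict<double>(key_span, position_span, key_offset));
}
```

With a slope of 1/3 (one position every 3 key units) and a key offset of 3 * 2^25 = 100663296, the float slope predicts position 33554433 while the double slope predicts 33554432. The drift grows linearly with the offset, which is why it only becomes visible when a single segment covers tens of millions of keys.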
In this case, it would be good to solve the problem by using the double data type. I really appreciate your kind responses to my questions every time. :smile:
> In this case, it would be good to solve the problem by using the double data type.
Yes, it's the simplest option since it seems to happen just in this particular dataset + epsilon + segment. (A more involved solution would be to split/stop the segment when it needs more precision than the given floating point type.)
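The more involved option could look roughly like the check below: before extending a segment, test whether the slope, once stored in the chosen floating-point type, still reproduces the segment's last position within a tolerance. This is only a sketch under my own assumptions; slope_fits and its parameters are hypothetical and do not correspond to anything in the PGM-index code base.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical check (not PGM-index code): returns true if a slope stored as
// `Floating` maps the end of the segment (key_span key units after its start)
// to within `tolerance` positions of the true last position (position_span).
template <typename Floating>
bool slope_fits(uint64_t key_span, uint64_t position_span, double tolerance) {
    assert(key_span > 0);
    const Floating slope =
        static_cast<Floating>(position_span) / static_cast<Floating>(key_span);
    const double predicted_end =
        static_cast<double>(slope) * static_cast<double>(key_span);
    return std::fabs(predicted_end - static_cast<double>(position_span)) <= tolerance;
}
```

A segment builder could call slope_fits<float>(...) each time it is about to grow the segment and close the segment as soon as the check fails. For example, a segment spanning 100663296 key units and 33554432 positions fails the check for float with a tolerance of 0.5 but passes it for double.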
> I really appreciate your kind responses to my questions every time. 😄
Sure!
Thank you for resolving the issue I reported previously. It seems like most of the issues I brought up last time have been resolved with the last commit. (https://github.com/gvinciguerra/PGM-index/issues/44)
So, cases where it can't find the key are occurring much less frequently. However, there are still a few cases where it seems unable to find the key.
Could you check the test code below? It is mostly the same as the code I uploaded to the previous issue, except that I've modified the 4th test to search for all keys in the dataset.
Thanks!
1. Download PGM-Index and the eth dataset.
2. Modify the code to build the PGM-Index in a single thread.
3. Change the test code:
// 13 #include directives (header names not preserved)
// Loads values from binary file into vector.
static std::vector load_data(const std::string& filename, bool print = true) {
    std::vector data;
}

template <typename Index, typename Data>
void test_index(const Index &index, const Data &data) {
    auto rand = std::bind(std::uniform_int_distribution(0, data.size() - 1), std::mt19937{42});
}

TEMPLATE_TEST_CASE_SIG("PGM-index", "", ((typename T, size_t E1, size_t E2), T, E1, E2), (uint64_t, 512, 4)) {
    auto data = load_data("./eth");
    pgm::PGMIndex<T, E1, E2> index(data.begin(), data.end());
    test_index(index, data);
}
cmake . -DCMAKE_BUILD_TYPE=Release
make -j8
./test/tests
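Since the body of the test is not shown above, here is a sketch of what an exhaustive "search for all keys" check might look like. It assumes the index exposes search(key) returning an object with lo and hi members, as pgm::PGMIndex does, and it treats hi as an exclusive bound, which is my assumption. FixedWindowIndex is a made-up stand-in so the snippet compiles without the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an index: always answers with the same window.
struct FixedWindowIndex {
    size_t lo, hi;
    struct Range { size_t lo, hi; };
    Range search(uint64_t /*key*/) const { return Range{lo, hi}; }
};

// Counts the keys of `data` (sorted) that are not found by a binary search
// restricted to the window [lo, hi) returned by index.search(key).
template <typename Index>
size_t count_missing_keys(const Index &index, const std::vector<uint64_t> &data) {
    size_t missing = 0;
    for (uint64_t key : data) {
        auto range = index.search(key);
        assert(range.lo <= range.hi && range.hi <= data.size());
        auto first = data.begin() + static_cast<std::ptrdiff_t>(range.lo);
        auto last = data.begin() + static_cast<std::ptrdiff_t>(range.hi);
        auto it = std::lower_bound(first, last, key);
        if (it == last || *it != key) ++missing;
    }
    return missing;
}
```

Passing the real index instead of the stand-in, a non-zero return value would reproduce the "key cannot be found" symptom and give the number of affected keys.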