microsoft / ALEX

A library for building an in-memory, Adaptive Learned indEX

ALEX crashes in non-batch mode #10

Closed. alihadian closed this issue 3 years ago

alihadian commented 3 years ago

I ran ALEX against all of the SOSD benchmark's uint64 datasets (download link). I tried to insert the first 100M records, but ALEX crashed for all datasets.

When I insert only a smaller prefix of the data (e.g., 100K records), the code does not crash, but ALEX crashes for larger numbers of insertions across all datasets. It's most likely an edge case that went unnoticed. The error message is as follows:

alex_nodes.h:540: int alex::AlexDataNode<T, P, Compare, Alloc, allow_duplicates>::num_keys_in_range(int, int) const [with T = long unsigned int; P = int; Compare = alex::AlexCompare; Alloc = std::allocator<std::pair<long unsigned int, int> >; bool allow_duplicates = true]: Assertion `left >= 0 && left < right && right <= data_capacity_' failed. Aborted (core dumped)
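
To spell out the failing check: num_keys_in_range(left, right) counts the keys in the half-open slot range [left, right) of a data node, and the assert demands a non-empty range that lies entirely within the node's backing array. A stand-in illustration of that precondition (data_capacity_ as in the message above; the struct is just for illustration, not ALEX's actual class):

// Illustrative stand-in for the precondition asserted at alex_nodes.h:540.
struct DataNodePrecondition {
    int data_capacity_;  // number of slots in the data node's array

    // The failing assert: the half-open range [left, right) must be
    // non-empty and fully contained in [0, data_capacity_].
    bool valid_range(int left, int right) const {
        return left >= 0 && left < right && right <= data_capacity_;
    }
};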


I have plenty of memory (50GB+), so it is certainly not a memory issue.

The code is simple. I just insert the records one by one, up to MAX_INSERTS records:

#include <iostream>
#include <vector>
#include <algorithm>

#include "alex.h"

#include "load.hpp"

using namespace std;

#define KEY_TYPE uint64_t
#define PAYLOAD_TYPE int

#define MAX_INSERTS 100000000

template <typename T>
void test_alex(vector<T> &data, string filename)
{
    alex::Alex<KEY_TYPE, PAYLOAD_TYPE> index;

    int cnt = 0;

    // Insert records one by one (non-batch mode), stopping after
    // exactly MAX_INSERTS insertions.
    for (const T &key : data)
    {
        if (cnt >= MAX_INSERTS)
            break;
        index.insert(key, ++cnt);
    }

    cout << cnt << endl;
}

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " <dataset file>" << endl;
        return 1;
    }
    string filename = argv[1];

    // Load the SOSD binary dataset, then sort and deduplicate the keys.
    vector<uint64_t> data;
    load_bin<uint64_t>(filename, data);
    sort(data.begin(), data.end());
    data.erase(unique(data.begin(), data.end()), data.end());

    test_alex(data, filename);
}
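
For reference, "non-batch mode" in the title means these one-by-one inserts; ALEX's batch path is bulk_load, which takes a sorted array of key/payload pairs. A minimal sketch of what the batch path would look like on the same deduplicated data (the bulk_load signature is per the ALEX README; the helper itself is illustrative and assumes the includes and defines from the snippet above):

// Sketch: bulk-loading the same keys instead of inserting one by one.
// Assumes `data` is sorted and deduplicated, as in main() above.
void test_alex_bulk(vector<uint64_t> &data)
{
    alex::Alex<KEY_TYPE, PAYLOAD_TYPE> index;

    int n = (int)min<size_t>(data.size(), (size_t)MAX_INSERTS);
    vector<pair<KEY_TYPE, PAYLOAD_TYPE>> values(n);
    for (int i = 0; i < n; i++)
        values[i] = {data[i], i + 1};

    // bulk_load expects key/payload pairs sorted by key.
    index.bulk_load(values.data(), n);
}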
jialinding commented 3 years ago

Thanks for bringing this to our attention, Ali. I'll look into it.

jialinding commented 3 years ago

The latest commit should fix the issues you're seeing. Let me know if it still doesn't work.

liang2419 commented 3 years ago

Hi Jialin,

Thanks for your updates. We tested inserting 100M records from the SOSD uint64 datasets: it works on fb, normal, and osm, but still fails on books, lognormal, uniform_dense, and uniform_sparse. The error message is the same as before.

jialinding commented 3 years ago

Thanks for pointing that out. It turns out the assert itself was too restrictive. The latest commit should now work on all uint64 datasets from SOSD.
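
For anyone hitting a similar assert: one way a check like this can be "too restrictive" is that it rejects an empty range (left == right), which can legitimately occur. Purely as an illustration of the kind of relaxation involved, not necessarily the exact change in the commit:

// Illustration only, not the actual commit.
// Original check: left >= 0 && left <  right && right <= data_capacity_
// Relaxed check:  left >= 0 && left <= right && right <= data_capacity_
bool valid_range_relaxed(int left, int right, int data_capacity) {
    // Allowing left == right treats an empty slot range as valid.
    return left >= 0 && left <= right && right <= data_capacity;
}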