microsoft / ALEX

A library for building an in-memory, Adaptive Learned indEX
MIT License
668 stars 115 forks source link

Unable to erase previously inserted key #16

Closed geoffxy closed 3 years ago

geoffxy commented 3 years ago

I've encountered some strange behavior where I cannot erase a key that was previously inserted into ALEX.

I've put together a self-contained program that reproduces the problem along with a trace. The program can be compiled by placing it under src in this repository (and adding an add_executable() to the CMakeLists.txt file). To reproduce, just run the compiled program with the provided trace.

The program and trace will bulk load ALEX with 9000 64-bit keys and then run a series of inserts and deletes. The delete of the last key in the trace (347892347588) appears to fail because erase() returns 0. However, the key should exist in ALEX because it was inserted earlier (line 16694 in the trace) and was not deleted afterwards.

I was able to reproduce the problem in both Debug and Release mode. I tried compiling on machines with g++ v9.3.0 and v11.1.0 and was able to reproduce this problem with both compiler versions.

Trace: alex_remove_not_found.log

Program:

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

#include "core/alex.h"

namespace {

int LoadAndReplay(const std::string& path) {
  alex::Alex<uint64_t, uint64_t> index;
  std::vector<std::pair<uint64_t, uint64_t>> entries;

  std::ifstream trace(path);
  uint64_t lineno = 0;
  std::string type;
  uint64_t key;

  bool load_phase = true;

  while (trace >> type >> key) {
    lineno++;

    if (type == "L") {
      entries.emplace_back(key, lineno);
      continue;
    }

    // Done reading keys that should be bulk loaded.
    if (load_phase) {
      index.bulk_load(entries.data(), entries.size());

      // Validate the bulk load.
      size_t i = 0;
      for (const auto& rec : index) {
        if (rec.first != entries[i].first) {
          std::cerr << "After the bulk load, ALEX is missing the key ("
                    << entries[i].first << ")." << std::endl;
          return 3;
        }
        ++i;
      }

      load_phase = false;
    }

    if (type == "I") {
      const auto res = index.insert(key, lineno);
      if (!res.second) {
        std::cerr << "Failed to insert key (" << key << "). Line " << lineno
                  << std::endl;
        return 2;
      }
    }
    if (type == "R") {
      const auto num_erased = index.erase(key);
      if (num_erased == 0) {
        std::cerr << "Failed to erase key that should exist (" << key
                  << "). Line " << lineno << std::endl;
        return 1;
      }
    }
  }
  std::cerr << "Trace replayed successfully." << std::endl;
  return 0;
}

}  // namespace

int main(int argc, char* argv[]) {
  if (argc != 2) {
    std::cerr << "Usage: " << argv[0] << " path/to/trace.log" << std::endl;
    return 1;
  }
  return LoadAndReplay(argv[1]);
}

Expected output:

Trace replayed successfully.

Actual output:

Failed to erase key that should exist (347892347588). Line 36813
jialinding commented 3 years ago

Should be fixed now. Let me know if you still see issues.

geoffxy commented 3 years ago

Thanks @jialinding!