Closed mbautin closed 7 months ago
Just to clarify, @mbautin: does it work fine if you use index_limits_t to increase the number of threads? If so, that's the intended behavior, but we may want to extend the Multi-Threading code snippet in cpp/README.md to show how to use it.
@ashvardanian, unfortunately, simply calling reserve does not seem to be enough. Here is the relevant part of my current test case. I am using 9 indexing threads below, but running the test on an 8-vCPU VM.
    using namespace unum::usearch;

    // Create a metric and index
    const size_t kDimensions = 96;
    metric_punned_t metric(kDimensions, metric_kind_t::l2sq_k, scalar_kind_t::f32_k);

    // Generate and add vectors to the index
    const size_t kNumVectors = ReleaseVsDebugVsAsanVsTsan(100000, 20000, 15000, 10000);
    const size_t kNumIndexingThreads = 9;

    std::uniform_real_distribution<> uniform_distrib(0, 1);

    std::string index_path;
    {
      TestThreadHolder indexing_thread_holder;
      index_dense_config_t index_config;
      index_config.enable_key_lookups = false;
      index_dense_t index = index_dense_t::make(metric, index_config);
      index.reserve(index_limits_t(kNumVectors, kNumIndexingThreads));
      auto load_start_time = MonoTime::Now();
      CountDownLatch latch(kNumIndexingThreads);
      std::atomic<size_t> num_vectors_inserted{0};
      for (size_t thread_index = 0; thread_index < kNumIndexingThreads; ++thread_index) {
        indexing_thread_holder.AddThreadFunctor(
            [&num_vectors_inserted, &index, &latch, &uniform_distrib]() {
              std::random_device rd;
              size_t vector_id;
              while ((vector_id = num_vectors_inserted.fetch_add(1)) < kNumVectors) {
                auto vec = GenerateRandomVector(kDimensions, uniform_distrib);
                ASSERT_TRUE(index.add(vector_id, vec.data()));
              }
              latch.CountDown();
            });
      }
      latch.Wait();
      auto load_elapsed_usec = (MonoTime::Now() - load_start_time).ToMicroseconds();
      ReportPerf("Indexed", kNumVectors, "vectors", kDimensions, load_elapsed_usec,
                 kNumIndexingThreads);

      // Save the index to a file
      index_path = GetTestDataDirectory() + "/hnsw_index.usearch";
      ASSERT_TRUE(index.save(index_path.c_str()));
    }
This produces the same ASAN issue as before: https://gist.githubusercontent.com/mbautin/9dc69a931dc28c60093f60a2247b0a99/raw/83a3aad89d5a3b129d02676c6f546370819f6d01/gistfile1.txt
    thread_lock_t thread_lock_(std::size_t thread_id) const {
        if (thread_id != any_thread())
            return {*this, thread_id, false};

        available_threads_mutex_.lock();
        thread_id = available_threads_.back(); // Crashes here
        available_threads_.pop_back();
        available_threads_mutex_.unlock();
        return {*this, thread_id, true};
    }
This is because available_threads_ does not take the configuration passed to reserve() into account:

    result.available_threads_.resize(hardware_threads);
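To make the failure mode concrete, here is a minimal standard-library sketch of a slot pool that is sized from the caller's requested thread count and checks for exhaustion, instead of calling back() on an empty vector (which is undefined behavior). The names slot_pool_t, try_acquire, and release are hypothetical and are not part of USearch; this is only an illustration of the idea, not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical stand-in for a pool of per-thread context slots (not USearch code).
class slot_pool_t {
    std::mutex mutex_;
    std::vector<std::size_t> available_;
    std::size_t capacity_;

  public:
    // The pool is sized from the thread count the caller asked for,
    // not from the hardware concurrency.
    explicit slot_pool_t(std::size_t requested_threads)
        : available_(requested_threads), capacity_(requested_threads) {
        std::iota(available_.begin(), available_.end(), std::size_t{0});
    }

    // Returns a free slot, or std::nullopt when every slot is in use.
    std::optional<std::size_t> try_acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (available_.empty())
            return std::nullopt; // checked, instead of back() on an empty vector
        std::size_t slot = available_.back();
        available_.pop_back();
        return slot;
    }

    // Returns a previously acquired slot to the pool.
    void release(std::size_t slot) {
        assert(slot < capacity_);
        std::lock_guard<std::mutex> lock(mutex_);
        available_.push_back(slot);
    }
};
```

With this shape, a caller that oversubscribes the pool gets an explicit "no slot available" result that it can turn into an error, rather than a crash.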
Nice catch! I'll ship a patch in a couple of hours.
The following tentative patch made it work for me: https://gist.githubusercontent.com/mbautin/1cf8ebe5ff01442b4a2431d1cc189a8d/raw
:tada: This issue has been resolved in version 2.11.1 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
Describe the bug
When attempting to add vectors to the index using a number of threads that exceeds the hardware concurrency as reported by std::thread::hardware_concurrency(), the process crashes. A stack trace from a test (with AddressSanitizer) is available at https://gist.githubusercontent.com/mbautin/80c0d87a0915e7da7b076a055a382b9e/raw
The root cause seems to be an underflow of the available_threads_ vector. It is initialized as follows:
The crash happens in this function:
It is invoked from the add_ function as follows:
The add function is invoked with thread = any_thread().
I see that this whole mechanism is ultimately needed for providing a "context" object corresponding to the thread. The user could enumerate all the threads that will be performing any index insertion or search operations, and pass a 0-based index of that thread as the thread parameter. It could be possible to call the reserve() function to set the correct threads_add and threads_search, but it does not seem intuitive.
Steps to reproduce
Run a test that performs more concurrent indexing operations than the hardware concurrency. Example test: https://gist.githubusercontent.com/mbautin/6ef7cbbc18b818c1ab969be55a74e899/raw (implemented in YugabyteDB's test framework -- should be easy to extract as an independent test).
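For a standalone variant of such a test, each worker can carry its own 0-based index, which is the approach mentioned in the description above. The sketch below models only that indexing scheme with the standard library: every worker uses the context at its own index, so no shared pool of free slots is needed. The names context_t and run_with_explicit_indices are hypothetical and are not USearch APIs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-thread scratch state, standing in for the "context" object.
struct context_t {
    std::size_t items_processed = 0;
};

// Runs `threads` workers over `total_items` items. Each worker touches only
// contexts[thread_index], so the contexts themselves need no locking.
std::vector<context_t> run_with_explicit_indices(std::size_t threads, std::size_t total_items) {
    std::vector<context_t> contexts(threads);
    std::atomic<std::size_t> next_item{0};
    std::vector<std::thread> workers;
    workers.reserve(threads);

    for (std::size_t thread_index = 0; thread_index < threads; ++thread_index) {
        workers.emplace_back([&contexts, &next_item, thread_index, total_items] {
            // The index must stay within the number of reserved contexts.
            assert(thread_index < contexts.size());
            context_t& context = contexts[thread_index];
            // Claim items one at a time until the shared counter runs out.
            while (next_item.fetch_add(1) < total_items)
                ++context.items_processed;
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    return contexts;
}
```

The number of workers here is independent of the hardware concurrency; it only has to match the number of contexts allocated up front.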
Expected behavior
The system should not crash. It should allow any number of indexing threads to run.
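One way to get this behavior, sketched here with the standard library only, is to let a thread wait until a slot is released instead of assuming one is always free. This is an illustration under assumed names (blocking_slot_pool_t and run_oversubscribed are hypothetical) and is not necessarily how the library resolved the issue.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool in which acquire() blocks while all slots are taken.
class blocking_slot_pool_t {
    std::mutex mutex_;
    std::condition_variable slot_freed_;
    std::vector<std::size_t> available_;

  public:
    explicit blocking_slot_pool_t(std::size_t slots) {
        assert(slots > 0); // with zero slots every acquire() would wait forever
        for (std::size_t slot = 0; slot < slots; ++slot)
            available_.push_back(slot);
    }

    std::size_t acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        slot_freed_.wait(lock, [this] { return !available_.empty(); });
        std::size_t slot = available_.back();
        available_.pop_back();
        return slot;
    }

    void release(std::size_t slot) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            available_.push_back(slot);
        }
        slot_freed_.notify_one();
    }
};

// Runs more workers than slots and returns the number of completed operations.
std::size_t run_oversubscribed(std::size_t slots, std::size_t workers, std::size_t ops_per_worker) {
    blocking_slot_pool_t pool(slots);
    std::atomic<std::size_t> completed{0};
    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t worker = 0; worker < workers; ++worker) {
        threads.emplace_back([&pool, &completed, ops_per_worker] {
            for (std::size_t op = 0; op < ops_per_worker; ++op) {
                std::size_t slot = pool.acquire(); // waits if all slots are busy
                completed.fetch_add(1);            // stands in for an index operation
                pool.release(slot);
            }
        });
    }
    for (std::thread& thread : threads)
        thread.join();
    return completed.load();
}
```

The trade-off is that oversubscribed callers queue up behind the available slots, so throughput is bounded by the slot count, but any number of threads can run without crashing.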
USearch version
2.10.5
Operating System
AlmaLinux 8.8
Hardware architecture
x86
Which interface are you using?
C++ implementation
Contact Details
mbautin@users.noreply.github.com