nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0
4.34k stars 641 forks source link

heap-buffer-overflow when sse enabled #107

Open supercoolgreatcoder opened 5 years ago

supercoolgreatcoder commented 5 years ago

Steps to reproduce:

  1. Build with Address Sanitizer enabled (add -fsanitize=address -fno-omit-frame-pointer flags to compiler and linker): https://gist.github.com/supercoolgreatcoder/11b5deccd1f500bd19c84cdf11f2c2a9
  2. run sift1B test: https://gist.github.com/supercoolgreatcoder/764ca0f196bee8a28b629e9e3c56f945 Culprit line: https://github.com/nmslib/hnswlib/blob/d0716d9962ec4ba100a5de551f07c2067e835785/hnswlib/hnswalg.h#L199 Disabling SSE does not trigger heap buffer overflow.
yurymalkov commented 5 years ago

Hi @supercoolgreatcoder, Thank you! What sort of problems it might create? Actually, I thought _mm_prefetch is immune to wrong addresses.

supercoolgreatcoder commented 5 years ago

@yurymalkov, __mm_prefetch is trying to read chunk of memory in heap that might not have been allocated, e.g. int* a = new int[100]; std::cout << a[100]. In the worst case it might end up with a segfault. I believe, it should be fixed relatively easy by fixing number of bytes that has to be prefetched.

fuzesmarcell commented 1 year ago

This issue is still persisting and the issue is caused by de-referencing this address *(datal + j + 1)

The issue is that for datal we already account for the size at the beginning of the buffer here:

size_t size = getListCount((linklistsizeint*)data);
tableint *datal = (tableint *) (data + 1);

So adding the extra 1 in here *(datal + j + 1) is incorrect and on the last iteration we are de-referencing a out of bounds value.

The following fixes the issue and does not cause any heap buffer overflow

_mm_prefetch((char *) (visited_array + *(datal + j)), _MM_HINT_T0);
_mm_prefetch(getDataByInternalId(*(datal + j)), _MM_HINT_T0);