nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0
4.31k stars 633 forks source link

Fix heap buffer overflow caused by prefetch. #459

Open kishorenc opened 1 year ago

kishorenc commented 1 year ago

The prefetching goes past size since datal is initialized 1 index past data.

I uncovered this issue on ASAN when SSE is enabled. Also spotted earlier here: https://github.com/nmslib/hnswlib/issues/107

cc @dyashuni

kishorenc commented 1 year ago

@yurymalkov

+1 fetches the visited bit for the next item, which can be anywhere in the memory compared the previous element.

Ah, sorry, I missed the extra pointer de-referencing in the code and assumed a sequential lookup.

I benchmarked with/without prefetching by searching on 1M / 4-dimension points.

So, I've now updated the PR to use std::min() to limit the index of the next prefetch. This is comparable in performance to the current version without bound checking.

yurymalkov commented 1 year ago

Hi @kishorenc, Thanks for the change! We would need to double check the performance

yurymalkov commented 1 year ago

@kishorenc can you please share the benchmarking code? Also, I wonder, how do you look at the other *(data + 1) in the code? Those also have the same out-of-bounds prefetch.

kishorenc commented 1 year ago

Here's the code for the benchmark. I just ran this with/without the changes in the PR and compared.

Also, I wonder, how do you look at the other *(data + 1) in the code? Those also have the same out-of-bounds prefetch.

Yes, those are out-of-bound as well, but I didn't address them because I was not sure what their actual sizes were. In the places that I've changed, there was a clear size variable available that I could go by.