Open kishorenc opened 1 year ago
@yurymalkov
+1 fetches the visited bit for the next item, which can be anywhere in the memory compared the previous element.
Ah, sorry, I missed the extra pointer de-referencing in the code and assumed a sequential lookup.
I benchmarked with/without prefetching by searching on 1M / 4-dimension points.
So, I've now updated the PR to use std::min()
to limit the index of the next prefetch. This is comparable in performance to the current version without bound checking.
Hi @kishorenc, Thanks for the change! We would need to double check the performance
@kishorenc can you please share the benchmarking code?
Also, I wonder, how do you look at the other *(data + 1)
in the code? Those also have the same out-of-bounds prefetch.
Here's the code for the benchmark. I just ran this with/without the changes in the PR and compared.
Also, I wonder, how do you look at the other *(data + 1) in the code? Those also have the same out-of-bounds prefetch.
Yes, those are out-of-bound as well, but I didn't address them because I was not sure what their actual sizes were. In the places that I've changed, there was a clear size
variable available that I could go by.
The prefetching goes past
size
sincedatal
is initialized 1 index pastdata
.I uncovered this issue on ASAN when SSE is enabled. Also spotted earlier here: https://github.com/nmslib/hnswlib/issues/107
cc @dyashuni