Open siddhsql opened 1 year ago
In C++ the pointers can be casted between different types. char *
can be casted to int *
.
char **
because it is array of pointers to arrays of different sizes (elements have different number of levels, so different array sizes)
char
is because char is a byte, the minimal usable quant to control the memory.
size_links_per_element_ * curlevel
is the size of link list of the upper levels of an element (size_links_per_element_
is size per level). +1
I do not remember, I think that was a homeophatic counter to memory prefetches getting slightly out of bound during search.
(level - 1) * size_links_per_element_
computes the portion of the memory where links reside. -1
is because the links for level 0 are stored in a different array, while this memory has only links from 1 to level.
thanks Yury.
On Tue, Aug 22, 2023 at 9:26 PM Yury Malkov @.***> wrote:
In C++ the pointers can be casted between different types. char can be casted to int . char ** because it is array of pointers to arrays of different sizes (elements have different number of levels, so different array sizes) char is because char is a byte, the minimal usable quant to control the memory.
size_links_perelement * curlevel is the size of link list of the upper levels of an element (size_links_perelement is size per level). +1 I do not remember, I think that was a homeophatic counter to memory prefetches getting slightly out of bound during search.
(level - 1) * size_links_perelement computes the portion of the memory where links reside. -1 is because the links for level 0 are stored in a different array, while this memory has only links from 1 to level.
— Reply to this email directly, view it on GitHub https://github.com/nmslib/hnswlib/issues/499#issuecomment-1689261805, or unsubscribe https://github.com/notifications/unsubscribe-auth/A6NWEK2SVRF7L4TIEHCCRCDXWWA77ANCNFSM6AAAAAA3WCHYX4 . You are receiving this because you authored the thread.Message ID: @.***>
one more question if I may. Looking at the code you have SIMD optimized code but when the code is compiled the flags necessary to take advantage of SIMD are never enabled - is that correct? because I don't see anywhere turning on the SIMD flags. is the SIMD code still experimental that's why you don't enable the compiler flags?
On Wed, Aug 23, 2023 at 8:43 AM Siddharth Jain @.***> wrote:
thanks Yury.
On Tue, Aug 22, 2023 at 9:26 PM Yury Malkov @.***> wrote:
In C++ the pointers can be casted between different types. char can be casted to int . char ** because it is array of pointers to arrays of different sizes (elements have different number of levels, so different array sizes) char is because char is a byte, the minimal usable quant to control the memory.
size_links_perelement * curlevel is the size of link list of the upper levels of an element (size_links_perelement is size per level). +1 I do not remember, I think that was a homeophatic counter to memory prefetches getting slightly out of bound during search.
(level - 1) * size_links_perelement computes the portion of the memory where links reside. -1 is because the links for level 0 are stored in a different array, while this memory has only links from 1 to level.
— Reply to this email directly, view it on GitHub https://github.com/nmslib/hnswlib/issues/499#issuecomment-1689261805, or unsubscribe https://github.com/notifications/unsubscribe-auth/A6NWEK2SVRF7L4TIEHCCRCDXWWA77ANCNFSM6AAAAAA3WCHYX4 . You are receiving this because you authored the thread.Message ID: @.***>
more importantly looking at this code:
// finding the "weakest" element to replace it with the new one
dist_t d_max = fstdistfunc_(getDataByInternalId(cur_c), getDataByInternalId(
selectedNeighbors[idx]),
dist_func_param_);
// Heuristic:
std::priority_queue<std::pair<dist_t, tableint>, std::vector<std::pair<
dist_t, tableint>>, CompareByFirst> candidates;
candidates.emplace(d_max, cur_c);
for (size_t j = 0; j < sz_link_list_other; j++) {
candidates.emplace(
fstdistfunc_(getDataByInternalId(data[j]), getDataByInternalId(
selectedNeighbors[idx]),
dist_func_param_), data[j]);
}
getNeighborsByHeuristic2(candidates, Mcurmax);
int indx = 0;
while (candidates.size() > 0) {
data[indx] = candidates.top().second;
candidates.pop();
indx++;
}
setListCount(ll_other, indx);
This is lines 14-16 in algorithm 1 in your famous paper. IIUC it will delete the connection from neighborhood(e) to the weakest element but since this is a bidirected graph don't you also need to delete the connection from the weakest element to e? i don't see that happening in above code.
On Wed, Aug 23, 2023 at 8:46 AM Siddharth Jain @.***> wrote:
one more question if I may. Looking at the code you have SIMD optimized code but when the code is compiled the flags necessary to take advantage of SIMD are never enabled - is that correct? because I don't see anywhere turning on the SIMD flags. is the SIMD code still experimental that's why you don't enable the compiler flags?
On Wed, Aug 23, 2023 at 8:43 AM Siddharth Jain @.***> wrote:
thanks Yury.
On Tue, Aug 22, 2023 at 9:26 PM Yury Malkov @.***> wrote:
In C++ the pointers can be casted between different types. char can be casted to int . char ** because it is array of pointers to arrays of different sizes (elements have different number of levels, so different array sizes) char is because char is a byte, the minimal usable quant to control the memory.
size_links_perelement * curlevel is the size of link list of the upper levels of an element (size_links_perelement is size per level). +1 I do not remember, I think that was a homeophatic counter to memory prefetches getting slightly out of bound during search.
(level - 1) * size_links_perelement computes the portion of the memory where links reside. -1 is because the links for level 0 are stored in a different array, while this memory has only links from 1 to level.
— Reply to this email directly, view it on GitHub https://github.com/nmslib/hnswlib/issues/499#issuecomment-1689261805, or unsubscribe https://github.com/notifications/unsubscribe-auth/A6NWEK2SVRF7L4TIEHCCRCDXWWA77ANCNFSM6AAAAAA3WCHYX4 . You are receiving this because you authored the thread.Message ID: @.***>
SIMD are never enabled - is that correct?
The flags are set by the compiler at compile time https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile
this is a bidirected graph don't you also need to delete the connection from the weakest element to e? i don't see that happening in above code.
The graph links are not always bidirectional in HNSW, they are created by bidirectionally connecting elements, but may not stay this way. The pruning might be removed, that would make it always bidirectional, but it will lead to high-degree tail.
thanks for that link about AVX flags. I got similar output. but when I look at the code, the flags it uses are USE_AVX and USE_AVX512. and it looks like these flags are not set by:
-Ofast -lrt -DNDEBUG -std=c++20 -DHAVE_CXX0X -march=native -fpic -w -fopenmp -ftree-vectorize -ftree-vectorizer-verbose=0"
On Wed, Aug 23, 2023 at 10:10 PM Yury Malkov @.***> wrote:
SIMD are never enabled - is that correct?
The flags are set by the compiler at compile time https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile
this is a bidirected graph don't you also need to delete the connection from the weakest element to e? i don't see that happening in above code.
The graph links are not always bidirectional in HNSW, they are created by bidirectionally connecting elements, but may not stay this way. The pruning might be removed, that would make it always bidirectional, but it will lead to high-degree tail.
— Reply to this email directly, view it on GitHub https://github.com/nmslib/hnswlib/issues/499#issuecomment-1691011489, or unsubscribe https://github.com/notifications/unsubscribe-auth/A6NWEK6JWYIGBO2644RJUDLXW3O3LANCNFSM6AAAAAA3WCHYX4 . You are receiving this because you authored the thread.Message ID: @.***>
Hi Yury,
I am trying to understand hnswlib code. w.r.t. this:
I understand this is storing adjacency list for l > 0.
Why is this a
char**
instead ofuint**
? I would expect an element id to be a 4 byte (unsigned) int.char
will only be able to store 256 values. Sorry I am not a c++ developer so bear with me.Over here:
Could you explain me what is the purpose of
(level - 1) * size_links_per_element_
?and over here:
I see that:
so this stores the # of bytes for the adjacency list. why are you multiplying this by
curlevel
? and adding 1 to it?Is there any README that explains the data layout of hnswlib? thanks.