yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0

Fixed seeds for deterministic results #144

Open mikepcw opened 1 year ago

mikepcw commented 1 year ago

Somewhat related to #117.

I am also seeing non-deterministic results when parallelising multiple queries across OpenMP threads. However, as noted in #117, this may be a consequence of randomly selecting leaf nodes rather than an issue with threads.

Is there a way to fix the seed so that results are always deterministic? Even if this is not ideal in production, it would help during testing, so that results can be debugged more easily.

masajiro commented 1 year ago

It is a little difficult to avoid the randomness of the seeds.

If you request more than 100 vectors as a result, the result is deterministic. This threshold of 100 depends on the maximum number of objects in a leaf node of the tree index. The number is written in this line. If it is set to 10, the result remains deterministic even when you request only 10 vectors. However, altering the number might affect search performance.

Alternatively, adding the following line to define.h.in might make the results deterministic:

#define NGT_DISABLE_SRAND_FOR_RANDOM
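To illustrate why skipping a time-based srand() call could help, here is a minimal standalone sketch. It assumes the macro's effect is to leave the C random generator at a fixed seed, which its name suggests but this thread does not confirm. The helper names draw_sequence and same_seed_repeats are hypothetical and are not part of NGT.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of NGT): seed the C generator with a fixed
// value and draw n numbers from it.
std::vector<int> draw_sequence(unsigned seed, int n) {
    std::srand(seed);
    std::vector<int> out;
    out.reserve(n);
    for (int i = 0; i < n; ++i) {
        out.push_back(std::rand());
    }
    return out;
}

// Hypothetical check: two runs seeded with the same value should yield the
// same sequence, so any choice driven by these numbers repeats as well.
bool same_seed_repeats(unsigned seed, int n) {
    return draw_sequence(seed, n) == draw_sequence(seed, n);
}
```

If srand() is never called, the C standard specifies that rand() behaves as if seeded with srand(1), so a single-threaded run draws the same sequence every time. With OpenMP this may not be enough on its own: rand() keeps shared state and is not guaranteed to be thread-safe, so the order in which threads draw numbers can still vary between runs.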