yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0

RuntimeError: remove: cannot remove from tree. get: Not in-memory or invalid offset of node. #130

Closed: wayerr closed this issue 1 year ago

wayerr commented 1 year ago

I am doing the following steps:

Then I sometimes get the following error:

RuntimeError: /NGT/release/NGT/lib/NGT/Index.h:remove:1519: remove:: cannot remove from tree. id=4923 /NGT/release/NGT/lib/NGT/Common.h:get:1943: get: Not in-memory or invalid offset of node. idx=1826234676 size=126

The error does not occur on every run. Sometimes it does not appear for 10-15 runs in a row, and sometimes it occurs on every run. It also depends on the vector type: random vectors cause the error for several of the last elements, but numpy.full(dims, i) causes it only for the last element.

Environment:
Libs: ngt==2.0.4, numpy==1.23.5, pybind11==2.10.1
Python: 3.10.8
OS: Linux 6.0.0-5-amd64 1 SMP PREEMPT_DYNAMIC Debian 6.0.10-1 (2022-11-26) x86_64 GNU/Linux (tested on two computers with the same Linux and different hardware)

Example code to reproduce:

#!/usr/bin/python3
import logging
import numpy
import ngtpy

path = "/tmp/test_ngt"
dims=10

logging.root.setLevel(logging.INFO)

ngtpy.create(path, dims)
idx = ngtpy.Index(path)

# this variant causes the error only for the last element
# make_vector = lambda i : numpy.full(dims, i)

# this variant causes the error for several of the last elements
make_vector = lambda i : numpy.random.rand(dims)

vectors = [make_vector(i) for i in range(0, 5000)]

inserted = []

for v in vectors:
    new_id = idx.insert(v)
    inserted.append(new_id)

idx.build_index(1)
idx.save()
logging.info("Index saved, try to rewrite each elements.")

for i, v in enumerate(vectors):
    old_id = inserted[i]
    try:
        idx.remove(old_id)
    except RuntimeError as e:
        logging.exception("Can not remove old_id=%s at i=%s", old_id, i)
    idx.insert(v)

Output:

 % python3 test.py
INFO:root:Index saved, try to rewrite each elements.
ERROR:root:Can not remove old_id=4907 at i=4907
Traceback (most recent call last):
  File "/tmp/test_ngt_dir/test.py", line 35, in <module>
    idx.remove(old_id)
RuntimeError: /NGT/release/NGT/lib/NGT/Index.h:remove:1519: remove:: cannot remove from tree. id=4908 /NGT/release/NGT/lib/NGT/Common.h:get:1943: get: Not in-memory or invalid offset of node. idx=1550698841 size=126
ERROR:root:Can not remove old_id=4910 at i=4910
Traceback (most recent call last):
  File "/tmp/test_ngt_dir/test.py", line 35, in <module>
    idx.remove(old_id)
RuntimeError: /NGT/release/NGT/lib/NGT/Index.h:remove:1519: remove:: cannot remove from tree. id=4911 /NGT/release/NGT/lib/NGT/Common.h:get:1943: get: Not in-memory or invalid offset of node. idx=1549090187 size=126

masajiro commented 1 year ago

So far, I have not been able to reproduce that error.

RuntimeError: /NGT/release/NGT/lib/NGT/Index.h:remove:1519: remove:: cannot remove from tree. id=4908 /NGT/release/NGT/lib/NGT/Common.h:get:1943: get: Not in-memory or invalid offset of node. idx=1550698841 size=126
ERROR:root:Can not remove old_id=4910 at i=4910

However, the error message above is quite strange. It shows that the ID of the tree node is far too large to be valid. There might be a possibility that ngtpy loads a different version of the ngt library.
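To make that observation concrete, here is a minimal sketch that pulls the `idx` and `size` values out of the error text and compares them. The function name `check_node_index` is hypothetical, the pattern is based only on the message format quoted above, and reading `size` as the number of valid node slots is an assumption about what the message means.

```python
import re

# Matches the "idx=<n> size=<m>" tail seen in the quoted error text.
_NODE_RE = re.compile(r"idx=(\d+)\s+size=(\d+)")


def check_node_index(message):
    """Return (idx, size, in_range) parsed from message, or None if absent.

    in_range is True when idx < size, assuming size is the slot count.
    """
    match = _NODE_RE.search(message)
    if match is None:
        return None
    idx, size = int(match.group(1)), int(match.group(2))
    return idx, size, idx < size
```

Passing `str(e)` from the caught `RuntimeError` to this helper would flag messages like the one above, where the index is many orders of magnitude larger than the reported size.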

wayerr commented 1 year ago

There might be a possibility that ngtpy loads a different version of the ngt library.

How can I check this?
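One possible way to check this on Linux, offered as a sketch rather than an official NGT procedure: after importing ngtpy, list the shared objects mapped into the Python process and look at which NGT-related files appear. The helper names below are hypothetical, and the approach assumes a Linux `/proc` filesystem and that the library file names contain "ngt".

```python
import re
from pathlib import Path


def loaded_shared_objects(pattern, maps_text):
    """Return sorted unique file paths in /proc/<pid>/maps text matching pattern."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            if re.search(pattern, fields[5], re.IGNORECASE):
                paths.add(fields[5])
    return sorted(paths)


def print_ngt_libraries():
    """Import ngtpy, then print every mapped file whose path mentions 'ngt'."""
    import ngtpy  # noqa: F401  (imported so its native code gets mapped)

    maps_text = Path("/proc/self/maps").read_text()
    for path in loaded_shared_objects(r"ngt", maps_text):
        print(path)
```

Calling `print_ngt_libraries()` in the affected environment and in the Docker container, then comparing the printed paths, would show whether a system-wide library is being picked up in one of them. If ngtpy bundles NGT statically, only the extension module itself may appear.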

wayerr commented 1 year ago

So, I have tested this in Docker with "Debian stable" and "Debian testing" (with Python 3.11 and a manually built NGT) and cannot reproduce the issue there.

But it is still reproducible on my local PC. It looks like it is caused by specific versions of system libraries or something else.

Anyway, I am using this in Docker, so I think this issue can be closed.