spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
https://spotify.github.io/voyager/
Apache License 2.0

Corrupted or unsupported index after saving. #40

Closed janfait closed 9 months ago

janfait commented 9 months ago

Hello, stuck with the below. Would appreciate any tips.

My vectors look like this:

[[7.91172300e-01 6.69090297e-01 2.91000000e+02]
 [6.11795087e-01 3.69995315e-01 8.11000000e+02]
 [6.12826115e-01 3.79121037e-01 6.68000000e+02]
 [4.94505465e-01 3.66105550e-01 1.79000000e+02]
 [8.57812207e-01 3.69706741e-01 2.87000000e+02]
 [4.87957676e-01 3.83922704e-01 1.90000000e+02]
 [5.79707092e-01 5.88521933e-01 8.22000000e+02]
 [8.77284651e-01 3.60034340e-01 3.27000000e+02]
 [6.96175913e-01 4.77069307e-01 2.67000000e+02]
 [8.37530029e-01 6.95131995e-01 7.31000000e+02]]

Building and saving my index with this process works nicely.

    import pandas as pd
    from voyager import Index, Space

    # Load the feature columns as a 2-D array and the item ids as a list
    df = pd.read_csv(input_csv)
    vectors = df[['Size', 'Gps', 'CategoryCluster']].values
    ids = df['Id'].tolist()
    index = Index(Space.Euclidean, num_dimensions=vectors.shape[1])

    index.add_items(vectors, ids)

    # Test that the index works
    queries = index.get_vectors([884])
    neighbors, distances = index.query(queries, k=5)
    print(neighbors)
    print(distances)

    index.save(index_path)

The below data is returned from prints. All good.

[[ 884 556793 524883 662437 529508]]
[[0. 0.0011078 0.00121032 0.00268939 0.00401055]]

When trying to read the index for later use with:

index = Index.load(index_path)

I get:

RuntimeError: Index seems to be corrupted or unsupported. Advancing to the next linked list requires 13312 additional bytes (from position 129997), but index data only has 130147 bytes in total.

It is not clear to me where to start with debugging. Do you have any tips on what could be wrong here?
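The numbers in the error message can be checked by hand: a read of 13312 bytes starting at position 129997 would end at byte 143309, but only 130147 bytes were available. Below is a minimal sketch of that arithmetic, together with a way to look up the file's size on disk for comparison. The helper names `shortfall` and `size_on_disk` are hypothetical and are not part of voyager.

```python
import os

def shortfall(position, needed, total):
    """Bytes missing if a read of `needed` bytes at `position` must fit in `total`."""
    return max(0, position + needed - total)

def size_on_disk(path):
    """Size of the saved index file in bytes, as reported by the OS."""
    return os.path.getsize(path)

# Numbers copied from the RuntimeError above
missing = shortfall(position=129997, needed=13312, total=130147)
print(missing)  # 13162
```

Comparing `size_on_disk(index_path)` with the 130147 bytes reported in the error may help narrow things down: a larger file on disk would suggest the bytes were lost while reading, whereas an equal size would suggest the file was already short when it was written.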

I am on:

Windows 10 Pro
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz, 2301 MHz
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32

janfait commented 9 months ago

I was able to get it running in Docker, so I assume it was related to my operating system. Closing.

naediros commented 9 months ago

For anyone struggling with this like @janfait: try opening the file yourself with open() in 'rb' mode and passing the file object to Index.load. That works fine for me, even on Windows 10 Pro without Docker (Python 3.9 at least).

from voyager import Index

with open('my_index.voy', 'rb') as f:  # 'rb' reads the file as raw bytes
    index = Index.load(f)
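The workaround above can be wrapped in a small helper. This is only a sketch: `load_via_binary_handle` and `read_all` are hypothetical names, and `read_all` is a stand-in loader so the example runs without voyager installed. The thread does not establish why a binary file handle succeeds where a path failed, so the sketch only shows that the bytes reach the loader unchanged.

```python
import os
import tempfile

def load_via_binary_handle(path, loader):
    """Open `path` in 'rb' mode and hand the open file object to `loader`."""
    with open(path, "rb") as f:
        return loader(f)

def read_all(f):
    """Stand-in loader (assumption): returns the raw bytes it is given."""
    return f.read()

# Arbitrary payload, including a CRLF pair that a text-mode read would translate
payload = b"\x00\r\n\x1a\xff"
with tempfile.NamedTemporaryFile(delete=False, suffix=".voy") as tmp:
    tmp.write(payload)
try:
    # The loader sees exactly the bytes that were written
    assert load_via_binary_handle(tmp.name, read_all) == payload
finally:
    os.remove(tmp.name)
```

With voyager installed, the equivalent call would be `index = load_via_binary_handle('my_index.voy', Index.load)`, mirroring the snippet in the comment above.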

han1399013493 commented 5 months ago

Thanks very much!