jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack
https://jina.ai/serve
Apache License 2.0

Bug in hnsw with AnyTensor #5795

Closed: samsja closed this issue 1 year ago

samsja commented 1 year ago

Context

The embedding field of predefined documents such as ImageDoc is not recognized by our indexer (HnswDocumentIndex). Searching on it raises KeyError: 'embedding'.

```python
from docarray import DocList
from docarray.documents import ImageDoc
from docarray.index import HnswDocumentIndex
import numpy as np

# create some data
dl = DocList[ImageDoc](
    [
        ImageDoc(
            url="https://upload.wikimedia.org/wikipedia/commons/2/2f/Alpamayo.jpg",
            tensor=np.zeros((3, 224, 224)),
            embedding=np.random.random((128,)),
        )
        for _ in range(100)
    ]
)

# create a Document Index
index = HnswDocumentIndex[ImageDoc](work_dir='/tmp/test_index')

# index your data
index.index(dl)

# find similar Documents
query = dl[0]
results, scores = index.find(query, limit=10, search_field='embedding')
```

```
  File "/home/sami/Documents/workspace/Jina/docarray2/docarray/docarray/index/backends/hnswlib.py", line 262, in _find
    docs, scores = self._find_batched(
  File "/home/sami/Documents/workspace/Jina/docarray2/docarray/docarray/index/backends/hnswlib.py", line 248, in _find_batched
    index = self._hnsw_indices[search_field]
KeyError: 'embedding'

Process finished with exit code 1
```
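The traceback shows that the failure happens at a plain dictionary lookup, self._hnsw_indices[search_field]. No HNSW index was ever created for the 'embedding' field, so the lookup raises a bare KeyError. The sketch below is a hypothetical toy for illustration only. ToyFieldIndex and its methods are invented names, not docarray's API. The toy keeps one sub-index per registered field and uses brute-force cosine search in place of HNSW. A search on a field with no index raises a ValueError that lists the fields that do have one. The toy does not attempt to explain why docarray skipped the 'embedding' field.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


class ToyFieldIndex:
    """Hypothetical per-field vector index (not docarray's implementation).

    Exact brute-force cosine search stands in for HNSW; only the
    field-name -> sub-index bookkeeping mirrors the lookup in the traceback.
    """

    def __init__(self, vector_fields: Sequence[str]) -> None:
        # One sub-index per field that was registered as a vector field.
        self._indices: Dict[str, List[Tuple[int, Vector]]] = {
            name: [] for name in vector_fields
        }

    def _index_for(self, field: str) -> List[Tuple[int, Vector]]:
        # Report a missing field descriptively instead of a bare KeyError.
        if field not in self._indices:
            known = ", ".join(sorted(self._indices)) or "<none>"
            raise ValueError(
                f"no vector index for search_field={field!r}; indexed fields: {known}"
            )
        return self._indices[field]

    def add(self, field: str, doc_id: int, vector: Sequence[float]) -> None:
        self._index_for(field).append((doc_id, list(vector)))

    def find(
        self, field: str, query: Sequence[float], limit: int = 10
    ) -> List[Tuple[int, float]]:
        # Score every stored vector by cosine similarity, highest first.
        entries = self._index_for(field)
        q_norm = math.sqrt(sum(x * x for x in query))
        scored = []
        for doc_id, vec in entries:
            dot = sum(a * b for a, b in zip(query, vec))
            norm = q_norm * math.sqrt(sum(x * x for x in vec))
            scored.append((doc_id, dot / norm if norm else 0.0))
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:limit]


if __name__ == "__main__":
    index = ToyFieldIndex(vector_fields=["tensor"])  # 'embedding' never registered
    index.add("tensor", 0, [1.0, 0.0])
    try:
        index.find("embedding", [1.0, 0.0])
    except ValueError as err:
        print(err)
```

The toy checks field membership once, in a single helper, so add and find report a missing field in the same way. This is a design choice for the sketch, not a description of the real backend.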
samsja commented 1 year ago

wrong repo