jina-ai / executor-faissindexer

A similarity search indexer based on Faiss. https://hub.jina.ai/executor/8gsd0tts

what's the meaning of indexer.is_trained? #12

Open TITC opened 3 years ago

TITC commented 3 years ago

I am trying to dig deeper into this repo, but I am confused by this part:

        if not self._vec_indexer.is_trained:
            self.logger.warning(f'The indexer need to be trained!')
            return

If I use lmdb as the storage backend, _vec_indexer is loaded from lmdb when I trigger the /search endpoint.

        if indexer is None:
            indexer = self._init_indexer(
                embeddings.shape[1],
                index_key=self.index_key,
                metric_type=self.metric_type,
                **self._index_kwargs,
            )

When _init_indexer finishes, the attribute is_trained is already set on the indexer and its value is True.

I expected some Faiss processing to happen during indexing, but what I found is self._kv_db.put(docs). It directly calls the lmdb put method to save the docs, without any Faiss-related pre-processing.

Could you give me any clue about this part?

numb3r3 commented 3 years ago

Thanks for your interest, and sorry for the late response. The Faiss pre-processing happens in the function _add_vecs_with_ids.
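For readers wondering what is_trained signals in general: in Faiss, flat indexes report is_trained as true right after construction, while index types that learn a structure from data (for example IVF or PQ variants) report false until train() has been called. The following is a minimal, stdlib-only sketch of that contract. ToyFlatIndex and ToyIVFIndex are hypothetical classes written for illustration; they are not Faiss code and not this executor's implementation.

```python
import math
import random


class ToyFlatIndex:
    """Toy brute-force index: nothing to learn, so it is always 'trained'."""

    def __init__(self):
        self.is_trained = True
        self.items = []

    def add_with_ids(self, vectors, ids):
        self.items.extend(zip(ids, vectors))

    @property
    def ntotal(self):
        return len(self.items)


class ToyIVFIndex:
    """Toy clustered index: must see sample data before it accepts vectors."""

    def __init__(self, n_lists):
        self.n_lists = n_lists
        self.centroids = []
        self.lists = {}
        self.is_trained = False  # flips to True only after train()

    def train(self, vectors):
        # Crude "training": sample n_lists vectors to act as centroids.
        rng = random.Random(0)
        self.centroids = rng.sample(list(vectors), self.n_lists)
        self.lists = {i: [] for i in range(self.n_lists)}
        self.is_trained = True

    def _nearest_list(self, vec):
        # Index of the centroid closest to vec (Euclidean distance).
        return min(
            range(self.n_lists),
            key=lambda i: math.dist(vec, self.centroids[i]),
        )

    def add_with_ids(self, vectors, ids):
        if not self.is_trained:
            raise RuntimeError("index must be trained before adding vectors")
        for vec, doc_id in zip(vectors, ids):
            self.lists[self._nearest_list(vec)].append((doc_id, vec))

    @property
    def ntotal(self):
        return sum(len(bucket) for bucket in self.lists.values())


vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
ids = [10, 11, 12, 13]

flat = ToyFlatIndex()
flat.add_with_ids(vectors, ids)      # works immediately

ivf = ToyIVFIndex(n_lists=2)
if not ivf.is_trained:               # same kind of guard as in the snippet above
    ivf.train(vectors)
ivf.add_with_ids(vectors, ids)
```

Under this reading, an index_key that maps to a flat index would explain why is_trained is already true when _init_indexer returns, whereas a key that requires training would hit the warning until training has run.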

TITC commented 3 years ago

Thanks for your response, but _add_vecs_with_ids is called after self._kv_db.put(docs). Isn't it a post-processing step that synchronizes the docs in memory?


I notice there is an update operation after _add_vecs_with_ids, but it is only called when len(exist_docs) > 0. Following the rule that the latest version takes priority, the existing docs are then updated.
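To make the control flow under discussion concrete, here is a rough sketch of an index call in which the update branch runs only for ids that were already stored. Everything in it is an assumption for illustration: index_docs is a hypothetical function, and plain dicts stand in for both the lmdb store and the vector index. The real executor may order or implement these steps differently.

```python
def index_docs(kv_db, vec_index, docs):
    """Hypothetical indexing flow; kv_db and vec_index are plain dicts.

    docs is a list of (doc_id, embedding) pairs. Returns the ids that
    already existed and therefore went through the update branch.
    """
    # Record which ids are already stored before overwriting anything.
    exist_ids = [doc_id for doc_id, _ in docs if doc_id in kv_db]

    # Persist every doc in the key-value store (a dict stands in for lmdb).
    for doc_id, embedding in docs:
        kv_db[doc_id] = embedding

    # Add vectors for ids the vector index has not seen yet.
    for doc_id, embedding in docs:
        if doc_id not in vec_index:
            vec_index[doc_id] = embedding

    # Update branch: runs only when some ids already existed,
    # so the latest embedding replaces the stale one.
    if len(exist_ids) > 0:
        for doc_id in exist_ids:
            vec_index[doc_id] = kv_db[doc_id]

    return exist_ids


kv_db, vec_index = {}, {}
first = index_docs(kv_db, vec_index, [("a", [1.0]), ("b", [2.0])])
second = index_docs(kv_db, vec_index, [("a", [9.0])])
```

In this sketch the first call finds nothing in the store, so the update branch is skipped and the vectors are only added; the second call reuses id "a", so the branch runs and the newer embedding wins.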


But on the first run, lmdb has no data yet, so no update operation is executed after _add_vecs_with_ids. Could you explain this further? @numb3r3