texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

why no index for Dense Retrieval models #62

Closed: lboesen closed this issue 1 year ago

lboesen commented 1 year ago

Hi,

I was just wondering why we are not creating indexes for DPR models, and are instead using tevatron.faiss_retriever right after encoding the queries and corpus.

Thanks

MXueguang commented 1 year ago

Hi @lboesen, for dense models we use brute-force search by default (a Faiss FlatIP index). The index is built at search time, after the corpus embeddings are loaded: https://github.com/texttron/tevatron/blob/2f6ac044855fbeda3bb9e2092c4172885f3532c2/src/tevatron/faiss_retriever/retriever.py#L16
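
To make "brute-force search" concrete, here is a minimal pure-Python sketch of what an exhaustive inner-product search computes: every corpus vector is scored against the query and the top k are kept. This is an illustration only, not Tevatron's or Faiss's code; the function names and the toy vectors are made up for the example.

```python
import heapq


def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def flat_ip_search(corpus, query, k):
    """Score every corpus vector against the query (no pruning, no
    approximation) and return the k best (score, doc_index) pairs,
    highest score first."""
    scored = ((inner_product(vec, query), idx) for idx, vec in enumerate(corpus))
    return heapq.nlargest(k, scored)


# Toy 2-dimensional "embeddings" (hypothetical values).
corpus_embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
query_embedding = [1.0, 0.25]

top = flat_ip_search(corpus_embeddings, query_embedding, k=2)
for score, doc_index in top:
    print(doc_index, score)
```

Because every vector is compared, there is nothing to train or precompute beyond the embeddings themselves, which is why building the index at search time is cheap.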

lboesen commented 1 year ago

Okay, I see. Is the index saved?

MXueguang commented 1 year ago

No, currently we only keep the embeddings and load them into a Faiss FlatIP index at search time.