texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

evaluating sparse retrievers splade++ (main branch) #149

Open srikanthmalla opened 2 months ago

srikanthmalla commented 2 months ago

Hi @MXueguang,

I am currently evaluating a sparse retriever, SPLADE++ (cocondenser-ensembledistil), on ArguAna using the command below:

./eval_beir.sh --dataset arguana \
              --tokenizer naver/splade-cocondenser-ensembledistil \
              --model_name_path naver/splade-cocondenser-ensembledistil \
              --embedding_dir beir_embedding_arguana_splade \
              --query_prefix "" \
              --passage_prefix "" \
              --normalize

nDCG@10 is reported as 0.518 in their paper (Table 4); no other metrics such as MRR, MAP, or recall are given in the paper or on the Hugging Face page.

With the above script from the repo, I got 0.1730. Am I passing the wrong arguments, or are embedding and retrieval with the current scripts not supported for sparse vector embeddings like SPLADE? Please let me know.

Thank you, Srikanth

MXueguang commented 2 months ago

Hi @srikanthmalla, SPLADE uses max pooling over the MLM layer outputs to get its sparse vectors, so using the script directly to produce dense vectors does not work for SPLADE. Please see https://github.com/texttron/tevatron/tree/main/examples/splade for how SPLADE encodes text and builds an inverted index for search.
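
For reference, here is a minimal sketch (not the Tevatron example code) of how a SPLADE-style sparse vector can be computed with Hugging Face Transformers: the MLM logits are passed through log(1 + ReLU(·)) and max-pooled over the sequence, giving a vocabulary-sized sparse weight vector. The function name splade_encode and the sample query are made up for illustration; the checkpoint name is the one from the command above.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Hypothetical minimal sketch of SPLADE-max encoding, not the Tevatron example code.
model_name = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

def splade_encode(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    # Log-saturated activation, masking out padding tokens,
    # then max pooling over the sequence dimension.
    weights = torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)  # shape: (vocab_size,)

vec = splade_encode("what causes rainbows to appear")
topk = vec.topk(10)
for idx, w in zip(topk.indices.tolist(), topk.values.tolist()):
    print(tokenizer.convert_ids_to_tokens(idx), round(w, 3))

Each nonzero dimension of the resulting vector corresponds to a vocabulary term with a weight, which is why retrieval is done with an inverted index rather than the dense-vector search path used by eval_beir.sh; the linked examples/splade directory covers the indexing and search steps.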