naver / splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)

FLOPs calculation #12

Closed by namespace-Pt 2 years ago

namespace-Pt commented 2 years ago

I recently read your SPLADE paper and I think it's quite interesting. I have a question concerning FLOPs calculation in the paper.

I think computing FLOPs for an inverted index involves the lengths of the activated posting lists (those of the terms shared by the query and the document). For example, given a query a b c and a document c a e, we must inspect the posting lists of the overlapping terms a and c, so the FLOPs should be at least

posting_length(a) + posting_length(c)

because we perform a summation for each entry in the posting list. However, in the paper you compute FLOPs from the probabilities that a, b, c are activated in the query and that c, a, e are activated in the document. I think this may underestimate the FLOPs of SPLADE, because the less sparse the documents are, the longer the posting lists in the inverted index.
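The cost described above can be illustrated with a small sketch. It is not code from the SPLADE repository: the toy collection and the helper names `build_inverted_index` and `posting_list_cost` are assumptions made purely for illustration. It counts one operation per posting-list entry visited for each query term.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term in set(terms):  # one posting per (term, document) pair
            index[term].append(doc_id)
    return index


def posting_list_cost(query_terms, index):
    """Sum of posting-list lengths over the distinct query terms.

    Terms absent from the index (such as "b" below) contribute nothing.
    """
    return sum(len(index.get(term, [])) for term in set(query_terms))


# Hypothetical toy collection; "d1" is the document c a e from the example.
docs = {
    "d1": ["c", "a", "e"],
    "d2": ["a", "f"],
    "d3": ["e", "g"],
}
index = build_inverted_index(docs)

# posting_length(a) + posting_length(c) = 2 + 1
cost = posting_list_cost(["a", "b", "c"], index)
```

On this toy collection the query a b c visits three postings in total, and the count grows as more documents activate a or c.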

thibault-formal commented 2 years ago

hi @namespace-Pt Sorry for the late answer!

I am not sure I completely got your point. For the FLOPS estimation, we rely on the derivation from "Minimizing FLOPs to Learn Efficient Sparse Representations". The probabilities are directly estimated from the lengths of the posting lists (for both documents and queries; for the latter, we simply "index" them).
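One way to read this reply is sketched below. It is not the authors' evaluation code. It assumes that the activation probability of a term is its posting-list length divided by the collection size, and that the estimate is the sum over terms of the query probability times the document probability. The function names and the toy data are hypothetical.

```python
from collections import Counter


def activation_probabilities(collection):
    """Fraction of items in which each term is active.

    This equals posting-list length / collection size when the
    collection is "indexed" (assumed reading of the reply above).
    """
    counts = Counter(term for terms in collection for term in set(terms))
    return {term: n / len(collection) for term, n in counts.items()}


def estimated_flops(queries, docs):
    """Sum over terms of p_query(term) * p_doc(term)."""
    p_q = activation_probabilities(queries)
    p_d = activation_probabilities(docs)
    return sum(p * p_d.get(term, 0.0) for term, p in p_q.items())


# Hypothetical toy data, not taken from MS MARCO.
docs = [["c", "a", "e"], ["a", "f"], ["e", "g"], ["a", "c"]]
queries = [["a", "b", "c"], ["e"]]

flops = estimated_flops(queries, docs)  # 0.875 for this toy data

# Multiplying by the number of documents gives the average number of
# postings visited per query, i.e. the summed posting-list lengths.
avg_postings_per_query = flops * len(docs)  # 3.5 for this toy data
```

Under these assumptions the two views line up: the estimate is the average per-query sum of posting-list lengths divided by the number of documents, so longer posting lists do raise it.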

let us know if you need more details, Thibault

namespace-Pt commented 2 years ago

Thank you @thibault-formal! I'll read the paper.

BTW, do you have a checkpoint of SPLADEv2-distill? I tried to reproduce your result but failed. I found that with distillation we do not need a ground-truth passage for each query, so did you use the queries that are absent from qrels.train.tsv to train the student model?

namespace-Pt commented 2 years ago

I would appreciate it if you could also provide the SPLADEv2-max (MRR@10=0.34) checkpoint. I want to see how the model learns to distribute the tokens.

sclincha commented 2 years ago

Hi @namespace-Pt ,

thibault-formal commented 2 years ago

@sclincha @namespace-Pt the weights for SPLADEv2-max can actually be found in the weights folder in this repo.

namespace-Pt commented 2 years ago

@sclincha @thibault-formal Thank you! I'll check it out.