naver / splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)
Other
737 stars 80 forks source link

Equation (1) and (4) #4

Closed hguan6 closed 2 years ago

hguan6 commented 2 years ago

In your paper, you said equation (1) is equivalent to the MLM prediction and E_j in equation (1) denotes the BERT input embedding for token j. If you use the default implementation of HuggingFace Transformers, E_j is not from the input layer but another embeddings matrix, which is called "decoder" in the "BertLMPredictionHead" (if you use BERT). Did you manually set the "decoder" weights to the input embedding weights?

My other question is concerning equation (4). It computes the summation of the weights of the document/query terms. In the "forward" function of the Splade class (models.py) however, you use "torch.max" function. Can you explain this issue?

thibault-formal commented 2 years ago

Hi,

Thibault

hguan6 commented 2 years ago

Thank you for the clarification! It makes sense to me now.