naver / splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)

Equations (1) and (4) #4

Closed · hguan6 closed 2 years ago

hguan6 commented 2 years ago

In your paper, you say that equation (1) is equivalent to the MLM prediction, and that E_j in equation (1) denotes the BERT input embedding for token j. In the default HuggingFace Transformers implementation, however, E_j does not come from the input layer but from another embedding matrix, called "decoder" in BertLMPredictionHead (if you use BERT). Did you manually set the "decoder" weights to the input embedding weights?
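For reference, here is a minimal way to inspect both matrices (my own snippet, not from the repo, assuming bert-base-uncased and a recent transformers version):

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Input embedding matrix E (vocab_size x hidden_size):
input_emb = model.bert.embeddings.word_embeddings.weight
# "decoder" weights of the MLM head (BertLMPredictionHead):
decoder_w = model.cls.predictions.decoder.weight

# If these share storage, then E_j in equation (1) really is the
# input embedding for token j (the decoder bias is a separate parameter).
print(decoder_w is input_emb)
print(torch.equal(decoder_w, input_emb))
```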

My other question concerns equation (4). It computes the summation of the document/query term weights, yet in the forward function of the Splade class (models.py) you use torch.max. Can you explain this discrepancy?
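For concreteness, a simplified sketch of the two aggregation schemes (my own code with assumed tensor shapes and a hypothetical function name, not the exact models.py):

```python
import torch

def splade_aggregate(logits, attention_mask, agg="max"):
    """logits: (batch, seq_len, vocab_size) MLM logits;
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding."""
    # Per-token term importance log(1 + ReLU(w_ij)), with padding masked out
    weights = torch.log1p(torch.relu(logits)) * attention_mask.unsqueeze(-1)
    if agg == "sum":
        # Summation over the sequence, as written in equation (4)
        return weights.sum(dim=1)
    # Max over the sequence, as done with torch.max in the Splade forward
    return weights.max(dim=1).values
```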

thibault-formal commented 2 years ago

Hi,

Thanks for your interest!

1. Regarding equation (1): in the HuggingFace implementation, the "decoder" weights of BertLMPredictionHead are tied to the input embedding matrix by default (only the decoder bias is a separate parameter), so E_j is indeed the input embedding for token j; we did not set anything manually. You can verify that both matrices share storage, as in your snippet above.

2. Regarding equation (4): the summation corresponds to the original SPLADE model (SIGIR21). The forward in models.py implements the max pooling variant introduced with SPLADE v2 (SIGIR22), which we found to perform better.

Best,
Thibault

hguan6 commented 2 years ago

Thank you for the clarification! It makes sense to me now.