In the `forward` pass implementation we have:

```python
input_ids, attention_mask = input_ids.to(self.device), attention_mask.to(self.device)
D = self.bert(input_ids, attention_mask=attention_mask)[0]
D = self.linear(D)
mask = torch.tensor(self.mask(input_ids, skiplist=self.skiplist), device=self.device).unsqueeze(2).float()
D = D * mask
```
This means that the token embeddings for `skiplist` tokens and `pad_token_id` are zeroed in the output; however, those tokens are still considered inside the `bert` forward pass, since `attention_mask` does not exclude them. Is this expected? Should `attention_mask` also mask out those tokens, in the same way the output mask built from `input_ids` does?
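For concreteness, here is a minimal sketch of what the second option could look like. This is an assumption about intent, not the actual repo code; it only presumes that `self.mask` returns a per-token 0/1 (or boolean) matrix, as its current usage suggests:

```python
# Sketch (assumption, not the actual implementation): build the skiplist/pad
# mask once, apply it to attention_mask *before* calling BERT, and reuse it
# to zero the output embeddings as the current code already does.
skip_mask = torch.tensor(
    self.mask(input_ids, skiplist=self.skiplist), device=self.device
)  # shape (batch, seq_len); 1 keeps a token, 0 marks skiplist/pad

attention_mask = attention_mask * skip_mask  # hide those tokens from self-attention too

D = self.bert(input_ids, attention_mask=attention_mask)[0]
D = self.linear(D)
D = D * skip_mask.unsqueeze(2).float()  # same output zeroing as before
```

Note that this would not only zero the skiplist embeddings but also change the contextualized embeddings of every other token, since they could no longer attend to the masked positions, so the two behaviors are not equivalent.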