fix: fix masking to consider mask_token

Hey team!

Thanks for the great project! I found an issue when documents are encoded in different batches and then scored independently against different queries.

The mask method used inside the doc method of ColBERT does not consider mask tokens. It could either exclude mask tokens explicitly or take into account the attention_mask passed to the doc method.

The masking method in ColBERT only handles skip tokens and pad tokens, not mask tokens. As a result, the same document gets a different relevance score depending on whether it was encoded on its own or in a batch together with other documents.
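To make the problem concrete, here is a minimal pure-Python sketch of the idea, not the actual ColBERT code. The function name build_mask, its parameters, and the token ids (0 for padding, 103 for the mask token, as in common BERT vocabularies) are assumptions for illustration, as is the premise that the extra positions added by batching hold mask tokens.

```python
# Illustrative token ids (assumed, not read from the ColBERT codebase).
PAD_TOKEN_ID = 0
MASK_TOKEN_ID = 103


def build_mask(input_ids, skiplist, pad_token_id=PAD_TOKEN_ID, mask_token_id=None):
    """Return one list of booleans per document: True means the token is kept.

    Hypothetical helper. With mask_token_id=None it only drops skiplist and
    pad tokens (the behaviour described above); passing mask_token_id also
    drops mask tokens (the behaviour this PR proposes).
    """
    excluded = set(skiplist) | {pad_token_id}
    if mask_token_id is not None:
        excluded.add(mask_token_id)
    return [[tok not in excluded for tok in doc] for doc in input_ids]


# The same 3-token document, encoded alone and in a batch with a longer one.
# In the batch it is assumed to be filled up with mask tokens (103).
alone = [[101, 7592, 102]]
batched = [[101, 7592, 102, 103, 103],
           [101, 7592, 2088, 999, 102]]

old_alone = build_mask(alone, skiplist=[])
old_batched = build_mask(batched, skiplist=[])
new_batched = build_mask(batched, skiplist=[], mask_token_id=MASK_TOKEN_ID)

print(sum(old_alone[0]), sum(old_batched[0]), sum(new_batched[0]))
```

Without the mask-token check, the batched copy of the document keeps more positions than the copy encoded alone, so its token embeddings (and any score computed from them) can differ. With the check, both copies keep the same positions.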

stanford-futuredata / ColBERT

fix: fix masking to consider mask_token #314