lightonai / pylate

Late Interaction Models Training & Retrieval
https://lightonai.github.io/pylate/
MIT License

Add the skiplist #5

Closed NohTow closed 4 months ago

NohTow commented 4 months ago

The original ColBERT implementation uses a skiplist to exclude certain tokens (mostly punctuation) from the text representation: these tokens still go through the encoder, but their embeddings are then discarded.
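
To make the idea concrete, here is a minimal sketch of skiplist filtering applied after encoding. It is not PyLate's or ColBERT's actual code: the `SKIPLIST` contents, the `drop_skiplist_embeddings` helper, and the toy token embeddings are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import string

# Hypothetical skiplist: every ASCII punctuation character.
# A real implementation would typically store token ids rather than strings.
SKIPLIST = set(string.punctuation)


def drop_skiplist_embeddings(tokens, embeddings, skiplist=SKIPLIST):
    """Return the tokens and embeddings whose token is not in the skiplist.

    All tokens are assumed to have been encoded together already, so the
    skipped tokens still influenced the contextual embeddings that remain.
    """
    if len(tokens) != len(embeddings):
        raise ValueError("tokens and embeddings must have the same length")
    kept = [(t, e) for t, e in zip(tokens, embeddings) if t not in skiplist]
    kept_tokens = [t for t, _ in kept]
    kept_embeddings = [e for _, e in kept]
    return kept_tokens, kept_embeddings


# Toy per-token embeddings standing in for encoder output.
tokens = ["hello", ",", "world", "!"]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

kept_tokens, kept_embeddings = drop_skiplist_embeddings(tokens, embeddings)
print(kept_tokens)
print(kept_embeddings)
```

Filtering after encoding, rather than stripping punctuation from the input text, keeps the encoder's context intact while reducing the number of vectors that need to be stored and scored.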

NohTow commented 4 months ago

Solved in #15.