lightonai / pylate

Late Interaction Models Training & Retrieval
https://lightonai.github.io/pylate/
MIT License

Add threshold alternative to the pooling method #33

Open NohTow opened 3 months ago

NohTow commented 3 months ago

Right now, the pooling method pools token embeddings by a fixed ratio. We should also let users set a cosine-similarity threshold instead: tokens whose embeddings are more similar than the threshold can be merged, and merging stops once no pair is similar enough. This would let the pooling adapt to the data, compressing sequences with a lot of redundant information more heavily and sequences with little redundancy less.
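To make the difference between the two stopping rules concrete, here is a minimal sketch that assumes a greedy agglomerative merge in NumPy. At each step it merges the two clusters whose mean embeddings have the highest cosine similarity. `pool_tokens`, `pool_factor` and `threshold` are hypothetical names, not PyLate's actual API, and this clustering strategy is illustrative only, not necessarily what the library's current pooling uses.

```python
import numpy as np


def pool_tokens(embeddings, pool_factor=None, threshold=None):
    """Hypothetical helper (not PyLate's API): merge similar token embeddings.

    Exactly one stopping rule must be given:
      - pool_factor: fixed ratio, keep len(embeddings) // pool_factor vectors
      - threshold:   keep merging while the most similar pair of clusters
                     has cosine similarity >= threshold
    """
    if (pool_factor is None) == (threshold is None):
        raise ValueError("set exactly one of pool_factor or threshold")

    # Each cluster starts as a single token embedding
    clusters = [[vec] for vec in np.asarray(embeddings, dtype=float)]
    # Fixed-ratio mode has a target size; threshold mode may merge down to 1
    target = max(1, len(clusters) // pool_factor) if pool_factor is not None else 1

    while len(clusters) > target:
        # Cosine similarity between cluster centroids (mean of member tokens)
        centroids = np.stack([np.mean(c, axis=0) for c in clusters])
        norms = np.linalg.norm(centroids, axis=1, keepdims=True)
        unit = centroids / np.maximum(norms, 1e-12)  # guard against zero vectors
        sim = unit @ unit.T
        np.fill_diagonal(sim, -np.inf)  # never merge a cluster with itself

        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        if threshold is not None and sim[i, j] < threshold:
            break  # no remaining pair is similar enough: stop compressing

        # Merge cluster j into cluster i
        clusters[i].extend(clusters[j])
        del clusters[j]

    # Mean-pool each cluster into a single vector
    return np.stack([np.mean(c, axis=0) for c in clusters])


# A sequence with two near-duplicate pairs of tokens vs. fully distinct tokens
redundant = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0]])
distinct = np.eye(4)

print(pool_tokens(redundant, threshold=0.9).shape)  # redundant tokens merged
print(pool_tokens(distinct, threshold=0.9).shape)   # nothing similar enough
print(pool_tokens(distinct, pool_factor=2).shape)   # ratio halves regardless
```

With the threshold, the redundant sequence shrinks from 4 to 2 vectors while the distinct one keeps all 4. The fixed ratio halves both, merging tokens that carry unrelated information. Greedy centroid merging is used here only because it is short to write; a hierarchical-clustering implementation could expose the same threshold by cutting the dendrogram at a cosine distance of `1 - threshold`.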