Add threshold alternative to the pooling method

lightonai / pylate

Late Interaction Models Training & Retrieval

https://lightonai.github.io/pylate/

MIT License


Add threshold alternative to the pooling method #33

Open NohTow opened 3 months ago

NohTow commented 3 months ago

Right now, the pooling method pools the token embeddings down to a fixed ratio. We should also let users set a threshold on the cosine similarity, so that tokens are merged only while their similarity stays above that threshold. This would let us adapt to the data: sequences with a lot of redundant information get compressed more, and sequences with little redundancy get compressed less.
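To make the idea concrete, here is a rough sketch of what threshold-based pooling could look like. It is not the current PyLate implementation: the function name `pool_by_threshold`, the helper `cosine`, the greedy centroid-merging strategy, and the pure-Python list representation are all assumptions chosen for illustration. An actual implementation would more likely work on tensors and reuse whatever clustering the existing pooling method relies on.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pool_by_threshold(embeddings, threshold):
    """Hypothetical helper: greedily merge token embeddings while the most
    similar pair of cluster centroids has cosine similarity >= threshold.

    Returns one mean-pooled vector per remaining cluster, so the output
    length depends on how redundant the input tokens are.
    """
    # Each cluster keeps the list of token vectors it contains.
    clusters = [[list(e)] for e in embeddings]

    def centroid(members):
        # Column-wise mean of the member vectors.
        return [sum(col) / len(members) for col in zip(*members)]

    while len(clusters) > 1:
        centroids = [centroid(c) for c in clusters]

        # Find the pair of clusters with the highest centroid similarity.
        best_sim, best_pair = -math.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroids[i], centroids[j])
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)

        # Stop as soon as no pair is similar enough to merge.
        if best_sim < threshold:
            break

        i, j = best_pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid

    return [centroid(c) for c in clusters]


# A sequence with near-duplicate tokens is compressed ...
redundant = [[1.0, 0.0], [1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
print(len(pool_by_threshold(redundant, threshold=0.9)))

# ... while a sequence with no similar tokens is left untouched.
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(len(pool_by_threshold(diverse, threshold=0.9)))
```

With this approach the number of output vectors is no longer known in advance, which is the point of the proposal but may matter for anything that assumes a fixed compression ratio. The pooled vectors are plain means, so they would probably need to be re-normalized if downstream scoring expects unit-norm embeddings.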