lightonai / pylate

Late Interaction Models Training & Retrieval
https://lightonai.github.io/pylate/
MIT License

convert_to parameters #18

Closed: NohTow closed this pull request 4 months ago

NohTow commented 4 months ago

This PR fixes the behavior of the convert_to_numpy and convert_to_tensor parameters of the encode function. It now returns either a list of numpy arrays or a list of tensors. We cannot stack everything into a single array, since documents might not have the same length.
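To illustrate why a list is returned instead of one stacked array, here is a minimal NumPy sketch. It assumes each document's token embeddings come as a (num_tokens, dim) array. The helper to_output_list is hypothetical and is not pylate's actual encode implementation. With convert_to_tensor the idea would be the same, just with a list of torch tensors.

```python
import numpy as np


def to_output_list(token_embeddings):
    """Hypothetical helper: keep one float32 array per document instead of stacking.

    token_embeddings: list of arrays, each of shape (num_tokens_i, dim).
    """
    return [np.asarray(emb, dtype=np.float32) for emb in token_embeddings]


rng = np.random.default_rng(0)
# Two documents with different token counts but the same embedding dim.
docs = [rng.random((5, 128)), rng.random((9, 128))]

try:
    np.stack(docs)  # fails because the shapes (5, 128) and (9, 128) differ
except ValueError as err:
    print("cannot stack:", err)

embeddings = to_output_list(docs)
print([e.shape for e in embeddings])  # [(5, 128), (9, 128)]
```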

I also adjusted the different parts of the code that rely on the encode function, and this does not seem to introduce any regressions. I also added the padding option parameters, but I am still unsure about them. With padding, we create one big padded tensor only to split it back into a list, while the caller will most likely use it as a tensor in the end, so the overhead is a bit painful.
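The padding overhead can be sketched as follows in NumPy, assuming the same (num_tokens, dim) layout per document. pad_batch and split_padded are hypothetical helpers written for illustration and are not the functions added in this PR. The point is that the padded array is allocated first and then immediately split back into the per-document list.

```python
import numpy as np


def pad_batch(token_embeddings):
    """Hypothetical helper: pad to (batch, max_len, dim) and build a boolean mask."""
    lengths = [emb.shape[0] for emb in token_embeddings]
    dim = token_embeddings[0].shape[1]
    padded = np.zeros((len(token_embeddings), max(lengths), dim), dtype=np.float32)
    mask = np.zeros((len(token_embeddings), max(lengths)), dtype=bool)
    for i, emb in enumerate(token_embeddings):
        padded[i, : emb.shape[0]] = emb  # copy real tokens, rest stays zero
        mask[i, : emb.shape[0]] = True  # mark which positions are real tokens
    return padded, mask


def split_padded(padded, mask):
    """Undo the padding: return one (num_tokens_i, dim) array per document."""
    return [padded[i][mask[i]] for i in range(padded.shape[0])]


rng = np.random.default_rng(0)
docs = [rng.random((5, 128), dtype=np.float32), rng.random((9, 128), dtype=np.float32)]

padded, mask = pad_batch(docs)
print(padded.shape)  # (2, 9, 128)
restored = split_padded(padded, mask)
print([r.shape for r in restored])  # [(5, 128), (9, 128)]
```

Here the padded array holds 2 * 9 * 128 = 2304 values, versus (5 + 9) * 128 = 1792 for the list of arrays. That extra allocation and copy is wasted if the result is split into a list and later re-padded by the caller.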

@raphaelsty if you could please have a look and tell me what you think about this.

NohTow commented 4 months ago

I made the change for the convert_to_tensor parameter. I am leaving the variable-name cleanup for later, when we do a big cleaning pass, to avoid having to run too many regression tests.