This PR corrects the behavior of the convert_to_numpy and convert_to_tensor parameters of the encode function, which now returns either a list of numpy arrays or a list of tensors (we cannot stack everything into a single array, since documents might not have the same length).
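To illustrate why the output has to be a list, here is a minimal sketch using NumPy only. The shapes (3 and 5 tokens, embedding dimension 4) and the variable names are made up for the example and are not taken from the actual code.

```python
import numpy as np

# Hypothetical token-level embeddings for two documents of different lengths
# (3 and 5 tokens, embedding dimension 4). Values are placeholders.
doc_a = np.zeros((3, 4), dtype=np.float32)
doc_b = np.ones((5, 4), dtype=np.float32)

try:
    np.stack([doc_a, doc_b])
    stacked = True
except ValueError:
    # np.stack requires every input array to have the same shape.
    stacked = False

# Returning a plain list keeps each document at its own length.
embeddings = [doc_a, doc_b]
shapes = [e.shape for e in embeddings]
```

The same constraint applies to tensors, which is why both parameters end up returning a list.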
I also adjusted the different parts of the code relying on the encode function, and it does not seem to introduce any regressions.
I also added the padding option parameters, but I am still unsure about them, as we create a big tensor and then split it into a list, when it will almost certainly be used as a tensor in the end, so the overhead is a bit painful.
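For reference, the pad-then-split round trip I am worried about looks roughly like the sketch below. It uses NumPy rather than the real tensors, and `pad_and_split` and `pad_value` are hypothetical names for illustration, not the ones used in this PR.

```python
import numpy as np

def pad_and_split(docs, pad_value=0.0):
    """Hypothetical helper: pad documents into one array, then split it back into a list."""
    lengths = [d.shape[0] for d in docs]
    dim = docs[0].shape[1]
    # One big (n_docs, max_len, dim) buffer filled with the padding value.
    padded = np.full((len(docs), max(lengths), dim), pad_value, dtype=docs[0].dtype)
    for i, d in enumerate(docs):
        padded[i, : d.shape[0]] = d
    # Split the buffer into one padded array per document.
    as_list = [padded[i] for i in range(len(docs))]
    return padded, as_list, lengths

docs = [
    np.ones((3, 4), dtype=np.float32),
    np.ones((5, 4), dtype=np.float32),
]
padded, as_list, lengths = pad_and_split(docs)
```

A caller who wants a single tensor would have to stack `as_list` again, rebuilding the `padded` buffer that was already built inside the function.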
@raphaelsty if you could please have a look and tell me what you think about this.
I made the change for the convert_to_tensor function. I am deferring the variable-name cleanup until later, when we do a big cleaning pass, to avoid having to run too many regression tests now.