Hi @aalloul,
`max_tokens` and `max_sentences` are used to make batches, but are not used to truncate the input. These parameters can be used to tune computing performance and memory usage.
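To make the distinction concrete, here is a rough sketch of what batching by a token budget looks like (an illustration only, not LASER's actual code): every sentence stays whole, and `max_tokens` / `max_sentences` only cap how much ends up in the same batch.

```python
from typing import Iterable, Iterator, List, Optional

def make_batches(tokenized: Iterable[List[str]],
                 max_tokens: int = 12000,
                 max_sentences: Optional[int] = None) -> Iterator[List[List[str]]]:
    """Group tokenized sentences into batches without truncating any of them."""
    batch: List[List[str]] = []
    batch_tokens = 0
    for tokens in tokenized:
        too_many_tokens = batch and batch_tokens + len(tokens) > max_tokens
        too_many_sents = max_sentences is not None and len(batch) >= max_sentences
        if too_many_tokens or too_many_sents:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(tokens)        # the sentence itself is never shortened
        batch_tokens += len(tokens)
    if batch:
        yield batch
```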
If you're wondering if there's a length limit for a sentence, please refer to: https://github.com/facebookresearch/LASER/issues/137#issuecomment-606764408.
I'm closing the issue, please feel free to re-open if needed.
Hello and thank you for this library!
I have a question regarding how different sentence lengths are treated. Here's the code I ran:
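(Sketch of the setup described below, assuming LASER's `SentenceEncoder` from `source/embed.py`; the model/vocab paths, the test sentences, and the specific `max_tokens` values are placeholders, and the tokenization/BPE preprocessing that LASER normally requires is omitted for brevity.)

```python
# Two encoders differing only in max_tokens, applied to sentences of increasing length.
from embed import SentenceEncoder  # LASER's source/embed.py

MODEL = "models/bilstm.93langs.2018-12-26.pt"   # placeholder paths
VOCAB = "models/93langs.fvocab"

laser = SentenceEncoder(MODEL, vocab=VOCAB, max_tokens=12000, cpu=True)
laser_extended = SentenceEncoder(MODEL, vocab=VOCAB, max_tokens=24000, cpu=True)

# Sentences of increasing length: 1 word, 2 words, ... up to 400 words.
sentences = [" ".join(["hello"] * n) for n in range(1, 401)]

out = laser.encode_sentences(sentences)             # (400, 1024) embeddings
out_extended = laser_extended.encode_sentences(sentences)
```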
Then I computed the cosine similarity between the last embedding (i.e. `out[-1]`) and the other ones; the result is in the plot below. As you can see, one can't differentiate the results from the two LASER instances (`laser` and `laser_extended`). Is this expected? I also get the very same result with `max_tokens = 200`; I would have expected the result to stop changing once the number of tokens exceeds this parameter.
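The comparison itself could look roughly like this (a sketch using a plain numpy cosine similarity against `out[-1]`; the stand-in arrays only mimic the shape of LASER's 1024-dimensional sentence embeddings and would be replaced by the outputs of the two encoders):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embeddings: one row per sentence, 1024 dimensions like LASER output.
out = np.random.rand(400, 1024).astype(np.float32)
out_extended = np.random.rand(400, 1024).astype(np.float32)

# Similarity of every embedding to the embedding of the longest sentence.
sims = [cosine(out[-1], vec) for vec in out]
sims_extended = [cosine(out_extended[-1], vec) for vec in out_extended]
```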