Closed claudiosv closed 5 years ago
Hi @claudiosv .
I used for Java-med:
SUBTOKENS_VOCAB_MAX_SIZE = 184379
TARGET_VOCAB_MAX_SIZE = 10903
The reason the numbers are not round is that in the original implementation I limited the vocabulary by taking only tokens/targets that appear at least X times, which led to these vocab sizes. In the open source version, I changed the vocab to be limited by its max size, taking the most frequently occurring tokens/targets.
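For anyone else reading, here is a minimal sketch of the difference between the two vocabulary-limiting strategies described above. The function names and the toy counts are hypothetical, not taken from the code2vec codebase:

```python
from collections import Counter

def limit_vocab_by_min_count(token_counts: Counter, min_count: int) -> set:
    # Original-implementation style (as described above): keep every
    # token/target that appears at least min_count times. The resulting
    # vocab size depends on the data, hence the non-round numbers.
    return {tok for tok, c in token_counts.items() if c >= min_count}

def limit_vocab_by_max_size(token_counts: Counter, max_size: int) -> set:
    # Open-source style: keep the max_size most frequently occurring
    # tokens/targets, so the vocab size is exactly the configured cap.
    return {tok for tok, _ in token_counts.most_common(max_size)}

# Toy example (hypothetical counts):
counts = Counter({"get": 50, "set": 30, "name": 10, "foo": 2, "bar": 1})
print(sorted(limit_vocab_by_min_count(counts, 10)))
print(sorted(limit_vocab_by_max_size(counts, 3)))
```

Both calls above yield the same vocab here; they only diverge when the frequency threshold and the size cap cut the distribution at different points.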
Let me know if you have any more questions.
Hi @urialon, thanks for the details! Very much appreciated.
Hi,
Thanks for this great work. I'm trying to reproduce the results from the paper for java-med, and I was wondering what values of config.SUBTOKENS_VOCAB_MAX_SIZE and config.TARGET_VOCAB_MAX_SIZE were used. I couldn't find them in the paper or in any existing issue.
Thank you in advance.
Best, Claudio