There is a problem with the vocabulary of https://s3.amazonaws.com/opennmt-models/iwslt.pt. After loading the model with
model = torch.load("iwslt.pt")
the size of the English vocabulary turns out to be 36321.
However, after building TGT.vocab from datasets.IWSLT, the size of the English vocabulary is 36327.
The code for building the vocabulary is almost identical, so perhaps datasets.IWSLT has changed slightly and the resulting vocabulary now differs.
Although the model's translations on valid_iter look correct, the model loaded from iwslt.pt still cannot fully work, since the vocabulary currently built from datasets.IWSLT does not match it in size.
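For illustration, here is a minimal self-contained sketch of why even a six-word size difference breaks weight loading (the two sizes are the vocabulary counts from above; the embedding width of 8 is an arbitrary stand-in, not the model's real dimension):

```python
import torch.nn as nn

# Hypothetical illustration of the failure mode: an embedding table saved
# with the checkpoint's 36321-word vocabulary cannot be loaded into a
# model whose rebuilt vocabulary has 36327 words.
saved = nn.Embedding(36321, 8)    # stands in for the iwslt.pt weights
rebuilt = nn.Embedding(36327, 8)  # stands in for a freshly built model

try:
    rebuilt.load_state_dict(saved.state_dict())
except RuntimeError:
    # torch refuses: weight shapes [36321, 8] vs [36327, 8] do not match
    print("size mismatch")
```

So the pretrained weights can only be used with the exact vocabulary they were trained against.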
I am building a model based on iwslt.pt, so I need its English vocabulary. How or where can I obtain the correct English vocabulary (size 36321) of https://s3.amazonaws.com/opennmt-models/iwslt.pt?