Closed by ArtanisTheOne 2 years ago
Hello!
It seems that you are using the vocab file created by SentencePiece. You should instead use the vocab file created by OpenNMT's onmt_build_vocab.
Alternatively, you can use the script spm_to_vocab.py to convert a SentencePiece vocab file to an OpenNMT-py compatible format. If you would rather use OpenNMT-tf, it has a command for this vocab conversion process.
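To illustrate the idea behind the conversion, here is a minimal sketch, not the actual spm_to_vocab.py. It assumes the SentencePiece vocab has one "token<TAB>score" entry per line, that the score is a log-probability, and that the target format wants "token<TAB>integer count" without the special tokens. The function name, the scale factor, and the list of skipped tokens are my own assumptions, so please check them against the real script before relying on this.

```python
import math

# Special tokens assumed to be added by the toolkit itself (assumption).
SPECIAL_TOKENS = {"<unk>", "<s>", "</s>"}


def spm_vocab_to_counts(lines, scale=1_000_000):
    """Turn 'token<TAB>score' lines into 'token<TAB>pseudo-count' lines.

    Hypothetical helper: scores are treated as log-probabilities and
    mapped to positive integers so that every token keeps a count >= 1.
    """
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Split on the last tab so the token itself is kept untouched.
        token, score = line.rsplit("\t", 1)
        if token in SPECIAL_TOKENS:
            continue
        count = int(math.exp(float(score)) * scale) + 1
        converted.append(f"{token}\t{count}")
    return converted


if __name__ == "__main__":
    sample = ["<unk>\t0", "<s>\t0", "</s>\t0", "▁the\t-3.0", "ing\t-1000"]
    for entry in spm_vocab_to_counts(sample):
        print(entry)
```

In practice you would read the lines from the .vocab file that SentencePiece wrote and save the result to the vocab path given in your training config.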
I hope this helps.
Kind regards, Yasmin
I've followed all the instructions with a corpus of around 300,000 (vocab size 25,000) and keep running into this issue (I have tried multiple times, same problem). I've completed all pre-processing, model training, etc. successfully, but the library errors out on a specific entry in the source.vocab (below)
Do you have any idea how I can resolve my issue?