ValueError: invalid literal for int() with base 10: '-2.34575'

ymoslem / OpenNMT-Tutorial

Neural Machine Translation (NMT) tutorial. Data preprocessing, model training, evaluation, and deployment.

MIT License


ValueError: invalid literal for int() with base 10: '-2.34575' #4

Closed ArtanisTheOne closed 2 years ago

ArtanisTheOne commented 2 years ago

I've followed all the instructions with a corpus size of around 300,000 (vocab 25,000) and keep running into this issue (I have tried multiple times, with the same problem). I've completed all pre-processing, model training, etc. successfully, but the library errors out on a specific entry in the source.vocab (below).

Do you have any idea how I can resolve my issue?

ymoslem commented 2 years ago

Hello!

It seems that you are using the vocab file created by SentencePiece. You should instead use the vocab file created by OpenNMT's onmt_build_vocab.

Alternatively, you can use the script spm_to_vocab.py to convert a SentencePiece vocab file to an OpenNMT-py compatible format. If you would rather use OpenNMT-tf, there is a command for this vocab conversion process.
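For context, a SentencePiece .vocab file stores each piece followed by a tab and a floating-point log-probability score, while the vocab loader here evidently tries to parse the second column with int(), which is why a value like '-2.34575' raises the ValueError. The following is only a minimal sketch of the kind of conversion involved, not the actual spm_to_vocab.py script: the function name, the scaling factor of 1,000,000, and the list of skipped special tokens are all assumptions made for illustration.

```python
import math

# Assumed special tokens to skip; adjust this to match your own setup.
SPECIAL_TOKENS = ("<unk>", "<s>", "</s>")


def spm_vocab_to_counts(lines, scale=1_000_000):
    """Turn 'token<TAB>log_prob' lines into 'token<TAB>integer' lines.

    The scale factor is an arbitrary illustrative choice: it maps
    probabilities onto positive integer pseudo-counts.
    """
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        token, score = line.split("\t", 1)
        if token in SPECIAL_TOKENS:
            continue  # assumed to be handled separately by the toolkit
        # exp() recovers a probability from the log-probability; adding 1
        # keeps very rare pieces from ending up with a count of zero.
        pseudo_count = int(math.exp(float(score)) * scale) + 1
        converted.append(f"{token}\t{pseudo_count}")
    return converted


# Made-up sample lines in the SentencePiece .vocab layout.
sample = ["<unk>\t0", "▁the\t-2.34575", "▁of\t-3.5"]
result = spm_vocab_to_counts(sample)
print("\n".join(result))
```

Every second column in the output is now a positive integer, so int() can parse it. For real use, the script and commands mentioned above are the better choice, since they follow the toolkit's own conventions.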

I hope this helps.

Kind regards, Yasmin