Open bakszero opened 3 years ago
Sorry for the late response. We use the tokenizer.perl script in https://github.com/OpenNMT/OpenNMT-py/tree/master/tools to pre-process the data. < SEP > corresponds to the original [SEP], which represents the separator between utterances or knowledge paragraphs.
But I have never tried running it interactively, so I'm not sure whether the code works in that mode.
Alright, thanks for pointing it out. Ah no, that's a bummer :( I would have loved to try it interactively.
Hi, I'm trying to run the translate script on my own specified knowledge and a user utterance after training the models. I noticed that I'm able to run the translate script successfully with the given test data.
However, when specifying my own input in the files, I get a runtime error as below. My input was of the form:
For --src src.txt:
Heyy hows it going?
For --knl knl.txt:
Today is a rainy day. There are clouds all over the sky.
And I get the following runtime error with a batch-size of 1:
On some investigation, I found that if one includes < SEP > 2 times in the source file sentence and at least 3 times in the knowledge file sentence, then the scripts work successfully. I could not find anywhere in the code where this is documented. Could you please help me understand the correct data format required for running with my own knowledge and a given utterance in an interactive fashion? It'd be super nice! Thanks!
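For anyone hitting the same error, here is a minimal sketch of how the input lines could be assembled so that they carry the number of < SEP > separators observed above. It assumes the separator simply sits between consecutive utterances or knowledge paragraphs, as the maintainer describes; the minimum counts (2 and 3) are only my empirical observation, not something documented in the repository. The helper name join_segments and all example sentences beyond my original input are made-up placeholders.

```python
SEP = "< SEP >"  # separator token as it appears in the pre-processed data


def join_segments(segments, min_separators):
    """Join text segments with SEP and check the separator count.

    min_separators is an assumption taken from the observation in this
    thread (2 for the source line, 3 for the knowledge line).
    """
    cleaned = [s.strip() for s in segments if s.strip()]
    line = f" {SEP} ".join(cleaned)
    found = line.count(SEP)
    if found < min_separators:
        raise ValueError(
            f"expected at least {min_separators} separators, found {found}"
        )
    return line


# Placeholder dialogue history: three utterances give two separators.
src_line = join_segments(
    ["Hello!", "Hi there.", "Heyy hows it going?"],
    min_separators=2,
)

# Placeholder knowledge: four paragraphs give three separators.
knl_line = join_segments(
    [
        "Today is a rainy day.",
        "There are clouds all over the sky.",
        "Rain is expected in the evening.",
        "An umbrella is recommended.",
    ],
    min_separators=3,
)

print(src_line)
print(knl_line)
```

Each resulting line would then go on its own line in src.txt and knl.txt respectively. Whether the model needs real content in every segment, or only the separators themselves, is something I have not verified.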