@Lotemp can you please give more information about your setup, like how large your training set is (number of sentences), what your model size is, how long you trained your model, etc.? Thanks.
Hi, my training set contains 2000 parallel sentences, and the development and test sets contain 500 parallel sentences each. I used a dimension of ~150, patience values between 10 and 20, and a learning rate of 0.1 (due to the small data set). As for time / number of epochs, I used the default settings, but in all my experiments the training stopped early... Any help will be appreciated, thank you!
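For reference, here is roughly how those settings might map onto a `train_nmt.py`-style call. This is only a sketch assuming the dl4mt-style `nmt.train()` keyword arguments; the file paths, batch size, and exact argument names (e.g. `lrate`, `reload_`) are assumptions and should be checked against the `nmt.py` you are actually running.

```python
# Hedged sketch of a small-data configuration, assuming a dl4mt-style
# nmt.train() interface. Paths and keyword names are illustrative and
# should be checked against your copy of nmt.py / train_nmt.py.
from nmt import train

if __name__ == '__main__':
    train(
        saveto='model_sarcasm.npz',   # where the parameters get saved
        dim=150,                      # hidden state size (~150 as above)
        dim_word=150,                 # word embedding size
        lrate=0.1,                    # learning rate
        patience=10,                  # early-stopping patience (10-20 tried)
        max_epochs=5000,              # default-style cap; early stopping usually ends training first
        batch_size=16,                # small batches for a 2k-sentence set
        validFreq=100,                # validate often, since epochs are tiny
        datasets=['train.sarcastic.tok', 'train.plain.tok'],
        valid_datasets=['dev.sarcastic.tok', 'dev.plain.tok'],
        dictionaries=['train.sarcastic.tok.pkl', 'train.plain.tok.pkl'],
        reload_=False)
```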
Your dataset seems to be quite small, have you tried fine-tuning a pre-trained model?
@orhanf, thanks for your quick response. I'm afraid I haven't found a pre-trained model relevant to my task, because of the task's specificity (sarcastic text). How, in your opinion, can I maximize the performance of this package given my small data set?
@Lotemp, this is definitely a research question, and Reddit would probably be more fruitful and diverse in terms of plausible approaches.
But my on-the-fly suggestion would be: train an enc-dec model without attention on English-to-English monolingual data, then use its parameters to initialize an attention-based enc-dec model and fine-tune that for your task. Beware that I have no experience with your task 😃
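A rough sketch of the warm-start step described above: copy whatever parameters the pretrained no-attention model shares (matching name and shape) into the attention model's initial parameters before fine-tuning. The file names, and the assumption that both models store their Theano parameters as flat name-to-array `.npz` archives, are illustrative rather than anything from this repo.

```python
import numpy

# Hypothetical file names; both models are assumed to save their parameters
# as flat name -> array .npz archives.
pretrained_npz = numpy.load('encdec_noattn_pretrained.npz')
target_npz = numpy.load('encdec_attention_init.npz')

pretrained = {k: pretrained_npz[k] for k in pretrained_npz.files}
target = {k: target_npz[k] for k in target_npz.files}

# Only overwrite parameters that exist in both models with identical shapes;
# attention-specific weights keep their fresh random initialization.
copied = []
for name, value in pretrained.items():
    if name in target and target[name].shape == value.shape:
        target[name] = value
        copied.append(name)

numpy.savez('encdec_attention_warmstart.npz', **target)
print('warm-started %d parameters' % len(copied))
```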
Hi, I've managed to run train_nmt.py on my parallel (monolingual, sarcastic English to non-sarcastic English) data set, but the samples generated during training have nothing to do with the source sentences.
When training finishes, the translations it produces are also meaningless: things like "I I I" and "UNK".
What am I doing wrong? Thanks!
Lotem
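One sanity check worth running in a situation like this (not something from the thread, just a hedged sketch): measure how much of the dev/test vocabulary is actually covered by the training dictionaries, since with only 2000 sentence pairs most unseen words will come out as UNK. The file names, the `n_words` cut-off, and the assumption that the `.pkl` dictionaries map word to index in the dl4mt style are all illustrative.

```python
import pickle

# Hedged diagnostic: estimate how many test-side tokens fall outside the
# training vocabulary (and therefore get mapped to UNK at translation time).
# Assumes a dl4mt-style .pkl dictionary mapping word -> index and a
# tokenized text file with one sentence per line; file names are made up.
with open('train.plain.tok.pkl', 'rb') as f:
    word2idx = pickle.load(f)

n_words_limit = 10000  # whatever n_words value is passed to train()

total = oov = 0
with open('test.plain.tok') as f:
    for line in f:
        for w in line.split():
            total += 1
            if w not in word2idx or word2idx[w] >= n_words_limit:
                oov += 1

print('OOV rate: %.1f%% (%d / %d tokens)'
      % (100.0 * oov / max(total, 1), oov, total))
```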