tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

Training Too Slow for Less Training Data #432

Open santle opened 5 years ago

santle commented 5 years ago

Hi,

I am working on a problem that requires acronyms to be translated into their expansions. When I train on these pairs in reasonable volume, say more than 1,000 examples, building an NMT model takes one to two hours on a Titan X machine, with a validation set of 600 examples. But when I train on only 20 to 300 examples, with the same 600-example validation set, training takes much longer, more than a day. Training on such small sets is necessary for my application, because it reduces the human labeling needed to generate ground truth. Can anyone suggest a way to improve the performance, or explain why this behavior occurs? Increasing the batch size, as suggested in https://github.com/tensorflow/nmt/issues/183, doesn't help here because the amount of training data is so small.
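
For reference, the rough arithmetic below (with assumed example counts and batch sizes, not my exact configuration) is a sketch of why a larger batch size stops mattering once it reaches the dataset size: training time then depends mainly on the total number of steps, not on how the tiny dataset is batched.

```python
# Minimal sketch: steps needed per pass over the training set.
# The example counts and batch sizes here are illustrative assumptions.
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps required to see the whole training set once."""
    return max(1, math.ceil(num_examples / batch_size))

for num_examples in (20, 300, 1000):
    for batch_size in (128, 512):
        print(
            f"{num_examples:>5} examples, batch {batch_size:>3}: "
            f"{steps_per_epoch(num_examples, batch_size)} step(s) per epoch"
        )

# With only 20-300 training examples, any batch size at or above the dataset
# size already covers the whole set in a single step, so raising it further
# cannot reduce the work done when training runs for a fixed total number of
# steps (e.g. the tutorial's --num_train_steps).
```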