Troubles training a simple network

Hi, thanks for the great code and paper.

I'm trying to train on a simple task as a proof of concept, where (using the paper's notation) X is the first 4 words of a sentence and Y is the sentence itself. In my experience, this should converge fairly quickly. But I'm not able to train it quickly enough: it only works after around an hour of training. Any implementation hints on how to make this work? Maybe it's a preprocessing issue? Or maybe the combination of some pretrained components with a few randomly initialized weights makes it harder to train quickly?
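To make the task concrete, here is a minimal sketch of how such (X, Y) pairs could be built. It assumes plain whitespace tokenization. The helper name `make_pairs` and its `prefix_len` parameter are hypothetical, used only for illustration, and are not part of the repo's preprocessing.

```python
def make_pairs(sentences, prefix_len=4):
    """Build (X, Y) pairs: X is the first `prefix_len` whitespace-separated
    words of a sentence, Y is the full sentence (illustrative helper)."""
    pairs = []
    for sentence in sentences:
        tokens = sentence.split()
        if len(tokens) <= prefix_len:
            # Skip sentences where Y would add nothing beyond X.
            continue
        source = " ".join(tokens[:prefix_len])
        target = " ".join(tokens)
        pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    for x, y in make_pairs(["the quick brown fox jumps over the lazy dog"]):
        print("X:", x)
        print("Y:", y)
```

Each source and target would then be written one per line to parallel files before running the usual preprocessing step.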

My setup is similar to the CNN psa config (minus the Copy).

Thanks.

harvardnlp / encoder-agnostic-adaptation

Troubles training a simple network #6