Open · jeswan opened this issue 4 years ago
Comment by pep8speaks Thursday Dec 19, 2019 at 01:50 GMT
Hello @Kelina! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
jiant/models.py:
- Line 1099:23: E702 multiple statements on one line (semicolon)
jiant/tasks/seq2seq.py:
- Line 170:43: E231 missing whitespace after ','
jiant/tasks/tasks.py:
- Line 2725:99: W291 trailing whitespace
jiant/trainer.py:
- Line 854:101: E501 line too long (108 > 100 characters)
You can repair most issues by installing black and running `black -l 100 ./*`. If you contribute often, have a look at the 'Contributing' section of the README for instructions on doing this automatically.
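For readers unfamiliar with these error codes, the snippet below shows toy before/after examples of each flagged class of issue. These are illustrative lines only, not the actual code from the PR:

```python
# Illustrative fixes for the flagged PEP 8 error codes.
# These are toy lines for demonstration, not the code flagged in jiant.

# E702 (multiple statements on one line): split statements joined by a semicolon.
x = 0; y = 1        # flagged
x = 0               # fixed
y = 1

# E231 (missing whitespace after ','): add a space after commas.
pair = (x,y)        # flagged
pair = (x, y)       # fixed

# W291 (trailing whitespace): strip spaces at the end of the line (invisible here).

# E501 (line too long, > 100 chars): wrap long expressions using implicit
# continuation inside parentheses instead of one very long line.
total = (
    x + y
)
```

Running `black -l 100` applies these kinds of rewrites automatically.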
Comment by sleepinyourhat Thursday Dec 19, 2019 at 02:33 GMT
Thanks! What results/sanity checks do you have so far?
Comment by sleepinyourhat Thursday Dec 19, 2019 at 02:35 GMT
And more generally, what's the state of MT/seq2seq support right now? What's verified to work well, what should work, and are there any possible configurations to avoid?
Comment by Kelina Thursday Dec 19, 2019 at 15:36 GMT
> Thanks! What results/sanity checks do you have so far?
I trained one model with hyperparameters similar to those in Luong et al. (2015) (https://arxiv.org/pdf/1508.04025.pdf). The new model obtained 21.12 BLEU on the validation set, so it is roughly comparable.
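For context, a BLEU number like this is usually computed at the corpus level over the decoded validation set. A minimal sketch of that computation, assuming the sacrebleu package and toy data (this is not the evaluation code used in the PR):

```python
# Corpus-level BLEU on decoded validation outputs (illustrative only).
# Assumes the sacrebleu package: pip install sacrebleu
import sacrebleu

# Hypothetical decoded hypotheses and reference translations, one string per sentence.
hypotheses = ["the cat sat on the mat .", "a dog barked ."]
references = ["the cat sat on the mat .", "the dog barked ."]

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")  # a score like the 21.12 quoted above
```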
Comment by Kelina Thursday Dec 19, 2019 at 15:39 GMT
> And more generally, what's the state of MT/seq2seq support right now? What's verified to work well, what should work, and are there any possible configurations to avoid?
For now, morphological segmentation works with its config file, and MT works with its config file as well; no other configurations have been tried.
Comment by sleepinyourhat Friday Dec 20, 2019 at 15:08 GMT
Could you say more about the exact comparison you're making for WMT? Is it test vs. test? What's the target number you're trying to match?
Comment by Kelina Sunday Jan 12, 2020 at 22:10 GMT
> Still pending from before:
> - Could you say more about the exact comparison you're making for WMT? Test set vs. test set? Dev set vs. dev set? What's the exact target number you're trying to match, how close are you, and how does our setup differ?
> - If transformers (like RoBERTa) are supported, give an example config (or at least some instructions); if not, add an assert/error and open an issue saying what we'd need to do to add support. We can't start our experiments without this.
I compared our development-set results to their reported (test-set) results. Ours are slightly better, but our hyperparameters also differ slightly. From above: "I trained one model with hyperparameters similar to those in Luong et al. (2015) (https://arxiv.org/pdf/1508.04025.pdf). The new model obtained 21.12 BLEU on the validation set, so it is roughly comparable."
The code runs with RoBERTa, but the results in my single run were worse (12.4 BLEU on dev); I am unsure whether that is due to the hyperparameters. I added the config file I used for that run.
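For the second pending bullet above, the kind of fail-fast guard being requested could look roughly like the sketch below. The names here (`check_seq2seq_input_module`, `SUPPORTED_INPUT_MODULES`, `input_module`) are hypothetical, not identifiers from the PR:

```python
# Hedged sketch of an explicit error for unsupported encoder choices.
# All names below are illustrative assumptions, not the PR's actual code.
SUPPORTED_INPUT_MODULES = {"scratch", "roberta-base", "roberta-large"}

def check_seq2seq_input_module(input_module: str) -> None:
    """Fail fast with a clear message instead of crashing mid-training."""
    if input_module not in SUPPORTED_INPUT_MODULES:
        raise NotImplementedError(
            f"seq2seq tasks do not support input_module={input_module!r} yet; "
            f"supported values: {sorted(SUPPORTED_INPUT_MODULES)}"
        )

check_seq2seq_input_module("roberta-base")  # passes silently
```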
Comment by Kelina Sunday Jan 12, 2020 at 22:12 GMT
> Still work in progress, right?

The remaining open points are probably better and faster resolved by either @pruksmhc or @HaokunLiu; I tagged them above in the respective comments.
Comment by pruksmhc Sunday Feb 02, 2020 at 14:57 GMT
@Kelina where did you get the data? @sleepinyourhat perhaps you know as well? It seems like there are some train/test/val splits out there, but the preprocessing code in the seq2seq tasks here doesn't seem to handle those splits.
Comment by pruksmhc Tuesday Feb 04, 2020 at 20:21 GMT
A number we should aim to at least match for RoBERTa is 27.14 BLEU (from https://openreview.net/pdf?id=Hyl7ygStwB), which is the number reported for using BERT to initialize the encoder for NMT. According to that same paper, though, using BERT as an encoder actually drops performance (even when compared to the original Transformer model for NMT). I don't think anyone has published results for RoBERTa for NMT yet (?). @sleepinyourhat I'm running experiments on RoBERTa now, but it seems like it's going to take a while. EDIT: Actually, I don't think anyone has tried a BERT encoder -> LSTM decoder directly. The paper I mentioned above used a Transformer architecture for the decoder.
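To make the "BERT/RoBERTa encoder -> LSTM decoder" setup under discussion concrete, here is a minimal PyTorch sketch of that wiring using the HuggingFace transformers library. This is not the PR's model (nor the cited paper's); the class, sizes, and the dot-product attention are illustrative simplifications:

```python
# Minimal sketch: pretrained transformer encoder feeding an attentional LSTM decoder.
# NOT the PR's implementation; names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class TransformerEncoderLSTMDecoder(nn.Module):
    def __init__(self, vocab_size: int, dec_hidden: int = 512, encoder_name: str = "roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        enc_hidden = self.encoder.config.hidden_size
        self.embed = nn.Embedding(vocab_size, dec_hidden)
        self.decoder = nn.LSTM(dec_hidden, dec_hidden, batch_first=True)
        # Project encoder states to the decoder size for dot-product attention.
        self.enc_proj = nn.Linear(enc_hidden, dec_hidden)
        self.out = nn.Linear(2 * dec_hidden, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source with the pretrained transformer.
        enc = self.encoder(input_ids=src_ids, attention_mask=src_mask,
                           return_dict=True).last_hidden_state
        enc = self.enc_proj(enc)                                  # (B, S, H)
        dec_out, _ = self.decoder(self.embed(tgt_ids))            # (B, T, H)
        # Luong-style dot-product attention over encoder states.
        scores = torch.bmm(dec_out, enc.transpose(1, 2))          # (B, T, S)
        scores = scores.masked_fill(src_mask[:, None, :] == 0, float("-inf"))
        context = torch.bmm(torch.softmax(scores, dim=-1), enc)   # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))    # (B, T, V) logits
```

In practice the pretrained encoder is often frozen or given a smaller learning rate than the decoder, which is one of the hyperparameter choices that could affect the numbers discussed above.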
Issue by Kelina Thursday Dec 19, 2019 at 01:50 GMT
Originally opened as https://github.com/nyu-mll/jiant/pull/979
Introducing a recurrent sequence-to-sequence model for MT.
Kelina included the following code: https://github.com/nyu-mll/jiant/pull/979/commits
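For orientation, a minimal sketch of what a recurrent sequence-to-sequence model for MT looks like (teacher-forced training plus greedy decoding). This is not the code from the linked commits, and attention (as in Luong et al., 2015) is omitted for brevity; all names and sizes are illustrative:

```python
# Minimal recurrent (LSTM) encoder-decoder sketch with greedy decoding.
# Illustrative only; not the code from the linked commits.
import torch
import torch.nn as nn

class RecurrentSeq2Seq(nn.Module):
    def __init__(self, src_vocab: int, tgt_vocab: int, hidden: int = 512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, hidden)
        self.tgt_embed = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Teacher-forced training pass: decoder starts from the encoder's final state.
        _, state = self.encoder(self.src_embed(src_ids))
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_out)  # (B, T, tgt_vocab) logits

    @torch.no_grad()
    def greedy_decode(self, src_ids, bos_id: int, eos_id: int, max_len: int = 50):
        # Decode one token at a time, feeding back the argmax prediction.
        _, state = self.encoder(self.src_embed(src_ids))
        tok = torch.full((src_ids.size(0), 1), bos_id,
                         dtype=torch.long, device=src_ids.device)
        outputs = []
        for _ in range(max_len):
            dec_out, state = self.decoder(self.tgt_embed(tok), state)
            tok = self.out(dec_out[:, -1]).argmax(dim=-1, keepdim=True)
            outputs.append(tok)
            if (tok == eos_id).all():
                break
        return torch.cat(outputs, dim=1)
```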