Closed: jeswan closed this 4 years ago
Comment by iftenney Wednesday Jul 18, 2018 at 02:24 GMT
Closing for now; please re-open when ready for review.
Comment by yukatherin Wednesday Jul 18, 2018 at 02:33 GMT
@EdouardGrave It would be really good if you could review this - whenever you have time.
Comment by sleepinyourhat Wednesday Jul 18, 2018 at 14:56 GMT
Does this fix the no-attention bug you'd mentioned on Slack yesterday, or is that a separate issue?
Also, reviewing now.
Comment by sleepinyourhat Wednesday Jul 18, 2018 at 15:01 GMT
Also, is BLEU ready? I see code for it, but the comments on this thread suggest it isn't...
Comment by yukatherin Wednesday Jul 18, 2018 at 18:31 GMT
BLEU scores in the Python version look reasonably OK, but I still need to write out predictions and match fairseq's BLEU scoring.
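For reference, the quantity being matched here is corpus-level BLEU: clipped n-gram precision combined with a brevity penalty. Below is a minimal stdlib-only sketch of that formula (a hedged illustration, not jiant's `bleu_scoring.py` or fairseq's scorer, which differ in tokenization and smoothing; this version assumes one reference per hypothesis and no smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """references, hypotheses: parallel lists of token lists (one reference each)."""
    clipped = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n     # hypothesis n-gram counts, per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:  # zero precision at some order; unsmoothed BLEU is 0
        return 0.0
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec)
```

Because implementations diverge exactly in the tokenization and smoothing choices elided here, matching fairseq in practice means writing out the detokenized predictions and running them through fairseq's own scorer, as described above.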
Comment by yukatherin Wednesday Jul 18, 2018 at 19:03 GMT
@sleepinyourhat At what point could we consider merging this? Should I get one run's learning curve with the python version of bleu scoring? I think @W4ngatang was waiting on it for the masking fix.
Comment by sleepinyourhat Wednesday Jul 18, 2018 at 19:34 GMT
I'm waiting for confirmation of what this is supposed to do before I give it a final review. See the questions above about BLEU and the bug.
Comment by yukatherin Wednesday Jul 18, 2018 at 19:48 GMT
Yes, the main points are:
I started a real run: python main.py --config config/defaults.conf --overrides "exp_name = test-wmt, train_tasks = wmt14_en_de, eval_tasks = wmt14_en_de, run_name = test-wmt1-noelmo, elmo = 1, elmo_chars_only = 1, dropout = 0.2, lr = 0.003, val_interval = 10000, max_vals = 50, batch_size = 24, max_word_v_size = 30000, max_seq_len=20, mt_attention=none"
and the BLEU scores are suspiciously high after 10k batches; I haven't compared the different versions carefully yet. I'll do that and also evaluate using the other version.
Comment by sleepinyourhat Wednesday Jul 18, 2018 at 20:00 GMT
Cool. Either mark BLEU as experimental (with lots of warnings) or remove it. Then test on demo.conf. If everything works then, let me know, and I can merge.
Comment by yukatherin Wednesday Jul 18, 2018 at 20:45 GMT
@sleepinyourhat Yes, I added a warning print in bleu_scoring.py and tested on demo.conf. I think it's ready.
Issue by yukatherin Tuesday Jul 17, 2018 at 23:40 GMT Originally opened as https://github.com/nyu-mll/jiant/pull/184
I believe all comments on https://github.com/jsalt18-sentence-repl/jiant/pull/175 are addressed except validation/eval, which I'll do now. I'll do BLEU scoring tomorrow.
Tested beam search by overfitting to 3 sentences and generating them back.
Without attention: (attached output screenshot)
With attention: (attached output screenshot)
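The overfit-and-regenerate check above exercises the beam search decoder end to end. As an illustration of the decoding loop itself, here is a hedged stdlib-only sketch (not jiant's implementation; the toy `step_fn` standing in for the trained decoder is hypothetical):

```python
def beam_search(step_fn, bos, eos, beam_size=3, max_len=10):
    """step_fn(seq) -> list of (next_token, log_prob) continuations."""
    beams = [([bos], 0.0)]       # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:   # hypothesis already complete
                finished.append((seq, score))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        if not candidates:       # every beam has finished
            break
        # Keep the beam_size highest-scoring partial hypotheses.
        beams = sorted(candidates, key=lambda x: -x[1])[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos)
    return max(finished, key=lambda x: x[1])[0] if finished else beams[0][0]
```

With a model overfit to a handful of sentences, the next-token distribution is nearly deterministic, so the decoder should reproduce the training sentences exactly; that is what makes this a useful smoke test.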
yukatherin included the following code: https://github.com/nyu-mll/jiant/pull/184/commits