nyu-mll / jiant-v1-legacy

The jiant toolkit for general-purpose text understanding models
MIT License

[CLOSED] MT: fix target endtoken; masking encoder context; add beam search #184

Closed by jeswan 4 years ago

jeswan commented 4 years ago

Issue by yukatherin Tuesday Jul 17, 2018 at 23:40 GMT Originally opened as https://github.com/nyu-mll/jiant/pull/184


I believe all comments on https://github.com/jsalt18-sentence-repl/jiant/pull/175 are addressed except validation/eval, which I will do now. I will do BLEU scoring tomorrow.

Tested beam search by overfitting to 3 sentences and generating them back (a rough sketch of this kind of sanity check follows the results below).

Without attention: [image]

With attention: [image]
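For reference, a minimal, self-contained sketch of a beam-search sanity check of the kind described above; the beam width, end-of-sequence token, and toy scoring function here are illustrative assumptions rather than the code added in this PR. The decoder keeps the `beam_size` highest-scoring prefixes at each step and moves any prefix ending in the end token to a finished list.

```python
# Minimal beam-search decoder sketch (illustrative only; not the PR's implementation).
# `log_prob_fn(prefix)` returns {next_token: log_prob}; the toy table below has
# memorized a single sentence, mirroring the overfit-and-regenerate sanity check.
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # assumed end-of-sequence marker

def beam_search(log_prob_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
                beam_size: int = 3,
                max_len: int = 20) -> List[Tuple[List[str], float]]:
    """Return hypotheses as (tokens, total_log_prob), best first."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[List[str], float]] = []
        for prefix, score in beams:
            for tok, lp in log_prob_fn(tuple(prefix)).items():
                candidates.append((prefix + [tok], score + lp))
        # Keep the best `beam_size` expansions; completed ones move to `finished`.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == EOS else beams).append((tokens, score))
        if not beams:
            break
    return sorted(finished or beams, key=lambda c: c[1], reverse=True)

# Toy "model" that has memorized one sentence, so beam search should reproduce
# it exactly (the same idea as overfitting to 3 sentences and generating them back).
_target = ["the", "cat", "sat", EOS]

def toy_log_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    next_tok = _target[len(prefix)] if len(prefix) < len(_target) else EOS
    # Put most of the probability mass on the memorized next token.
    return {next_tok: math.log(0.9), "noise": math.log(0.1)}

if __name__ == "__main__":
    tokens, logp = beam_search(toy_log_probs)[0]
    print(tokens, logp)  # ['the', 'cat', 'sat', '<eos>'] with total log-prob of about -0.42
```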


yukatherin included the following code: https://github.com/nyu-mll/jiant/pull/184/commits

jeswan commented 4 years ago

Comment by iftenney Wednesday Jul 18, 2018 at 02:24 GMT


Closing for now; please re-open when ready for review.

jeswan commented 4 years ago

Comment by yukatherin Wednesday Jul 18, 2018 at 02:33 GMT


@EdouardGrave It would be really helpful if you could review this whenever you have time.

jeswan commented 4 years ago

Comment by sleepinyourhat Wednesday Jul 18, 2018 at 14:56 GMT


Does this fix the no-attention bug you'd mentioned on Slack yesterday, or is that a separate issue?

Also, reviewing now.

jeswan commented 4 years ago

Comment by sleepinyourhat Wednesday Jul 18, 2018 at 15:01 GMT


Also, is BLEU ready? I see code for it, but the comments on this thread suggest it isn't...

jeswan commented 4 years ago

Comment by yukatherin Wednesday Jul 18, 2018 at 18:31 GMT


The BLEU scores in the Python version look reasonably OK, but I still need to write out predictions and match fairseq's BLEU scoring.

[image]
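For context, a small sketch of what a Python-side corpus BLEU check might look like; it uses NLTK's corpus_bleu as a stand-in scorer, which is an assumption here, since the contents of the PR's bleu_scoring.py are not shown in this thread.

```python
# Illustrative corpus-BLEU check (assumption: NLTK as the Python-side scorer;
# the PR's actual bleu_scoring.py may differ).
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# Each hypothesis is a token list; each entry of `references` is a list of
# one or more reference token lists for the corresponding hypothesis.
hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
references = [[["the", "cat", "sat", "on", "the", "mat"]]]

# Smoothing avoids zero scores when some higher-order n-gram has no match.
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"corpus BLEU: {score:.4f}")

# Matching fairseq's BLEU scoring would mean writing detokenized predictions
# and references to text files and scoring them with fairseq's BLEU tool
# (e.g. fairseq-score --sys preds.txt --ref refs.txt in recent fairseq versions).
```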

jeswan commented 4 years ago

Comment by yukatherin Wednesday Jul 18, 2018 at 19:03 GMT


@sleepinyourhat At what point could we consider merging this? Should I get one run's learning curve with the Python version of BLEU scoring? I think @W4ngatang was waiting on it for the masking fix.

jeswan commented 4 years ago

Comment by sleepinyourhat Wednesday Jul 18, 2018 at 19:34 GMT


I'm waiting for confirmation of what this is supposed to do before I give it a final review. See the questions above about BLEU and the bug.

jeswan commented 4 years ago

Comment by yukatherin Wednesday Jul 18, 2018 at 19:48 GMT


Yes, the main points are:

- I started a real run: python main.py --config config/defaults.conf --overrides "exp_name = test-wmt, train_tasks = wmt14_en_de, eval_tasks = wmt14_en_de, run_name = test-wmt1-noelmo, elmo = 1, elmo_chars_only = 1, dropout = 0.2, lr = 0.003, val_interval = 10000, max_vals = 50, batch_size = 24, max_word_v_size = 30000, max_seq_len=20, mt_attention=none"

- The BLEU scores are a bit suspiciously high after 10k batches; I have not compared the different versions carefully. I will do that and evaluate using the other version as well.

[image]

jeswan commented 4 years ago

Comment by sleepinyourhat Wednesday Jul 18, 2018 at 20:00 GMT


Cool. Either mark BLEU as experimental (with lots of warnings) or remove it. Then test on demo.conf. If everything works, let me know and I can merge.

jeswan commented 4 years ago

Comment by yukatherin Wednesday Jul 18, 2018 at 20:45 GMT


@sleepinyourhat Yes, I printed one warning in bleu_scoring.py and tested on demo.conf. I think it's ready.
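The warning itself is not shown in the thread; as a hedged sketch, an experimental BLEU path might surface it along these lines, with the logger name, message, and stand-in metric all assumed for illustration rather than taken from the PR.

```python
# Hypothetical sketch of the kind of warning a bleu_scoring module might emit;
# the PR's actual bleu_scoring.py and its message are not shown in this thread.
import logging

log = logging.getLogger("bleu_scoring")  # assumed logger name

def score_corpus(hypotheses, references):
    """Compute an (experimental) corpus-level score for hypothesis/reference pairs."""
    log.warning(
        "BLEU scoring is experimental: it has not been matched against "
        "fairseq's scorer, so treat the reported numbers with caution."
    )
    # Stand-in metric so the sketch runs end to end: exact-match rate over
    # sentences. A real implementation would compute n-gram BLEU instead.
    matches = sum(hyp == ref for hyp, ref in zip(hypotheses, references))
    return matches / max(len(hypotheses), 1)

if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(score_corpus([["a", "b"]], [["a", "b"]]))  # logs the warning, prints 1.0
```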