nyu-mll / jiant-v1-legacy

The jiant toolkit for general-purpose text understanding models
MIT License

[CLOSED] Learn linear combinations of core LSTM weights #15

Closed jeswan closed 4 years ago

jeswan commented 4 years ago

Issue by W4ngatang Monday Jun 25, 2018 at 05:14 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/15


Insert learnable layer-scaling parameters that are learned after the LSTM weights are frozen (for eval tasks) when pretraining on the LM objective.
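The ELMo-style scaling referenced here combines the hidden states of each encoder layer using softmax-normalized per-layer weights plus a global scale, all learned while the LSTM itself stays frozen. A minimal NumPy sketch of that mixing arithmetic (function and variable names are hypothetical, not jiant's actual API):

```python
import numpy as np

def scalar_mix(layer_states, scalar_params, gamma):
    """ELMo-style mix: softmax-normalized per-layer weights, global scale.

    layer_states: list of (seq_len, dim) arrays, one per LSTM layer
    scalar_params: (num_layers,) unnormalized mixing weights (learned)
    gamma: scalar global scale (learned)
    """
    # Softmax over the per-layer mixing weights
    w = np.exp(scalar_params - scalar_params.max())
    w /= w.sum()
    # Weighted sum of the layer representations
    mixed = sum(wk * h for wk, h in zip(w, layer_states))
    return gamma * mixed

# Example: 3 layers, 4 tokens, hidden size 5; layer k is all (k+1)s
layers = [np.ones((4, 5)) * (i + 1) for i in range(3)]
out = scalar_mix(layers, np.zeros(3), gamma=1.0)
# Zero scalar_params -> equal weights (1/3 each) -> average of 1, 2, 3
```

Only `scalar_params` and `gamma` would receive gradients during eval-task training; the frozen layer states are treated as fixed features.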

jeswan commented 4 years ago

Comment by sleepinyourhat Tuesday Jun 26, 2018 at 18:15 GMT


This should be done in ELMo style and only for ELMo. We should also add a flag-protected skip connection between the input and output of our pretrained BiLSTM. @W4ngatang ?

jeswan commented 4 years ago

Comment by W4ngatang Tuesday Jun 26, 2018 at 18:51 GMT


I think the only skip connection should be between the input (either just the ELMo charCNN output or a mixture of all the ELMo layers) and the output of the RNN/Transformer.

jeswan commented 4 years ago

Comment by sleepinyourhat Tuesday Jun 26, 2018 at 18:54 GMT


CharCNN (ELMo input) if we don't use ELMo, ELMo mixture if we do.
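The flag-protected skip connection discussed above amounts to adding the encoder's input (charCNN output when ELMo is off, the ELMo layer mixture when it is on) to the encoder's output. A hedged sketch, assuming input and output dimensions match (a linear projection would be needed otherwise; names are illustrative, not jiant's code):

```python
import numpy as np

def encode_with_skip(encoder, inputs, use_skip=True):
    """Run a sentence encoder, optionally adding an input-to-output
    skip connection guarded by a flag.

    encoder: callable mapping a (seq_len, dim) array to (seq_len, dim)
    inputs:  (seq_len, dim) array -- the charCNN output or ELMo mixture
    """
    out = encoder(inputs)
    if use_skip:
        # Residual add; assumes matching dims, otherwise project inputs first
        out = out + inputs
    return out

# Toy encoder that doubles its input, applied to an all-3s sequence
x = np.full((4, 5), 3.0)
y = encode_with_skip(lambda h: 2.0 * h, x)
# With the skip: 2*3 + 3 = 9 everywhere; without it: 6 everywhere
```

Gating the addition behind a flag keeps the default behavior unchanged while letting experiments toggle the residual path.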

jeswan commented 4 years ago

Comment by W4ngatang Wednesday Jun 27, 2018 at 01:43 GMT


implemented