Closed by jeswan 4 years ago
Comment by sleepinyourhat Tuesday Jun 26, 2018 at 18:15 GMT
This should be done in ELMo style and only for ELMo. We should also add a flag-protected skip connection between the input and output of our pretrained BiLSTM. @W4ngatang ?
Comment by W4ngatang Tuesday Jun 26, 2018 at 18:51 GMT
I think the only skip connection is between the input (either just the ELMo charCNN or a mixture of all the ELMo layers) and the output of the RNN/Transformer.
Comment by sleepinyourhat Tuesday Jun 26, 2018 at 18:54 GMT
CharCNN (ELMo input) if we don't use ELMo, ELMo mixture if we do.
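A minimal sketch of the flag-protected skip connection discussed above, in plain Python (the function name and flag are hypothetical, not jiant's actual API; a real implementation would operate on tensors and assume matching dimensions):

```python
def apply_skip_connection(encoder_input, encoder_output, use_skip=True):
    """Optionally add a residual (skip) connection from the encoder's input
    (the ELMo charCNN output, or an ELMo layer mixture) to its output."""
    if not use_skip:
        return encoder_output
    # Element-wise sum; assumes input and output have the same dimension.
    return [x + y for x, y in zip(encoder_input, encoder_output)]
```

With the flag off, the encoder output passes through unchanged, so existing configs are unaffected.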
Issue by W4ngatang Monday Jun 25, 2018 at 05:14 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/15
Insert learnable layer-scaling parameters, to be learned (for eval tasks) once the LSTM weights trained on the LM are frozen.
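For reference, ELMo-style layer scaling is a softmax-normalized weighted sum over the frozen layer activations, times a global scale gamma; both the per-layer scalars and gamma are the only trainable parameters. A minimal pure-Python sketch (names are hypothetical, not jiant's implementation):

```python
import math

def scalar_mix(layers, scalar_params, gamma):
    """ELMo-style layer mixing: softmax the per-layer scalars into weights,
    take the weighted sum of the layer activations, and scale by gamma.
    `layers` is a list of equal-length activation vectors, one per layer."""
    exps = [math.exp(s) for s in scalar_params]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(layers[0])
    return [gamma * sum(w * layer[i] for w, layer in zip(weights, layers))
            for i in range(dim)]
```

During downstream (eval-task) training, only `scalar_params` and `gamma` would receive gradient updates while the LSTM weights stay frozen.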