nyu-mll / jiant

jiant is an NLP toolkit
https://jiant.info
MIT License
1.65k stars, 297 forks

Learn linear combinations of core LSTM weights #15

Closed: W4ngatang closed this issue 6 years ago

W4ngatang commented 6 years ago

When training on LM, insert learnable layer-scaling parameters that are learned once the LSTM weights are frozen (for eval tasks).

sleepinyourhat commented 6 years ago

This should be done in ELMo style and only for ELMo. We should also add a flag-protected skip connection between the input and output of our pretrained BiLSTM. @W4ngatang ?
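For reference, the ELMo-style combination mentioned here is a softmax-normalized weighted sum of the layer outputs, scaled by a single scalar. The sketch below shows only the forward computation in plain Python; the function name `scalar_mix` and its arguments are illustrative assumptions, not jiant's actual code. In a real model the raw weights and `gamma` would be trainable parameters and the encoder weights would stay frozen.

```python
import math


def scalar_mix(layers, weights, gamma=1.0):
    """Return gamma * sum_j softmax(weights)[j] * layers[j].

    layers:  list of equal-length vectors, one per encoder layer (hypothetical input).
    weights: one raw (unnormalized) scalar per layer; these would be learned.
    gamma:   global scale; this would also be learned.
    """
    if len(layers) != len(weights):
        raise ValueError("need exactly one weight per layer")

    # Softmax over the raw weights, shifted by the max for numerical stability.
    shift = max(weights)
    exps = [math.exp(w - shift) for w in weights]
    total = sum(exps)
    norm = [e / total for e in exps]

    # Weighted sum across layers, computed independently for each dimension.
    dim = len(layers[0])
    return [
        gamma * sum(s * layer[i] for s, layer in zip(norm, layers))
        for i in range(dim)
    ]
```

With equal raw weights the mix reduces to a plain average of the layers, so `scalar_mix([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])` averages the two vectors.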

W4ngatang commented 6 years ago

I think the only skip-connection is between input (either just the ELMo charCNN or a mixture of all the ELMo layers) and output of the RNN/Transformer

sleepinyourhat commented 6 years ago

CharCNN (ELMo input) if we don't use ELMo, ELMo mixture if we do.
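A rough sketch of the flag-protected skip connection discussed above. The thread does not say how the encoder input and output are combined, so this assumes per-token concatenation; the names `encode_with_skip`, `encoder`, and `skip` are hypothetical and do not come from jiant.

```python
def encode_with_skip(inputs, encoder, skip=False):
    """Run `encoder` over per-token input vectors, optionally adding a skip connection.

    inputs:  list of per-token vectors (CharCNN output, or the ELMo mixture if ELMo is used).
    encoder: any callable mapping the input list to one output vector per token.
    skip:    hypothetical flag; when True, each token's input vector is
             concatenated onto its encoder output (an assumption, not confirmed
             by the thread).
    """
    outputs = encoder(inputs)
    if len(outputs) != len(inputs):
        raise ValueError("encoder must return one vector per input token")
    if not skip:
        return outputs
    # `out + inp` is list concatenation here, not element-wise addition.
    return [list(out) + list(inp) for out, inp in zip(outputs, inputs)]
```

Concatenation keeps the two representations separate for the downstream task model at the cost of a wider output; an additive skip would instead require the input and output dimensions to match.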

W4ngatang commented 6 years ago

implemented