nyu-mll / jiant-v1-legacy

The jiant toolkit for general-purpose text understanding models
MIT License

Character Embedding #914

Open jeswan opened 3 years ago

jeswan commented 3 years ago

Issue by ppriyank Thursday Sep 19, 2019 at 08:23 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/914


I want to use character embeddings in addition to word embeddings.

When I add the following to the config file:

    input_module = elmo
    char_embs = 1

I get this error:

allennlp.common.checks.ConfigurationError: "Mismatched token keys: dict_keys(['chars', 'elmo']) and dict_keys(['elmo'])"

The error is raised from these lines in __main__.py:

    trainer, ... = build_trainer( ... )
    trainer.train()
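The error message indicates that two indexed text fields being combined have different token keys: one was indexed with both a character indexer (chars) and the ELMo indexer (elmo), the other with elmo only. The sketch below reproduces that kind of consistency check in isolation. check_token_keys is a hypothetical helper written for illustration, not AllenNLP's or jiant's actual code, and it raises ValueError where AllenNLP raises ConfigurationError.

```python
from typing import Dict, List


def check_token_keys(fields: List[Dict[str, list]]) -> None:
    """Raise if the indexed text fields do not share the same token keys.

    Hypothetical stand-in for the check behind AllenNLP's
    "Mismatched token keys" error; not the library's actual implementation.
    """
    reference = fields[0].keys()
    for field in fields[1:]:
        # dict_keys views compare like sets, so key order does not matter
        if field.keys() != reference:
            raise ValueError(f"Mismatched token keys: {field.keys()} and {reference}")


if __name__ == "__main__":
    elmo_only = {"elmo": [[0, 1]]}  # field indexed with the ELMo indexer only
    elmo_and_chars = {"chars": [[3, 4]], "elmo": [[0, 1]]}  # field indexed with both
    try:
        check_token_keys([elmo_only, elmo_and_chars])
    except ValueError as err:
        print(err)
```

Run as a script, it prints the same message text as in the issue. A check like this only reports the symptom; it does not identify which part of the pipeline produced fields without the chars key.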

Should I change the model code manually, or is there a configuration fix?

jeswan commented 3 years ago

Comment by sleepinyourhat Thursday Sep 19, 2019 at 21:43 GMT


Bad news: no one has been using the dedicated character embedding code recently, AFAIK, and it was added before we added most of our tests, so support may be partially broken. If you want to use it and you see an easy fix, I'd encourage you to make a PR.

FWIW, though, we definitely do have working support for ELMo's pretrained character encoder. There's an option to use the full ELMo, and another option to use only the character layer from ELMo.
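As a sketch of switching between the two ELMo-based options, the helper below builds a jiant command line with config overrides. It assumes the python -m jiant entry point (matching the __main__.py mentioned in the issue), the --config_file and --overrides flags, and that the character-layer-only option is spelled elmo-chars-only; verify these against defaults.conf in your checkout. build_jiant_command and the config path are hypothetical.

```python
import shlex
from typing import List


def build_jiant_command(config_file: str, chars_only: bool) -> List[str]:
    """Build a jiant command line that selects one of the ELMo input modules.

    Assumptions (not confirmed by the issue): the "python -m jiant" entry point,
    the --config_file/--overrides flags, and the "elmo-chars-only" module name.
    """
    # Full ELMo vs. only ELMo's pretrained character layer
    input_module = "elmo-chars-only" if chars_only else "elmo"
    # Keep the separate char_embs path off, since it may be partially broken
    overrides = f"input_module = {input_module}, char_embs = 0"
    return ["python", "-m", "jiant", "--config_file", config_file, "--overrides", overrides]


if __name__ == "__main__":
    # Hypothetical config path; substitute your own experiment config
    cmd = build_jiant_command("jiant/config/demo.conf", chars_only=True)
    print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Building the arguments as a list rather than one string lets you pass them straight to subprocess.run without shell-quoting issues.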