Closed lfoppiano closed 1 year ago
From a comment in the Dockerconfig file, we should add:
RUN python3 preload_embeddings.py --embedding elmo-en --registry ./resources-registry.json
to download the ELMo embeddings, right?
Hi @lfoppiano! Thank you for the issue. For some reason, I keep forgetting to add to DeLFT the automatic download of ELMo embeddings, as is done for the other embeddings (it should take only 2-3 lines to add).
So for the moment it has to be done manually. For example, for the English ELMo:
cd /path/to/store/elmo
wget https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json
wget https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5
In resources-registry.json, under the entry embeddings-contextualized, update the fields "url_config" and "url_weights" to point to the downloaded files under /path/to/store/elmo.
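The registry update described above could also be scripted. The following is only a minimal sketch: the helper name point_registry_to_local_elmo is hypothetical, and it assumes that "embeddings-contextualized" is a list of entries each carrying a "name" field (for example "elmo-en"). Please check that against your own resources-registry.json before relying on it.

```python
import json
from pathlib import Path

# File names taken from the two wget commands above.
OPTIONS_FILE = "elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json"
WEIGHTS_FILE = "elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5"


def point_registry_to_local_elmo(registry_path, elmo_dir, name="elmo-en"):
    """Hypothetical helper: set url_config/url_weights of one entry to local files."""
    registry_path = Path(registry_path)
    elmo_dir = Path(elmo_dir)
    registry = json.loads(registry_path.read_text(encoding="utf-8"))

    # Assumption: "embeddings-contextualized" is a list of dicts with a "name" key.
    for entry in registry.get("embeddings-contextualized", []):
        if entry.get("name") == name:
            entry["url_config"] = str(elmo_dir / OPTIONS_FILE)
            entry["url_weights"] = str(elmo_dir / WEIGHTS_FILE)
            break
    else:
        raise KeyError(f"no contextualized embedding named {name!r} in {registry_path}")

    # Write the whole registry back; all other entries are left untouched.
    registry_path.write_text(json.dumps(registry, indent=4) + "\n", encoding="utf-8")
    return entry
```

It would be called as point_registry_to_local_elmo("./resources-registry.json", "/path/to/store/elmo") after the two files have been downloaded.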
preload_embeddings.py is mainly for preparing Docker images. I think it does not work with ELMo embeddings, only with word embeddings (I've never packaged ELMo in the Docker image).
To do (very simple): update preload_embeddings.py to cover ELMo too, and add the automatic ELMo embeddings download like for the other embeddings. Then it would work like the other embeddings.
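For illustration, the automatic download mentioned in the to-do could look roughly like the sketch below. This is not the DeLFT implementation (see the PR for that). download_if_missing is a hypothetical helper built only on the standard library, and the two URLs are the ones from the wget commands earlier in this thread.

```python
import os
import urllib.request

# URLs of the English ELMo files, as given earlier in this thread.
ELMO_EN_URLS = [
    "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json",
    "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5",
]


def download_if_missing(url, dest_dir, fetch=urllib.request.urlretrieve):
    """Hypothetical helper: fetch url into dest_dir unless the file is already there."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    if not os.path.isfile(dest):
        # Download to a temporary name first, so that an interrupted
        # transfer is never mistaken for a complete file.
        tmp = dest + ".part"
        fetch(url, tmp)
        os.replace(tmp, dest)
    return dest


def download_elmo_en(dest_dir):
    """Return the local paths of the options and weights files, downloading if needed."""
    return [download_if_missing(url, dest_dir) for url in ELMO_EN_URLS]
```

Calling download_elmo_en("/path/to/store/elmo") would replace the manual wget step. The weights file is large, so skipping files that already exist matters when the script is re-run, for example during a Docker build.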
see PR https://github.com/kermitt2/delft/pull/157
See also https://github.com/kermitt2/grobid/issues/946 for the same issue.
I tried both on Apple M1 and with Docker on Linux. The issue is similar in both cases, but let's take the Docker setup as the reference.
Configuration:
Error log, as you can see the models without ELMo are loading without issues:
The content of grobid-home: