Can´t use ELMO in DeepLearning Train

rodyoukai commented 2 years ago

What is your OS and architecture? Ubuntu 22.04
What is your Java version (java --version)? openjdk version "1.8.0_342"

Hi @kermitt2, I successfully trained a citation model using DeLFT, but when I try to use ELMo I get this error message:

Loading data...
Error: either provide a path to a directory with the ELMo model individual options and weight file or to the model in a ZIP archive.
10000 total sequences
9000 train sequences
1000 validation sequences
ELMo weights used: /media/lopez/T51/embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5

Where can I find this file, and how can I change the path?

lfoppiano commented 2 years ago

Dear @rodyoukai, you have to configure the ELMo embeddings in delft/delft/resources-registry.json. See the file: https://github.com/kermitt2/delft/blob/master/delft/resources-registry.json

Change the paths in this section:

"embeddings-contextualized": [
        {
            "name": "elmo-en",
            "path-config": "/media/lopez/T51/embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json",
            "path_weights": "/media/lopez/T51/embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5",
            "path-vocab": "data/models/ELMo/en/vocab.txt",
            "path-cache": "data/models/ELMo/en/",
            "cache-training": true,
            "lang": "en",
            "url_config": "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json",
            "url_weights": "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5"
        },
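A quick way to check that the paths in this section point to real files is sketched below. It is only an illustration, not part of DeLFT: the function name `missing_elmo_files` is hypothetical, and the key names are copied from the registry excerpt above.

```python
import json
from pathlib import Path


def missing_elmo_files(registry_path, name="elmo-en"):
    """Return (key, path) pairs for configured ELMo files that are not on disk.

    Hypothetical helper, not part of DeLFT. The key names ("path-config",
    "path_weights") are taken from the registry excerpt above.
    """
    with open(registry_path, encoding="utf-8") as handle:
        registry = json.load(handle)

    missing = []
    for entry in registry.get("embeddings-contextualized", []):
        if entry.get("name") != name:
            continue
        for key in ("path-config", "path_weights"):
            path = entry.get(key)
            # A missing key and a path that is not a file are both reported.
            if not path or not Path(path).is_file():
                missing.append((key, path))
    return missing
```

Calling `missing_elmo_files("delft/resources-registry.json")` from the root of the DeLFT clone should return an empty list once both files are in place. Any pair it returns names a registry key whose file could not be found.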

Are you using docker?

rodyoukai commented 2 years ago

Hi @lfoppiano, thanks for your answer. I changed the path and now I have this other issue:

embeddings loaded for 2196007 words and 300 dimensions
ELMo weights used: /home/rcuellar/T51/embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5
Error: either provide a path to a directory with the ELMo model individual options and weight file or to the model in a ZIP archive.
Model for citation created in 1112798 ms

Maybe I need to manually download the hdf5 and json files from AWS?

kermitt2 commented 2 years ago

Hi @rodyoukai

Yes, you need to manually download the config and weights files for ELMo. Unlike for transformers and static embeddings, the automatic download is not written yet! (my fault, I forgot to add it :)

The two URLs still work, so you just need to download the files and move them to the paths you indicated.
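The manual download can also be scripted. The sketch below is a rough illustration, not DeLFT code: the function names and the destination directory are assumptions, and the URLs are the `url_config` and `url_weights` values from the registry excerpt earlier in this thread.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the "url_config" / "url_weights" entries.
BASE_URL = ("https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/"
            "2x4096_512_2048cnn_2xhighway_5.5B/")
FILES = (
    "elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json",
    "elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5",
)


def elmo_targets(dest_dir):
    """Map each download URL to the local path it should be saved to."""
    dest = Path(dest_dir)
    return {BASE_URL + name: dest / name for name in FILES}


def download_elmo(dest_dir):
    """Download the ELMo options and weights files into dest_dir.

    Hypothetical helper. Files that already exist are skipped, so delete
    a partially downloaded file before retrying.
    """
    targets = elmo_targets(dest_dir)
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for url, target in targets.items():
        if not target.exists():
            urlretrieve(url, str(target))
    return list(targets.values())
```

For example, `download_elmo("/home/me/embeddings/elmo")` (an assumed directory) would fetch both files. The registry entries `path-config` and `path_weights` then need to point to the returned paths.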

ELMo works very well for sequence labeling! (and it is almost as fast as transformers in the latest version of DeLFT)

rodyoukai commented 2 years ago

Thanks @kermitt2 I will follow your instructions.

rodyoukai commented 2 years ago

Hi again @kermitt2 and @lfoppiano, I got the weights and config from Amazon S3 and I think I set all the config correctly, but now I get this error when I try to train the citation model:

`Loading data...
17117 total sequences
15405 train sequences
1712 validation sequences
ELMo weights used: /home/rcuellar/embeddings/T51/elmo_2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5
Traceback (most recent call last):
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1380, in _do_call
    return fn(*args)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1363, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1456, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [2048] and type float
  [[{{node bilm/CNN_high_1/b_carry/Initializer/initial_value}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "delft/applications/grobidTagger.py", line 351, in <module>
    train(model,
  File "delft/applications/grobidTagger.py", line 133, in train
    model = Sequence(model_name,
  File "/home/rcuellar/delft/delft/sequenceLabelling/wrapper.py", line 112, in __init__
    self.embeddings = Embeddings(self.embeddings_name, resource_registry=self.registry, use_ELMo=use_ELMo)
  File "/home/rcuellar/delft/delft/utilities/Embeddings.py", line 82, in __init__
    self.make_ELMo()
  File "/home/rcuellar/delft/delft/utilities/Embeddings.py", line 332, in make_ELMo
    sess.run(tf.compat.v1.global_variables_initializer())
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 970, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1193, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1373, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1399, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [2048] and type float
  [[node bilm/CNN_high_1/b_carry/Initializer/initial_value (defined at /home/rcuellar/delft/delft/utilities/simple_elmo/model.py:273) ]]

Errors may have originated from an input operation.

Operation defined at: (most recent call last)

File "delft/applications/grobidTagger.py", line 351, in <module>
  train(model,

File "delft/applications/grobidTagger.py", line 133, in train
  model = Sequence(model_name,

File "/home/rcuellar/delft/delft/sequenceLabelling/wrapper.py", line 112, in __init__
  self.embeddings = Embeddings(self.embeddings_name, resource_registry=self.registry, use_ELMo=use_ELMo)

File "/home/rcuellar/delft/delft/utilities/Embeddings.py", line 82, in __init__
  self.make_ELMo()

File "/home/rcuellar/delft/delft/utilities/Embeddings.py", line 320, in make_ELMo
  self.elmo_model.load(vocab_file=vocab_file,

File "/home/rcuellar/delft/delft/utilities/simple_elmo/elmo_helpers.py", line 203, in load
  self.sentence_embeddings_op = bilm(self.sentence_character_ids)

File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 96, in __call__
  lm_graph = BidirectionalLanguageModelGraph(

File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 284, in __init__
  self._build()

File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 288, in _build
  self._build_word_char_embeddings()

File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 442, in _build_word_char_embeddings
  b_carry = tf.compat.v1.get_variable(

File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 273, in custom_getter
  return getter(name, *args, **kwargs)

Original stack trace for 'bilm/CNN_high_1/b_carry/Initializer/initial_value':
  File "delft/applications/grobidTagger.py", line 351, in <module>
    train(model,
  File "delft/applications/grobidTagger.py", line 133, in train
    model = Sequence(model_name,
  File "/home/rcuellar/delft/delft/sequenceLabelling/wrapper.py", line 112, in __init__
    self.embeddings = Embeddings(self.embeddings_name, resource_registry=self.registry, use_ELMo=use_ELMo)
  File "/home/rcuellar/delft/delft/utilities/Embeddings.py", line 82, in __init__
    self.make_ELMo()
  File "/home/rcuellar/delft/delft/utilities/Embeddings.py", line 320, in make_ELMo
    self.elmo_model.load(vocab_file=vocab_file,
  File "/home/rcuellar/delft/delft/utilities/simple_elmo/elmo_helpers.py", line 203, in load
    self.sentence_embeddings_op = bilm(self.sentence_character_ids)
  File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 96, in __call__
    lm_graph = BidirectionalLanguageModelGraph(
  File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 284, in __init__
    self._build()
  File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 288, in _build
    self._build_word_char_embeddings()
  File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 442, in _build_word_char_embeddings
    b_carry = tf.compat.v1.get_variable(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 1579, in get_variable
    return get_variable_scope().get_variable(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 1322, in get_variable
    return var_store.get_variable(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 576, in get_variable
    return custom_getter(**custom_getter_kwargs)
  File "/home/rcuellar/delft/delft/utilities/simple_elmo/model.py", line 273, in custom_getter
    return getter(name, *args, **kwargs)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 531, in _true_getter
    return self._get_single_variable(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 952, in _get_single_variable
    v = variables.VariableV1(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 268, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 213, in _variable_v1_call
    return previous_getter(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 206, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 2612, in default_variable_creator
    return resource_variable_ops.ResourceVariable(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 272, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1630, in __init__
    self._init_from_args(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1781, in _init_from_args
    initial_value = ops.convert_to_tensor(initial_value,
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1621, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 293, in _constant_impl
    const_tensor = g._create_op_internal(  # pylint: disable=protected-access
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3697, in _create_op_internal
    ret = Operation(
  File "/home/rcuellar/delft/env/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 2101, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Model for citation created in 50914 ms `

Can you help me or give me some advice?

kermitt2 commented 2 years ago

Hi @rodyoukai !

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [2048] and type float

It seems that you ran out of memory with ELMo, which is at least a good sign that it was loaded :)

What's your GPU memory capacity? I think I normally got ELMo running well with 4GB of GPU memory, but training needs something more like 8GB.
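To find the GPU memory capacity, one option is to ask `nvidia-smi` and compare the result with the rough figures above. This is only a sketch: the function names are made up for illustration, and the 4GB and 8GB thresholds are the approximate estimates from this comment, not measured limits.

```python
import subprocess


def parse_memory_mib(nvidia_smi_output):
    """Parse one total-memory value (in MiB) per GPU from nvidia-smi output."""
    return [int(line.strip())
            for line in nvidia_smi_output.splitlines() if line.strip()]


def query_gpu_memory_mib():
    """Return total memory in MiB for each visible NVIDIA GPU.

    Returns an empty list when nvidia-smi is missing or fails.
    """
    command = ["nvidia-smi", "--query-gpu=memory.total",
               "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(command, capture_output=True,
                                text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_memory_mib(result.stdout)


def elmo_verdict(total_mib):
    """Classify a GPU using the rough 4GB / 8GB estimates from this thread."""
    gib = total_mib / 1024
    if gib >= 8:
        return "likely enough for training with ELMo"
    if gib >= 4:
        return "may be enough to run ELMo, probably not to train"
    return "probably too small for ELMo"
```

Running `[elmo_verdict(m) for m in query_gpu_memory_mib()]` gives one verdict per GPU. Some cards report slightly less than their nominal size, so a value just under a threshold is worth a manual check.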

rodyoukai commented 2 years ago

Hi @kermitt2, thanks for the quick answer. I have 3GB of GPU memory. Is there any workaround for this?

kermitt2 commented 2 years ago

Unfortunately, training with ELMo embeddings is not manageable with this amount of GPU memory (I don't think you can run it below 4GB either, but I am not sure).

rodyoukai commented 2 years ago

Thanks for the answer @kermitt2. I have another question: do you know if ELMo embeddings exist for Spanish, and where I can download them?

rodyoukai commented 1 year ago

Hi again @lfoppiano or @kermitt2, I need some advice: how can I create an ELMo embeddings model in Spanish? I suppose you created the English and French hdf5 files. Can you share the process? Maybe I can generate my own embeddings in Spanish.

lfoppiano commented 1 year ago

@rodyoukai we used the already available ELMo embeddings provided by the authors of ELMo. I personally don't know exactly how to create new ELMo embeddings from scratch. You might check this issue: https://github.com/kermitt2/delft/issues/155, but the tests by Patrice did not show an improvement, so that implementation was not kept.

kermitt2 commented 1 year ago

Hi @rodyoukai

Here are some more details:

The English ELMo is from AI2, the authors of ELMo. At the time, training a model (1B tokens) took about 2 weeks on 3 GPUs, if I remember correctly.
The French ELMo was created by @pjox (who built OSCAR, which includes training resources for Spanish). He is likely the right person to contact if you want to push this forward.

About elmoformanylangs: it includes a pretrained ELMo model for Spanish. However, after integrating these models into DeLFT and benchmarking them (and I think I did it correctly), you can see that they were only marginally better than static embeddings, and not at all at the level of the "standard" ELMo models:

https://github.com/kermitt2/delft/issues/155#issuecomment-1400735270

rodyoukai commented 1 year ago

Thanks for the answers @kermitt2 & @lfoppiano

kermitt2 / grobid

Can´t use ELMO in DeepLearning Train #946