studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0

LUKE Large Finetuning Duration for NER #138

Closed · taghreed34 closed this issue 2 years ago

taghreed34 commented 2 years ago

Hello there,

I'm trying to fine-tune LUKE large for an NER task, using data in CoNLL format from multiple sources (including CoNLL-2003 itself) that I've combined together. The training runs on a Google Colab GPU, but it takes very long to finish a single batch: it took about 2 hours to train on just 2 batches with a batch size of 2. Is this expected? And if not, why does this happen?

Thanks in advance

ryokan0123 commented 2 years ago

Hi,

A single batch taking an hour sounds very unusual. Is the GPU properly enabled? Even on a CPU it seems too long... I suspect it has something to do with the Google Colab GPU setup, but I have no idea what is going on there.
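
For what it's worth, a quick sanity check of what the Colab runtime actually provides (this is just a generic PyTorch check, not something from this repo) would be:

!nvidia-smi

import torch
print(torch.cuda.is_available())      # should print True on a GPU runtime
print(torch.cuda.get_device_name(0))  # e.g. a Tesla T4 or K80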

taghreed34 commented 2 years ago

@Ryou0634 Yes, the GPU is enabled; otherwise the training wouldn't have started at all, it would have raised an error. In any case, I changed the runtime type to GPU before running the notebook. Am I misunderstanding something?

ryokan0123 commented 2 years ago

OK, could you share the notebook or the code snippet if possible?

taghreed34 commented 2 years ago

@Ryou0634 Here it is https://colab.research.google.com/drive/157UhKiSUOISGZVIhQE3f5XkTguUMlhGf?usp=sharing

ryokan0123 commented 2 years ago

I tried running the training on Google Colab myself with my own code and data and observed no problems. I recommend installing the dependencies via poetry like this:

!pip install -q --pre poetry
!poetry --version
!git clone https://github.com/studio-ousia/luke.git

# change into the cloned repo so that poetry finds its pyproject.toml
%cd luke

!poetry install

Installing with pip alone can pull in the wrong versions of the libraries.
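
If you want to double-check which versions actually ended up in the poetry environment (just a suggestion, not a step from the repo), something like this should list them:

!poetry show | grep -iE "allennlp|transformers|torch|wikipedia2vec"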

This is my notebook for reference (place the data as necessary). https://colab.research.google.com/drive/1r6YJaMHRgfOCVhQ8vDABvcYct1-Od7KH?usp=sharing

taghreed34 commented 2 years ago

It still says "No module named ..." for several packages (wikipedia2vec, ujson, allennlp, allennlp_models, and google.cloud.storage.retry) even after using poetry. @Ryou0634

ryokan0123 commented 2 years ago

Does poetry install complete successfully? If so, how about running poetry run allennlp train ...? (add poetry run before your training command so it executes inside the poetry environment)
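
For example, if your command looked roughly like the one below, you would just add the prefix (your_config.jsonnet and output_dir are placeholders for your own config file and results directory, not paths from this repo):

!poetry run allennlp train your_config.jsonnet -s output_dir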

taghreed34 commented 2 years ago

Not for all packages; some of them were stuck at "pending", "downloading", or around 40%. @Ryou0634

taghreed34 commented 2 years ago

I installed the missing modules with "poetry run pip3 install ..." and then ran the training command with poetry run at the beginning. Now the problem is CUDA out of memory, which doesn't make sense.

ryokan0123 commented 2 years ago

Yes. I am not so familiar with Google Colab, but I think they assign you a different machine every session. That may be why you got the running-so-slow problem before and an out-of-memory error now.

As the training process is quite computationally intensive, I recommend running it in a more stable and powerful environment if possible.
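
If you have to stay on Colab for now, one generic thing that sometimes helps with out-of-memory errors (a general AllenNLP option, not something specific to this repo, and the exact key depends on your config) is lowering the batch size via the overrides flag:

# data_loader.batch_size is the usual location in recent AllenNLP configs; adjust to match yours
!poetry run allennlp train your_config.jsonnet -s output_dir -o '{"data_loader.batch_size": 1}'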

taghreed34 commented 2 years ago

Okay I'll do that as soon as possible.

Thank you so much for your time and effort.