studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0

Unable to run NER example on Colab. #50

Closed SnoozingSimian closed 3 years ago

SnoozingSimian commented 3 years ago

While trying to run the CoNLL-2003 example on Google Colab, I faced the following error:

ModuleNotFoundError: No module named 'apex'

While trying to run this command:

python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=/content/output \
    ner run \
    --data-dir=/content/upload \
    --fp16 \
    --train-batch-size=2 \
    --gradient-accumulation-steps=2 \
    --learning-rate=1e-5 \
    --num-train-epochs=5

I used Poetry to set up LUKE, but it seems that it does not install apex. Is there any way to get around this other than fiddling with the Poetry lock file? Is this the intended behaviour? If so, why?

The code to reproduce this easily can be found here.

I also have a copy of a portion of the CoNLL-2003 dataset in the same repo, which should be uploaded to the Colab instance before the notebook can run properly.

P.S. There was another issue with the following line (line 70) in luke/examples/ner/utils.py: assert sentence_boundaries[0] == 0. It failed because the sentence_boundaries variable was initialized without 0 (which marks the first character location). I worked around it by initializing it as sentence_boundaries = [0].
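
For reference, the workaround amounts to something like the following. This is a minimal sketch, not the exact code from examples/ner/utils.py; the file reading and loop are paraphrased, and only the sentence_boundaries initialization reflects my change:

    # Paraphrased sketch of reading CoNLL-style lines into words and sentence boundaries.
    conll_lines = [
        "EU NNP B-NP B-ORG",
        "rejects VBZ B-VP O",
        "",                          # blank line marks a sentence boundary
        "Peter NNP B-NP B-PER",
        "Blackburn NNP I-NP I-PER",
    ]

    words = []
    sentence_boundaries = [0]        # instead of starting from an empty list
    for line in conll_lines:
        if not line.strip():         # sentence break
            if len(words) != sentence_boundaries[-1]:
                sentence_boundaries.append(len(words))
            continue
        words.append(line.split(" ")[0])
    if len(words) != sentence_boundaries[-1]:
        sentence_boundaries.append(len(words))

    assert sentence_boundaries[0] == 0
    print(sentence_boundaries)       # [0, 2, 4]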

ikuyamada commented 3 years ago

Hi, thank you for using LUKE! As mentioned here, please install the APEX library directly from its GitHub repository.
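
For anyone who hits the same error, installing APEX from source typically looks like this (a sketch; the exact flags depend on your CUDA setup, and the CUDA/C++ extension options are taken from the NVIDIA repository's README):

    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --no-cache-dir ./
    # or, to build the optional CUDA/C++ extensions (requires a matching CUDA toolkit):
    # pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./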

There was another issue with the following line (line 70) in luke/examples/ner/utils.py

If you use the CoNLL-2003 dataset, the code should run without issues. Did you encounter these errors with our NER code?

SnoozingSimian commented 3 years ago

Thank you for taking the time to reply.

  1. Thank you for using LUKE! As mentioned here, please install the APEX library directly from its GitHub repository.

I did install the apex library as mentioned, then I installed Poetry. After that I ran poetry install to install LUKE. Then, while in the Poetry shell, I ran the command quoted above, which gave me the error shown above.

Note: I was using a Colab instance to do all this.
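
Roughly, the sequence of commands on the Colab instance looked like this (a sketch; the apex install follows its GitHub README, and the checkout of the luke repository is abbreviated):

    # 1) install apex into the system Python, following its GitHub README
    git clone https://github.com/NVIDIA/apex
    pip install -v --no-cache-dir ./apex

    # 2) set up LUKE with Poetry
    pip install poetry
    cd luke
    poetry install
    poetry shell

    # 3) run the NER example inside the Poetry shell
    python -m examples.cli --model-file=luke_large_500k.tar.gz --output-dir=/content/output ner run ...
    # -> ModuleNotFoundError: No module named 'apex'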

  2. If you use the CoNLL-2003 dataset, the code should run without issues. Did you encounter these errors with our NER code?

I have a copy of the CoNLL-2003 data which contains only the English portion. While running the above command without the --fp16 flag, I got an assertion error saying that the first element of the list sentence_boundaries was not 0. I printed the list and this was indeed the case. So I changed the code so that whenever the list is emptied, it is initialized with 0. That seemed to work.

ikuyamada commented 3 years ago

I did install the apex library as mentioned, then I installed Poetry. After that I ran poetry install to install LUKE. Then, while in the Poetry shell, I ran the command quoted above, which gave me the error shown above. Note: I was using a Colab instance to do all this.

Poetry creates a new virtual environment, so you need to install APEX after activating the environment with poetry shell. Alternatively, you can export a conventional requirements.txt file using poetry export, as explained here.
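
Concretely, either of the following should work (a sketch; the apex install lines follow the NVIDIA repository's instructions, and the poetry export flags are the standard ones):

    # Option 1: install APEX inside the Poetry environment
    poetry shell
    git clone https://github.com/NVIDIA/apex
    pip install -v --no-cache-dir ./apex

    # Option 2: export the locked dependencies and use plain pip instead of Poetry
    poetry export -f requirements.txt --output requirements.txt
    pip install -r requirements.txt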

I have a copy of the CoNLL-2003 data which contains only the English portion. While running the above command without the --fp16 flag, I got an assertion error saying that the first element of the list sentence_boundaries was not 0. I printed the list and this was indeed the case. So I changed the code so that whenever the list is emptied, it is initialized with 0. That seemed to work.

I think the current code can process the CoNLL-2003 dataset without errors. Could you check whether your dataset is equivalent to the one available here?
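
For reference, the expected files look roughly like this (one token per line with its POS tag, chunk tag, and NER tag, blank lines between sentences, and a -DOCSTART- line between documents; tags are reproduced from memory):

    -DOCSTART- -X- -X- O

    EU NNP B-NP B-ORG
    rejects VBZ B-VP O
    German JJ B-NP B-MISC
    call NN I-NP O
    to TO B-VP O
    boycott VB I-VP O
    British JJ B-NP B-MISC
    lamb NN I-NP O
    . . O O

    Peter NNP B-NP B-PER
    Blackburn NNP I-NP I-PER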

SnoozingSimian commented 3 years ago

Thanks, I will check if my dataset is formatted correctly and also use poetry export to install the dependencies externally.

ikuyamada commented 3 years ago

I am closing this issue since there has been no activity.