huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Error when running llama2_fine_tuning_inference & Intel_Gaudi_Fine_Tuning examples #1467

Open epage480 opened 1 week ago

epage480 commented 1 week ago

System Info

Google Colab (CPU runtime)

Reproduction

  1. Copy llama2_fine_tuning_inference.ipynb and upload/open it in Google Colab
  2. Add an additional code cell below the "exit()" cell with the following: !git clone https://github.com/HabanaAI/Gaudi-tutorials.git
  3. Replace with a valid Hugging Face token
  4. Run all cells up to "python3 ../gaudi_spawn.py..."
  5. You should see an error:

     Traceback (most recent call last):
       File "/content/Gaudi-tutorials/PyTorch/llama2_fine_tuning_inference/optimum-habana/examples/language-modeling/../gaudi_spawn.py", line 34, in <module>
         from optimum.habana.distributed import DistributedRunner
       File "/usr/local/lib/python3.10/dist-packages/optimum/habana/__init__.py", line 34, in <module>
         check_synapse_version()
       File "/usr/local/lib/python3.10/dist-packages/optimum/habana/utils.py", line 207, in check_synapse_version
         habana_frameworks_version_number = get_habana_frameworks_version()
       File "/usr/local/lib/python3.10/dist-packages/optimum/habana/utils.py", line 245, in get_habana_frameworks_version
         return version.parse(output.stdout.split("\n")[0].split()[-1])
     IndexError: list index out of range
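
The last frame of the traceback shows why the error is an IndexError rather than a clearer "not installed" message. The sketch below reuses only the parsing expression that appears in the traceback. It assumes the captured command output was empty because the environment has no Gaudi packages. The helper names and the sample line are hypothetical and are not part of optimum-habana.

```python
def parse_last_field(stdout: str) -> str:
    """Mirror the expression from the traceback:
    take the first line, then its last whitespace-separated field."""
    return stdout.split("\n")[0].split()[-1]


def parse_last_field_safe(stdout: str):
    """Same parsing, but return None instead of raising on empty output."""
    fields = stdout.split("\n")[0].split()
    return fields[-1] if fields else None


# Hypothetical sample line: the package name and version are made up.
print(parse_last_field("example-habana-package 1.18.0\n"))  # 1.18.0

# Empty output: "".split("\n") is [""], and "".split() is [],
# so indexing [-1] on the empty list raises IndexError.
try:
    parse_last_field("")
except IndexError as err:
    print(f"IndexError: {err}")  # IndexError: list index out of range

print(parse_last_field_safe(""))  # None
```

If that assumption holds, the error points to a missing Gaudi software stack in the environment, not to a mistake in the notebook steps.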

Expected behavior

I would expect it to run and fine-tune the model with no errors.

I found this forum post, which shows that someone had the same problem, but it does not describe or link to how it was resolved: https://discuss.huggingface.co/t/error-when-running-examples-in-optimum-habana/74944

This is probably user error; any pointers would be appreciated!

regisss commented 1 week ago

Here is the link to the former issue: https://github.com/huggingface/optimum-habana/issues/741 (I'm not sure why the forum discussion doesn't link to it.)

Can you please provide the outputs of the commands suggested in that issue? You're running your script on Colab, which is not set up with the Gaudi libraries. You probably need to run your script in a Docker container using one of the images published by Intel. For example:

docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu24.04/habanalabs/pytorch-installer-2.4.0:latest
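
Before launching the training command, it may help to check whether the Gaudi software stack is visible to Python in the current environment. Here is a minimal sketch, assuming the stack is exposed as a top-level package named `habana_frameworks`. That name is inferred from the function names in the traceback and is not confirmed in this thread. The helper `package_available` is hypothetical.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package called `name` can be found
    by the import system in this environment."""
    return importlib.util.find_spec(name) is not None


# Assumption: the Gaudi PyTorch bridge installs a package named
# "habana_frameworks"; adjust the name if your setup differs.
if package_available("habana_frameworks"):
    print("Gaudi software stack found.")
else:
    print("Gaudi software stack not found; run inside a Gaudi-enabled environment.")
```

On a plain Colab CPU runtime this check would be expected to take the second branch, which matches the diagnosis above.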