Meta-Llama-3 model text-generation example output is unexpected on 2 nodes

aslanxie commented 1 month ago

System Info

deepspeed                 0.14.4+hpu.synapse.v1.18.0
optimum-habana            1.14.0

docker image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Set up 2 nodes for the test.
Run the text-generation example:
    python3 ../gaudi_spawn.py --hostfile hostfile --use_deepspeed --world_size 16 --master_port 29500 \
      run_generation.py \
      --model_name_or_path /data1/zhixue/Llama-3.1-70B-Instruct/ \
      --bf16 \
      --batch_size 1 \
      --use_hpu_graphs --limit_hpu_graphs \
      --max_new_tokens 512
The generation output looks like: 10.233.108.205: Input/outputs: 10.233.108.205: input 1: ('DeepSpeed is a machine learning framework',) 10.233.108.205: output 1: ('DeepSpeed is a machine learning framework!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!',)
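The 2-node setup in the first reproduction step relies on the `hostfile` passed via `--hostfile`, whose contents are not shown in the report. Below is a minimal sketch of such a file, assuming the standard DeepSpeed hostfile format (`<host> slots=<devices>`) and 8 Gaudi cards per node (2 x 8 = `--world_size 16`). Only `10.233.108.205` appears in the logs; `node-2` is a hypothetical placeholder for the second host.

```shell
#!/usr/bin/env bash
# Sketch of a 2-node DeepSpeed hostfile (assumed format: "<host> slots=<n>").
set -euo pipefail

NODE1="10.233.108.205"   # address seen in the logs above
NODE2="node-2"           # hypothetical placeholder for the second node
SLOTS_PER_NODE=8         # assumption: 8 Gaudi cards per node

# Write one line per node
cat > hostfile <<EOF
${NODE1} slots=${SLOTS_PER_NODE}
${NODE2} slots=${SLOTS_PER_NODE}
EOF

# Sanity check: the sum of all slots should match --world_size (16)
total=$(awk '{ split($2, kv, "="); s += kv[2] } END { print s }' hostfile)
echo "total slots: ${total}"
```

If the slot total does not match `--world_size`, the launcher's rank layout will not match what the script expects, so this check is a cheap first step before debugging the output itself.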

Expected behavior

When testing with Llama-2-7b-hf, the output is as shown below. I found the issue with the latest Meta-Llama-3 and Meta-Llama-3.1 models when running inference on 2 nodes.

10.233.108.205: input 1: ('DeepSpeed is a machine learning framework',)
10.233.108.205: output 1: ('DeepSpeed is a machine learning framework for deep learning. It is designed to be fast and efficient, while also being easy to use. DeepSpeed is based on the TensorFlow framework, and it uses the TensorFlow Lite library to run on mobile devices.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is',)

aslanxie commented 1 month ago

No problem on a single node.

aslanxie commented 1 month ago

It works without problems on a single node.

regisss commented 1 month ago

So you only see this issue on 2 nodes, right?

aslanxie commented 1 month ago

Yes, it only happens on 2 nodes, with Llama-2-70b-hf or Llama-3.1-70B-Instruct.

regisss commented 2 hours ago

@aslanxie Are you still seeing this issue after installing Optimum Habana's main branch from source?

huggingface / optimum-habana