meta-llama / llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, and a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama3 for WhatsApp & Messenger.

AMD GPU stuck at model.eval() when following the Quick Start Jupyter Notebook #278

Closed PatchouliPatch closed 10 months ago

PatchouliPatch commented 10 months ago

System Info

- PyTorch: 2.1.0+rocm5.6
- ROCm: 5.7.1
- GPU: Sapphire RX 7900 XTX
- Python: 3.10.12
- OS: Ubuntu 22.04.3

Information

šŸ› Describe the bug

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = LlamaTokenizer.from_pretrained(model_id)

# Unlike the quick-start notebook, load_in_8bit=True is omitted here
# because bitsandbytes does not support AMD GPUs.
model = LlamaForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)

print("Loading dataset.")

from llama_recipes.utils.dataset_utils import get_preprocessed_dataset
from llama_recipes.configs.datasets import samsum_dataset

train_dataset = get_preprocessed_dataset(tokenizer, samsum_dataset, 'train')

eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow's afternoon?
B: I'm pretty sure I am. What's up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we've discussed it many times. I think he's ready now.
B: That's good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he'd name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))

Summary:
"""

print("loading tokenizer")
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

print("evaluating...")
model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))
print("evaluation complete.")
```

For some reason, the evaluation never finishes. Note that the only difference between my script and the official Llama 2 quick-start Jupyter notebook is that I did not pass load_in_8bit=True, since AMD GPUs are not supported by bitsandbytes.
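As a first check in situations like this, it can help to verify that the ROCm build of PyTorch actually sees the card and can complete a kernel before any model code runs. A minimal sketch (`torch.version.hip` is only populated on ROCm builds of PyTorch):

```python
import torch

# On ROCm builds of PyTorch, torch.version.hip reports the HIP/ROCm
# version the wheel was compiled against (it is None on CUDA/CPU builds).
print("HIP version:", torch.version.hip)
print("GPU visible:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # Run a tiny kernel and force a synchronize; if this also hangs,
    # the problem is in the ROCm install, not the model code.
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x
    torch.cuda.synchronize()
    print("Matmul OK:", y.shape)
```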

(screenshot: GPU monitor during the hang) As seen in the image, the model is loaded onto the GPU, but something seems to prevent it from being run properly. GPU utilization sits at 100%, yet the memory clock is seemingly too low for that to be right.
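To confirm from inside Python that the weights actually reside in GPU memory (rather than trusting the monitor alone), one can print PyTorch's allocator counters after loading the model; a small sketch, noting that these counters behave the same on ROCm builds:

```python
import torch

# Rough view of how much of the model sits on the GPU; fp16 weights
# for a 7B model should occupy roughly 13-14 GB.
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")
```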

Error logs

No error logs, as the program is stuck at model.eval(). Attempting to end the program with Ctrl+C fails, and closing the terminal running it crashes the whole computer, requiring a reboot.
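When a process hangs with no logs and ignores Ctrl+C, Python's standard faulthandler module can show exactly which line it is blocked on. A minimal sketch, assuming these lines are added near the top of the script before the hang occurs:

```python
import faulthandler
import signal

# Dump every thread's Python stack when the process receives SIGUSR1,
# e.g. by running `kill -USR1 <pid>` from another terminal. This works
# even when Ctrl+C is being swallowed by a stuck GPU call.
faulthandler.register(signal.SIGUSR1)

# Alternatively, dump the stacks automatically if the program is still
# running after 10 minutes, without killing it.
faulthandler.dump_traceback_later(600, exit=False)
```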

Expected behavior

I expect the model to generate a summary of eval_prompt similar to the output shown in the quick-start Jupyter notebook.

PatchouliPatch commented 10 months ago

Hello! It seems my earlier installation of ROCm 5.7.1 was borked. For anyone who ran into the same issue: uninstall ROCm 5.7.1 and then reinstall it. I used the following command:

```
sudo amdgpu-install --rocmrelease=5.7.1 --no-dkms
```
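After a reinstall like this, a deliberately tiny generation makes a quick smoke test: if it returns within seconds instead of hanging, the full script should work too. A sketch based on the original code above:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

# Generate only 5 tokens from a one-word prompt; a healthy ROCm stack
# finishes this almost immediately, while a broken one hangs as before.
inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
model.eval()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```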