smSafiHaider commented 12 months ago

I used the following code to load the model:

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

device = torch.device('cuda')

model_path = 'openlm-research/open_llama_3b'
model_path = 'openlm-research/open_llama_7b'
model_path = 'openlm-research/open_llama_13b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)

But when generating output, it gives the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Any leads on how to solve it?

alvarobartt commented 12 months ago

Sure @smSafiHaider, to solve that you will need to use the following code instead 👍🏻

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_7b_v2")
model = LlamaForCausalLM.from_pretrained("openlm-research/open_llama_7b_v2", torch_dtype=torch.float16, device_map="auto")

prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.cuda.amp.autocast():
    generation_output = model.generate(
        input_ids=input_ids, max_new_tokens=32
    )
    print(tokenizer.decode(generation_output[0]))

This way you make sure that the input torch.Tensors are moved to the same device the model was placed on by device_map="auto" (from 🤗 accelerate) before calling .generate. Additionally, make sure you install the following dependencies in advance:

pip install transformers einops accelerate sentencepiece
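If the tokenizer output contains more than one tensor, the same idea can be applied to the whole batch. The following is only a minimal sketch: `move_to_device` and `devices_of` are hypothetical helper names, not part of transformers or accelerate, and they assume every tensor-like value exposes `.to(device)` and `.device` the way a torch.Tensor does.

```python
from typing import Any, Dict, Mapping


def move_to_device(batch: Mapping[str, Any], device: Any) -> Dict[str, Any]:
    """Return a copy of `batch` with every tensor-like value moved to `device`.

    Anything exposing a `.to(device)` method (such as a torch.Tensor) is moved;
    other values are passed through unchanged.
    """
    return {
        key: value.to(device) if hasattr(value, "to") else value
        for key, value in batch.items()
    }


def devices_of(batch: Mapping[str, Any]) -> set:
    """Collect the devices (as strings) of all tensor-like values in `batch`."""
    return {
        str(value.device) for value in batch.values() if hasattr(value, "device")
    }
```

With the snippet above, this would be used roughly as `inputs = move_to_device(tokenizer(prompt, return_tensors="pt"), model.device)` followed by `model.generate(input_ids=inputs["input_ids"], max_new_tokens=32)`. Checking `devices_of(inputs)` before generating is a quick way to confirm that nothing was left on the CPU.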

smSafiHaider commented 12 months ago

It worked, thank you!! 😊

openlm-research / open_llama

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #76
