Closed smSafiHaider closed 12 months ago
Sure @smSafiHaider, to solve that you will need to use the following code instead 👍🏻
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_7b_v2")
model = LlamaForCausalLM.from_pretrained("openlm-research/open_llama_7b_v2", torch_dtype=torch.float16, device_map="auto")
prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)
with torch.cuda.amp.autocast():
    generation_output = model.generate(
        input_ids=input_ids, max_new_tokens=32
    )
print(tokenizer.decode(generation_output[0]))
This way you make sure that the device the model ends up on (chosen by `device_map="auto"` from 🤗 accelerate) is the same one you move the `torch.Tensor`s to before calling `.generate`. Additionally, make sure you install the following dependencies in advance: `pip install transformers einops accelerate sentencepiece`
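If you pass more than `input_ids` (for example an attention mask), the same idea can be wrapped in a small helper. This is only a sketch: `move_inputs_to_device` is a hypothetical name, not part of transformers or accelerate, and it assumes every tensor-like value exposes a `.to(device)` method, as `torch.Tensor` does.

```python
def move_inputs_to_device(inputs, device):
    """Return a new dict with every tensor-like value moved to `device`.

    Hypothetical helper (not a transformers/accelerate API). Values that
    have no `.to` method, such as plain strings or ints, are passed through.
    """
    moved = {}
    for name, value in inputs.items():
        if hasattr(value, "to"):
            # torch.Tensor.to(device) returns a tensor on the target device
            moved[name] = value.to(device)
        else:
            moved[name] = value
    return moved
```

With the code above it would be used roughly as `inputs = move_inputs_to_device(tokenizer(prompt, return_tensors="pt"), model.device)` followed by `model.generate(**inputs, max_new_tokens=32)`.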
It worked, thank you!! 😊
I used the following code to load the model:
`import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
device = torch.device('cuda')
model_path = 'openlm-research/open_llama_3b'
model_path = 'openlm-research/open_llama_7b'
model_path = 'openlm-research/open_llama_13b'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map='auto')`
but when generating output it gives the following error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
Any leads on how to solve it?
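One way to narrow down an error like this is to check which device each tensor actually lives on before calling `.generate`. The snippet below is a rough sketch: `devices_in_use` is a hypothetical helper, and it only assumes each value has a `.device` attribute, as `torch.Tensor` does.

```python
def devices_in_use(named_tensors):
    """Group tensor names by the device they are stored on.

    Hypothetical debugging helper. `named_tensors` maps a label to any
    object with a `.device` attribute (e.g. a torch.Tensor).
    """
    by_device = {}
    for name, tensor in named_tensors.items():
        # str() turns a torch.device such as cuda:0 into a plain dict key
        by_device.setdefault(str(tensor.device), []).append(name)
    return by_device
```

For instance, `devices_in_use({"input_ids": input_ids, "embeddings": model.get_input_embeddings().weight})` returning more than one key would match the `cuda:0 and cpu` mismatch in the message. The `index_select` in the error probably points at the embedding lookup, which would mean the token ids were still on the CPU.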