Running the model with HF directly works:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Maykeye/TinyLLama-v0")
model = AutoModelForCausalLM.from_pretrained("Maykeye/TinyLLama-v0", device_map="cuda")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
Same error with Code Llama
@arjunguha Can you try upgrading the transformers package? See the NDIF discord for more.
Yup, fixed. Works with non-git transformers and nnsight>0.2
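For anyone hitting the same thing, a quick way to confirm you are on a released (non-git) transformers build and an nnsight above 0.2 after upgrading (a minimal check; package names assumed to be the PyPI ones):
from importlib.metadata import version

# both should be plain release versions, not .dev/git builds
print(version("transformers"))
print(version("nnsight"))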
Another workaround suggested by Jaden: just dispatch the model on init so it's not on the 'meta' device:
from nnsight import LanguageModel

model = LanguageModel('Maykeye/TinyLLama-v0', device_map='auto', dispatch=True)
prompt = "The french translation for 'hello' is:\n"
with model.trace(prompt) as trace:
    pass
Another workaround (for those who want to run the model remotely, for example) is to disable scanning in the trace call: model.trace(prompt, scan=False)
Disabling scan doesn't seem like a big deal; from the docstring:
scan: whether to execute the model using FakeTensor in order to update the potential sizes/dtypes of all modules' Envoys' inputs/outputs, as well as validate things work correctly. Scanning is not free computation-wise, so you may want to turn this to False when running in a loop. When making interventions, you may get shape errors if scan is False, as it validates operations based on shapes; so for looped calls where shapes are consistent, you may want to have scan=True for the first loop. Defaults to True.
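A minimal sketch of that workaround, mirroring the snippet above (add remote=True only if you have NDIF remote access set up):
from nnsight import LanguageModel

model = LanguageModel('Maykeye/TinyLLama-v0', device_map='auto')
prompt = "The french translation for 'hello' is:\n"

# scan=False skips the FakeTensor scanning pass, which is what trips over the 'meta' device
with model.trace(prompt, scan=False) as trace:
    pass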
I can't run nnsight on Llama models. I get a runtime error:
RuntimeError: User specified an unsupported autocast device_type 'meta'
MWE:
I tested:
Full stack trace: