huggingface / optimum-amd

AMD related optimizations for transformer models
https://huggingface.co/docs/optimum/amd/index
MIT License

Add decoder modeling #108

Open · opened by mht-sharma 7 months ago

mht-sharma commented 7 months ago

As per title!

Example Usage:

from optimum.amd.ryzenai import RyzenAIModelForCausalLM
from transformers import AutoTokenizer
from tests.ryzenai.testing_utils import DEFAULT_VAIP_CONFIG_TRANSFORMERS

model_path = "path/to/quantized/model"  # OPT/LLaMA model quantized using Brevitas
vaip_config = DEFAULT_VAIP_CONFIG_TRANSFORMERS
model = RyzenAIModelForCausalLM.from_pretrained(model_path, vaip_config=vaip_config)
tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")

# generate() returns token IDs; decode them back to text before printing
generated_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
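
Since the example above calls the standard generate() API, sampling-based decoding should work the same way. Below is a minimal sketch under the assumption that RyzenAIModelForCausalLM follows the usual transformers GenerationMixin interface; the sampling parameters are ordinary generate() kwargs, not anything specific to this PR:

# Sketch (assumption): sampling-based generation via standard
# transformers generate() kwargs, reusing model/tokenizer/inputs from above.
sampled_ids = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,   # multinomial sampling instead of greedy decoding
    temperature=0.7,  # soften the next-token distribution
    top_p=0.9,        # nucleus sampling: keep the smallest token set with cumulative p >= 0.9
)
print(tokenizer.batch_decode(sampled_ids, skip_special_tokens=True)[0])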