kbressem / medAlpaca

LLM finetuned for medical question answering
GNU General Public License v3.0

Question about running fine-tuned model #37

Closed. jwframe28 closed this issue 1 year ago

jwframe28 commented 1 year ago

Not sure if this is the right place to ask.

I have fine-tuned the model and have the corresponding model_adapter.bin / model_config.json files.

How would I go about actually testing my model locally?

I have some test code, but I don't think it's working properly (it just repeated half of my question and cut off):

from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig
import torch

peft_model_id = "/mnt/d/output2"
config = PeftConfig.from_pretrained(peft_model_id)
model = LlamaForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = LlamaTokenizer.from_pretrained(config.base_model_name_or_path)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
model.eval()

inputs = tokenizer("What are the clinical differences between ciprofloxacin and levaquin?", return_tensors="pt")

with torch.no_grad():
    # use the previously selected device so this also runs on CPU-only machines
    outputs = model.generate(input_ids=inputs["input_ids"].to(device), max_new_tokens=10)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
kbressem commented 1 year ago

This is hard to debug. First, try a model that you know should produce correct output (like medalpaca 7b). If that works, there is probably a problem in your training data or training script. If it does not work, your evaluation script might be wrong.
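For reference, a minimal sketch of such a sanity check. It assumes the medalpaca/medalpaca-7b checkpoint from the Hugging Face Hub and an Alpaca-style instruction prompt; the exact prompt template used during training may differ, so adjust both to your setup:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Assumed checkpoint name; swap in whichever known-good model you want to test.
model_id = "medalpaca/medalpaca-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Alpaca-style prompt; the template should match what the model was trained on.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "What are the clinical differences between ciprofloxacin and levaquin?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    # max_new_tokens=10 truncates the answer after ten tokens; leave more room here.
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If this known-good model answers sensibly but your fine-tuned adapter does not, the problem is likely in the training side rather than in the inference code.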