Closed by iz2late 4 months ago
Hi! The checkpoint is formatted the same way as in the official LLaVA repo, so you should be able to train, evaluate, and run the demo following their instructions. We'll look into integrating with HF later.
Hi, we've converted our model to HF format; you can access it here: https://huggingface.co/zzxslp/som-llava-v1.5-13b-hf. Example code below:
```python
from PIL import Image
import requests
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_path = "zzxslp/som-llava-v1.5-13b-hf"

# Load the model and its processor from the Hub
model = LlavaForConditionalGeneration.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)

prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt")

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=20)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)
```
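Side note on the hard-coded prompt string above: it follows the LLaVA-1.5 chat convention (`USER: <image>\n{question} ASSISTANT:`). A tiny helper (hypothetical, not part of the repo) makes that template explicit if you want to ask different questions:

```python
# Hypothetical helper: builds a LLaVA-1.5-style prompt string.
# The "<image>" placeholder is where the processor injects image tokens.
def build_prompt(question: str) -> str:
    return f"USER: <image>\n{question} ASSISTANT:"

prompt = build_prompt("What's the content of the image?")
print(prompt)
# Same string as the hard-coded prompt in the example above.
```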
I attempted to use the following code, but unfortunately it didn't work:
I'm wondering if it's possible to load the SoM-LLaVA model directly with the Transformers library. Is this currently supported, or is the model incompatible with this approach?