guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

Transformer model openbmb/MiniCPM-Llama3-V-2_5 not supported #910

Open AmazingTurtle opened 5 months ago

AmazingTurtle commented 5 months ago

The bug Loading and prompting the transformer model openbmb/MiniCPM-Llama3-V-2_5 does not work. It tries to load the model (but according to nvtop nothing is allocated on my gpu). No error is thrown. Trying to prompt the LLM stops immediately without a response and without an error.

To Reproduce

from guidance import models
lm = models.Transformers('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)

print(lm + "Hello?")

(screenshot attached)

Worth mentioning: openbmb provides a test script for transformers that does work:

# test.py
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True, torch_dtype=torch.float16)
model = model.to(device='cuda')

tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
model.eval()

image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

res = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True, # if sampling=False, beam_search will be used by default
    temperature=0.7,
    # system_prompt='' # pass system_prompt if needed
)
print(res)
riedgar-ms commented 5 months ago

@nking-1 , have you come across this in your forays into multimodal models?

hudson-ai commented 5 months ago

I actually do get an error on my machine during a forward pass: TypeError: MiniCPMV.forward() missing 1 required positional argument: 'data' (can include full traceback if helpful)

It seems that this model departs from the standard huggingface model-call API that we're using (likely because of multimodality).
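To illustrate the mismatch, here is a minimal sketch (not guidance's actual implementation) of how such a departure could be detected. The assumption is that a standard huggingface causal-LM loop calls `model(input_ids=..., attention_mask=..., ...)`, so a `forward()` with an extra required positional argument (like MiniCPMV's `data`) breaks that contract. The helper and the two stand-in classes below are hypothetical:

```python
import inspect

def forward_accepts_standard_kwargs(model_cls) -> bool:
    """Return True if the class's forward() can be called with only the
    keyword arguments a standard HF causal-LM generation loop passes."""
    params = inspect.signature(model_cls.forward).parameters
    required = {
        name for name, p in params.items()
        if name != "self"
        and p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    }
    # Standard models require at most `input_ids`; any other required
    # positional argument (e.g. MiniCPMV's `data`) signals a custom API.
    return required <= {"input_ids"}

class StandardLM:
    # Stand-in for a typical causal LM, e.g. LlamaForCausalLM.
    def forward(self, input_ids=None, attention_mask=None, **kwargs): ...

class MiniCPMVLike:
    # Mirrors the signature implied by the reported TypeError.
    def forward(self, data, **kwargs): ...

print(forward_accepts_standard_kwargs(StandardLM))    # True
print(forward_accepts_standard_kwargs(MiniCPMVLike))  # False
```

A check like this could let guidance fail fast with a clear "unsupported model API" error instead of silently returning nothing.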

AmazingTurtle commented 2 months ago

Checking in: what is the status on this one? MiniCPM 2.6 has been released; I will try it out to see whether it works now. Otherwise, is there anything I can do to help the Guidance dev team resolve this issue? With the rise of "inner monologue" models like o1, it is clear that guidance will play a significant role in the LLM community in the near future, and resolving this kind of issue could be a big step toward supporting a broader audience.