Please help try the moondream2/uform multimodal models?

aoke79 commented 5 months ago

Hi, I've tried to port uform/moondream2 to the IA platform using BigDL, but they failed. Could you please take a look? I've attached the source code FYI. [Uploading moondream.zip…]()

Thanks a lot,

aoke79 commented 5 months ago

The logs are in the folder. Thanks,

Edward-Lin commented 3 months ago

Have you ever tried it?

lzivan commented 2 months ago

Hi @aoke79, is the link for moondream.zip still available? I can't download it on my end.

aoke79 commented 2 months ago

Seems not, let me re-upload it. Thanks.

lzivan commented 2 months ago

Still not working. Can you re-upload it?

aoke79 commented 2 months ago

moondream.zip

lzivan commented 2 months ago

Hi @aoke79, what environment are you running in? I've successfully tested it on a Linux CPU.

aoke79 commented 2 months ago

Windows, MTL (Meteor Lake) iGPU

lzivan commented 2 months ago

Hi @aoke79, I tried it on an MTL iGPU and it works fine on my end.

First, do the regular installation for IPEX-GPU.

Here is the Python code I used:

```python
from transformers import AutoTokenizer
from PIL import Image
from ipex_llm.transformers import AutoModelForCausalLM

# Local path or Hugging Face repo id of moondream2
model_id = "<model path or repo-id>"
# Pin the moondream2 release so the matching remote code is loaded
revision = "2024-05-20"

# moondream2 ships custom modeling code, so trust_remote_code is required
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
# Move the model to the Intel GPU (XPU device)
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

# Encode the image once, then ask a question about the encoded image
image = Image.open('./images/demo-1.jpg')
enc_image = model.encode_image(image).to('xpu')
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
```

Output:

```
2024-07-10 16:29:29,021 - INFO - intel_extension_for_pytorch auto imported
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
A young girl with white hair and blue eyes is seated at a table, holding a large burger in her hands. She is wearing a white and blue outfit, and her gaze is directed towards the camera. The background is dark, with a window and a door visible, suggesting an indoor setting.
```

intel-analytics / ipex-llm, issue #10442