intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

please help try moondream2/uform multimodal model? #10442

Open aoke79 opened 5 months ago

aoke79 commented 5 months ago

Hi, I tried to port uform and moondream2 to the IA platform with BigDL, but both failed. Could you please take a look? I've attached the source code for reference. [Uploading moondream.zip…]()

Thanks a lot,

aoke79 commented 5 months ago

The logs are in the folder as well. Thanks.

Edward-Lin commented 3 months ago

Have you tried it yet?

lzivan commented 2 months ago

Hi @aoke79, is the link for moondream.zip still available? It can't be downloaded on my end.

aoke79 commented 2 months ago

It seems not. Let me re-upload it. Thanks.

lzivan commented 2 months ago

Still not working, can you re-upload it?

aoke79 commented 2 months ago

moondream.zip

lzivan commented 2 months ago

Hi @aoke79, what environment are you running in? I've tested it successfully on a Linux CPU.

aoke79 commented 2 months ago

Windows, MTL iGPU

lzivan commented 2 months ago

Hi @aoke79, I tried it on an MTL iGPU and it works fine on my end.

First, follow the regular installation for IPEX-GPU.

Here is the Python code I used:

from PIL import Image
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Local model path or Hugging Face repo id
model_id = "<model path or repo-id>"
revision = "2024-05-20"

# trust_remote_code=True lets the model's own modeling code be loaded
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
# Move the model to the Intel GPU
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

# Encode the image on the GPU, then ask a question about it
image = Image.open('./images/demo-1.jpg')
enc_image = model.encode_image(image).to('xpu')
print(model.answer_question(enc_image, "Describe this image.", tokenizer))

Output

2024-07-10 16:29:29,021 - INFO - intel_extension_for_pytorch auto imported
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
A young girl with white hair and blue eyes is seated at a table, holding a large burger in her hands. She is wearing a white and blue outfit, and her gaze is directed towards the camera. The background is dark, with a window and a door visible, suggesting an indoor setting.
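
Since the same script was reported working on both a Linux CPU and an MTL iGPU, a small device check may help narrow down environment problems before hard-coding 'xpu'. The following is only a sketch: `pick_device` is a hypothetical helper name that is not part of ipex-llm, and it assumes the XPU backend is exposed as `torch.xpu` once `intel_extension_for_pytorch` has been imported (as the log line above suggests).

```python
import importlib.util


def pick_device() -> str:
    """Return "xpu" if an Intel XPU looks usable from PyTorch, else "cpu".

    Hypothetical helper for illustration; not part of ipex-llm.
    """
    # Without PyTorch there is nothing to probe, so fall back to CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    try:
        import torch

        # Assumption: importing IPEX (when installed) registers torch.xpu.
        if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
            import intel_extension_for_pytorch  # noqa: F401

        xpu = getattr(torch, "xpu", None)
        if xpu is not None and xpu.is_available():
            return "xpu"
    except Exception:
        # Broken drivers or a mismatched install: treat the XPU as unavailable.
        pass
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints `cpu` on a machine that has an iGPU, the GPU driver or the IPEX installation is the first thing worth checking. In the script above, the result could replace the hard-coded string, for example `model = model.to(pick_device())`.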