intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc

Performance problem with InternVL image embedding using ggml.dll #12376

Open · cjsdurj opened 2 days ago

cjsdurj commented 2 days ago

Problem description

Image embedding using the ggml.dll provided by ipex-llm becomes slower and slower with each repeated call, while with a llama.cpp build at a1631e5 the performance stays stable.

Test code

The clip source code can be found in https://github.com/ggerganov/llama.cpp/pull/9403.

```cpp
#include "clip.h"
#include "internvl.h"

#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
  std::string model_path;
  std::string image_path;
  std::string device;

  // Parse --model / --image / --device argument pairs.
  for (int i = 1; i + 1 < argc; i += 2) {
    std::string arg = argv[i];
    if (arg == "--model") {
      model_path = argv[i + 1];
    } else if (arg == "--image") {
      image_path = argv[i + 1];
    } else if (arg == "--device") {
      device = argv[i + 1];
    }
  }

  auto ctx_clip = clip_model_load(model_path.c_str(), 1, device);

  // Embed the same image 20 times; with the ipex-llm ggml.dll each
  // iteration gets progressively slower.
  for (int i = 0; i < 20; i++) {
    auto embed = internvl_image_embed_make_with_filename(ctx_clip, 4,
                                                         image_path.c_str());
    std::cout << embed->embed[0] << "\n";
  }
  return 0;
}
```
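
To make the slowdown measurable rather than anecdotal, here is a minimal timing sketch (assuming the same clip.h / internvl.h API from PR 9403 as in the repro above; the positional argument handling is a simplification) that prints per-iteration latency:

```cpp
#include "clip.h"
#include "internvl.h"

#include <chrono>
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
  if (argc < 4) {
    std::cerr << "usage: " << argv[0] << " <model> <image> <device>\n";
    return 1;
  }
  std::string model_path = argv[1];
  std::string image_path = argv[2];
  std::string device     = argv[3];

  auto ctx_clip = clip_model_load(model_path.c_str(), 1, device);

  for (int i = 0; i < 20; i++) {
    auto t0 = std::chrono::steady_clock::now();
    auto embed = internvl_image_embed_make_with_filename(ctx_clip, 4,
                                                         image_path.c_str());
    auto t1 = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
    // Expected: flat latency with the llama.cpp a1631e5 build; growing
    // latency with the ipex-llm ggml.dll before the fix.
    std::cout << "iter " << i << ": " << ms.count() << " ms, embed[0]="
              << embed->embed[0] << "\n";
  }
  return 0;
}
```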

Environment

Intel Core Ultra 7 155H (iGPU), Windows 11

rnwang04 commented 2 days ago

Hi @cjsdurj, thanks for pointing out this issue. I have fixed it; you can try again tomorrow with the ggml.dll released in `pip install ipex-llm>=2.2.0b20241111`.
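
For anyone verifying the fix on Windows, a hypothetical helper sketch: it assumes it is called from inside the repro process after the model is loaded (GetModuleHandleA only sees DLLs already mapped into the calling process), and prints which ggml.dll was actually picked up, in case a stale copy earlier on PATH shadows the newly installed one:

```cpp
#include <windows.h>
#include <iostream>

// Call from the repro's main() after clip_model_load() to see which
// ggml.dll the process resolved. Windows-only.
void print_ggml_dll_path() {
  HMODULE h = GetModuleHandleA("ggml.dll");
  if (!h) {
    std::cout << "ggml.dll is not loaded in this process\n";
    return;
  }
  char path[MAX_PATH];
  if (GetModuleFileNameA(h, path, MAX_PATH)) {
    std::cout << "ggml.dll loaded from: " << path << "\n";
  }
}
```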