openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: Couldn't run Phi3 model from InternVL2 model with GPU #26652

Open acane77 opened 2 days ago

acane77 commented 2 days ago

OpenVINO Version

2024.5.0dev20240913

Operating System

Windows System

Device used for inference

GPU

Framework

PyTorch

Model used

Phi3

Issue description

The Phi3 model fails to run on GPU; it runs normally on CPU.

The Phi3 model was extracted and converted from InternVL2-4B (https://huggingface.co/OpenGVLab/InternVL2-4B) with this script: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/internvl2/internvl2.ipynb

GPU compilation then reports: Can't choose implementation for rms:__module.model.layers.0.input_layernorm/aten::mul/Multiply_1_compressed_to_f16.

Step-by-step reproduction

  1. Convert the InternVL2 model into a separate text-embedding model and a Phi3 model with the conversion script.
  2. Because the input layer is inputs_embeds rather than input_ids, we made some modifications in GenAI so that it accepts input embeddings as input.
  3. Tokenize, run the embedding model, then feed the embeddings into LLMPipeline on GPU (see the sketch after this list).
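
A minimal sketch of step 3 using the plain OpenVINO runtime instead of the modified GenAI pipeline; the embedding model file name is an assumption based on the notebook's output layout, and the extra LLM inputs are omitted:

#include "openvino/openvino.hpp"
#include <cstdint>
#include <string>

int main() {
  const std::string model_dir = "D:/Models/InternVL2-4B-int4-openvino";

  ov::Core core;

  // Run the extracted text-embedding model on CPU to turn token ids into
  // inputs_embeds. The file name here is an assumption.
  auto embed_model = core.compile_model(
      model_dir + "/openvino_text_embeddings_model.xml", "CPU");
  auto embed_request = embed_model.create_infer_request();

  // Placeholder token ids of shape [1, seq_len], just to exercise the path.
  ov::Tensor input_ids(ov::element::i64, {1, 4});
  auto* ids = input_ids.data<int64_t>();
  for (size_t i = 0; i < input_ids.get_size(); ++i)
    ids[i] = 1;

  embed_request.set_input_tensor(input_ids);
  embed_request.infer();
  ov::Tensor inputs_embeds = embed_request.get_output_tensor();

  // Feed the embeddings into the Phi3 language model compiled for GPU; in
  // 2024.5.0-20240913 the compile_model call below is what throws.
  auto llm = core.compile_model(model_dir + "/openvino_model.xml", "GPU");
  auto llm_request = llm.create_infer_request();
  llm_request.set_tensor("inputs_embeds", inputs_embeds);
  // attention_mask, position_ids, beam_idx, etc. omitted for brevity.
  llm_request.infer();
}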

Relevant log output

device is GPU
Exception from src/inference/src/cpp/core.cpp:124:
Exception from src/inference/src/dev/plugin.cpp:58:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:185:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/graph/include\primitive_type_base.h:59:
[GPU] Can't choose implementation for rms:__module.model.layers.0.input_layernorm/aten::mul/Multiply_1_compressed_to_f16
 node (type=rms)
[GPU] Original name: __module.model.layers.0.input_layernorm/aten::mul/Multiply_1_compressed_to_f16
[GPU] Original type: RMS
[GPU] Reason: invalid stoi argument


acane77 commented 1 day ago

We found that this model can be compiled with OpenVINO 2024.3.0, but cannot be compiled with OpenVINO 2024.5.0-20240913.

Here is a minimal sample:

#include "openvino/core/core.hpp"
#include "openvino/openvino.hpp"
#include <string>

int main() try {
  std::string model_path = "D:/Models/InternVL2-4B-int4-openvino";
  const char* openvino_model_xml = "openvino_model.xml";

  ov::Core core;
  auto model = core.read_model(model_path + "/" + openvino_model_xml);
  auto compiled_model = core.compile_model(model, "GPU");
  auto request = compiled_model.create_infer_request();
}
catch (const std::exception& error) {
  try {
    std::cerr << error.what() << '\n';
  } catch (const std::ios_base::failure&) {
  }
  return EXIT_FAILURE;
} catch (...) {
  try {
    std::cerr << "Non-exception object thrown\n";
  } catch (const std::ios_base::failure&) {
  }
  return EXIT_FAILURE;
}
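
Changing "GPU" to "CPU" in the compile_model call above works, matching the original report that the model runs normally on CPU.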

model: https://huggingface.co/OpenGVLab/InternVL2-4B

geunhwan commented 1 day ago

This seems to be a known functional regression. The dev team is planning to fix this issue shortly.

dnkurek commented 11 hours ago

Hi, the issue is most likely caused by this commit:

https://github.com/openvinotoolkit/openvino/commit/90d1219c98fb2fdcb7448f1d18b25a370efd7ccf

The solution would be to fall back to the older behavior when stoi fails (i.e., when items_num depends on a dynamic axis).

Therefore, only the stack-size heuristic should be applied when items_num cannot be known at build time; a sketch of this fallback follows.
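
A hedged sketch of that fallback; the function name, the byte budget, and the exact check are assumptions, since the real logic lives in the intel_gpu plugin's RMS kernel selection:

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the RMS kernel-selection check; names and the
// stack budget are assumptions, not the plugin's actual code.
constexpr std::size_t kStackBudgetBytes = 4096;

bool can_choose_rms_impl(const std::string& items_num_str,
                         std::size_t bytes_per_item) {
  try {
    // items_num is a static integer: keep the precise per-item check
    // introduced by the offending commit.
    const int items_num = std::stoi(items_num_str);
    return static_cast<std::size_t>(items_num) * bytes_per_item <= kStackBudgetBytes;
  } catch (const std::invalid_argument&) {
    // items_num depends on a dynamic axis, so std::stoi throws
    // ("invalid stoi argument"). Instead of letting that abort
    // ProgramBuilder, fall back to the older stack-size heuristic.
    return bytes_per_item <= kStackBudgetBytes;
  }
}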

dnkurek commented 5 hours ago

PR #26703 fixes the issue