openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: Couldn't run Phi3 model from InternVL2 model with GPU #26652

Open acane77 opened 2 days ago

acane77 commented 2 days ago

OpenVINO Version

2024.5.0dev20240913

Operating System

Windows System

Device used for inference

GPU

Framework

PyTorch

Model used

Phi3

Issue description

The Phi3 model fails to run on GPU; it runs normally on CPU.

The Phi3 model was extracted and converted from InternVL2-4B (https://huggingface.co/OpenGVLab/InternVL2-4B) with this script: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/internvl2/internvl2.ipynb

GPU compilation then reports: Can't choose implementation for rms:__module.model.layers.0.input_layernorm/aten::mul/Multiply_1_compressed_to_f16.

Step-by-step reproduction

  1. Convert the InternVL2 model into a separate text-embedding model and a Phi3 model with the conversion script.
  2. Because the input layer is inputs_embeds rather than input_ids, we made some modifications in GenAI so that it accepts input embeddings as input.
  3. Tokenize, run the embedding model, then feed the embeddings into LLMPipeline on GPU (see the sketch after this list).
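
A minimal sketch of step 3 using the plain OpenVINO runtime instead of the modified GenAI pipeline; the embedding model file name is an assumption based on the notebook's output layout, and the extra LLM inputs are omitted:

#include "openvino/openvino.hpp"
#include <cstdint>
#include <string>

int main() {
  const std::string model_dir = "D:/Models/InternVL2-4B-int4-openvino";

  ov::Core core;

  // Run the extracted text-embedding model on CPU to turn token ids into
  // inputs_embeds. The file name here is an assumption.
  auto embed_model = core.compile_model(
      model_dir + "/openvino_text_embeddings_model.xml", "CPU");
  auto embed_request = embed_model.create_infer_request();

  // Placeholder token ids of shape [1, seq_len], just to exercise the path.
  ov::Tensor input_ids(ov::element::i64, {1, 4});
  auto* ids = input_ids.data<int64_t>();
  for (size_t i = 0; i < input_ids.get_size(); ++i)
    ids[i] = 1;

  embed_request.set_input_tensor(input_ids);
  embed_request.infer();
  ov::Tensor inputs_embeds = embed_request.get_output_tensor();

  // Feed the embeddings into the Phi3 language model compiled for GPU; in
  // 2024.5.0-20240913 the compile_model call below is what throws.
  auto llm = core.compile_model(model_dir + "/openvino_model.xml", "GPU");
  auto llm_request = llm.create_infer_request();
  llm_request.set_tensor("inputs_embeds", inputs_embeds);
  // attention_mask, position_ids, beam_idx, etc. omitted for brevity.
  llm_request.infer();
}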

Relevant log output

device is GPU
Exception from src/inference/src/cpp/core.cpp:124:
Exception from src/inference/src/dev/plugin.cpp:58:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:185:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/graph/include\primitive_type_base.h:59:
[GPU] Can't choose implementation for rms:__module.model.layers.0.input_layernorm/aten::mul/Multiply_1_compressed_to_f16
 node (type=rms)
[GPU] Original name: __module.model.layers.0.input_layernorm/aten::mul/Multiply_1_compressed_to_f16
[GPU] Original type: RMS
[GPU] Reason: invalid stoi argument


acane77 commented 1 day ago

We found that this model can be compiled with OpenVINO 2024.3.0, but cannot be compiled with OpenVINO 2024.5.0-20240913.

Here is a minimal sample:

#include "openvino/core/core.hpp"
#include "openvino/openvino.hpp"
#include <string>

int main() try {
  std::string model_path = "D:/Models/InternVL2-4B-int4-openvino";
  const char* openvino_model_xml = "openvino_model.xml";

  ov::Core core;
  auto model = core.read_model(model_path + "/" + openvino_model_xml);
  auto compiled_model = core.compile_model(model, "GPU");
  auto request = compiled_model.create_infer_request();
}
catch (const std::exception& error) {
  try {
    std::cerr << error.what() << '\n';
  } catch (const std::ios_base::failure&) {
  }
  return EXIT_FAILURE;
} catch (...) {
  try {
    std::cerr << "Non-exception object thrown\n";
  } catch (const std::ios_base::failure&) {
  }
  return EXIT_FAILURE;
}
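
Changing "GPU" to "CPU" in the compile_model call above works, matching the original report that the model runs normally on CPU.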

model: https://huggingface.co/OpenGVLab/InternVL2-4B

geunhwan commented 1 day ago

This seems to be a known functional regression. The dev team is planning to fix this issue shortly.

dnkurek commented 11 hours ago

Hi, the issue is most likely caused by this commit:

https://github.com/openvinotoolkit/openvino/commit/90d1219c98fb2fdcb7448f1d18b25a370efd7ccf

The solution would be to fall back to the older behavior when stoi fails (i.e., when items_num depends on a dynamic axis).

Therefore, only the stack-size heuristic should be applied when items_num cannot be known at build time; a sketch of this fallback follows.
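
A hedged sketch of that fallback; the function name, the byte budget, and the exact check are assumptions, since the real logic lives in the intel_gpu plugin's RMS kernel selection:

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the RMS kernel-selection check; names and the
// stack budget are assumptions, not the plugin's actual code.
constexpr std::size_t kStackBudgetBytes = 4096;

bool can_choose_rms_impl(const std::string& items_num_str,
                         std::size_t bytes_per_item) {
  try {
    // items_num is a static integer: keep the precise per-item check
    // introduced by the offending commit.
    const int items_num = std::stoi(items_num_str);
    return static_cast<std::size_t>(items_num) * bytes_per_item <= kStackBudgetBytes;
  } catch (const std::invalid_argument&) {
    // items_num depends on a dynamic axis, so std::stoi throws
    // ("invalid stoi argument"). Instead of letting that abort
    // ProgramBuilder, fall back to the older stack-size heuristic.
    return bytes_per_item <= kStackBudgetBytes;
  }
}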

dnkurek commented 5 hours ago

PR #26703 fixes the issue