microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

OnnxRuntimeGenAIException - OrtValue shape verification failed when running Phi-3 model with DML #424

Closed: AshD closed this issue 5 months ago

AshD commented 5 months ago

The CPU version works fine with the corresponding model and NuGet package. The DirectML version throws the exception below.

Model: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx/tree/main/directml/directml-int4-awq-block-128
Package: Microsoft.ML.OnnxRuntimeGenAI.DirectML 0.2.0-rc6 (NuGet)

I have called generatorParams.TryGraphCaptureWithMaxBatchSize(1), and max tokens is set to 4000.

The exception is thrown when calling generator.ComputeLogits():

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'Non-zero status code returned while running DmlFusedNode_0_0 node. Name:'DmlFusedNode_0_0' Status Message: D:\a_work\1\s\onnxruntime\core\framework\execution_frame.cc:173 onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,32,121,96} Requested shape:{1,32,4001,96}'
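One way to read this error is to compare the two shapes axis by axis. The minimal sketch below assumes the usual KV-cache layout (batch, num_heads, seq_len, head_dim). That layout is consistent with 32 heads of size 96 but is an assumption here, not something the error message states. The helper names are hypothetical.

```python
import re

# Part of the exception text from the report that carries the two shapes.
MESSAGE = (
    "OrtValue shape verification failed. "
    "Current shape:{1,32,121,96} Requested shape:{1,32,4001,96}"
)

# Assumed axis order for a KV-cache tensor (hypothetical labels).
AXES = ("batch", "num_heads", "seq_len", "head_dim")


def _shape_from(match: re.Match) -> tuple[int, ...]:
    """Turn a captured '1,32,121,96' group into a tuple of ints."""
    return tuple(int(x) for x in match.group(1).split(","))


def parse_shapes(message: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Extract the 'Current' and 'Requested' shapes from the error text."""
    current = re.search(r"Current shape:\{([\d,]+)\}", message)
    requested = re.search(r"Requested shape:\{([\d,]+)\}", message)
    if current is None or requested is None:
        raise ValueError("shapes not found in message")
    return _shape_from(current), _shape_from(requested)


def mismatched_axes(current, requested):
    """Return (axis_name, current, requested) for every dimension that differs."""
    return [
        (name, c, r)
        for name, c, r in zip(AXES, current, requested)
        if c != r
    ]


current, requested = parse_shapes(MESSAGE)
print(mismatched_axes(current, requested))  # [('seq_len', 121, 4001)]
```

Only the sequence axis differs. The runtime holds a tensor with 121 positions, while the fused DML node expects 4001 positions, close to the configured maximum of 4000 tokens.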

PC: Windows 11 (version 10.0.22631, build 22631), 13th-gen Core i9, 128 GB RAM, RTX 4090 with the latest NVIDIA drivers.

Thanks, Ash

natke commented 5 months ago

Hi @AshD, is this running the HelloPhi2 sample? (Knowing that would help us reproduce the issue.)

AshD commented 5 months ago

Mostly the same code.

The exact code is here: https://github.com/feiyun0112/SemanticKernel.Connectors.OnnxRuntimeGenAI

The only addition is a call to generatorParams.TryGraphCaptureWithMaxBatchSize(1).

PatriceVignola commented 5 months ago

Hi @AshD,

The problem is on this line: generatorParams.SetSearchOption("past_present_share_buffer", onnxRuntimeGenAIPromptExecutionSettings.PastPresentShareBuffer);

When using graph capture (cuda_graph/dml_graph), PastPresentShareBuffer must always be set to true. We'll make this clearer, or possibly enforce it for DML, in future versions. For now, set it to true whenever you use DML.
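The following toy model suggests why sharing the past/present buffer matters under graph capture. It assumes that a captured graph can only be replayed with the tensor shapes it was recorded with. It also assumes that a shared KV buffer is allocated once at max_length, while an unshared one grows with the sequence. This is a simplification for illustration, not onnxruntime-genai's actual allocator. It uses max_length directly and does not try to explain why the real error reports 4001 rather than 4000. KvCacheModel and replay_is_safe are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative sizes taken from the shapes in the error message above.
BATCH, NUM_HEADS, HEAD_DIM = 1, 32, 96


@dataclass(frozen=True)
class KvCacheModel:
    """Toy model of a KV-cache tensor's shape during decoding (not the real allocator)."""

    max_length: int
    share_buffer: bool  # stands in for past_present_share_buffer

    def shape_at(self, seq_len: int) -> tuple[int, int, int, int]:
        # Shared: one buffer sized for max_length, updated in place every step.
        # Unshared: a tensor sized to the number of tokens processed so far.
        length = self.max_length if self.share_buffer else seq_len
        return (BATCH, NUM_HEADS, length, HEAD_DIM)


def replay_is_safe(cache: KvCacheModel, prompt_len: int, new_tokens: int) -> bool:
    """True if every decoding step sees the shape recorded when the graph was captured."""
    captured = cache.shape_at(prompt_len)
    return all(
        cache.shape_at(prompt_len + step) == captured
        for step in range(1, new_tokens + 1)
    )


for share in (False, True):
    cache = KvCacheModel(max_length=4000, share_buffer=share)
    print(share, replay_is_safe(cache, prompt_len=121, new_tokens=5))
# False False
# True True
```

Only the shared-buffer configuration keeps a fixed shape across steps. In the reporter's code, this corresponds to passing true as the second argument of the SetSearchOption("past_present_share_buffer", ...) call quoted above.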

AshD commented 5 months ago

Thanks @PatriceVignola, that fixes the issue :-)

I would appreciate some guidance on the discussion I opened: https://github.com/microsoft/onnxruntime-genai/discussions/425