microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

[C#] Regression on 0.5.0 with DML #1071

Open azchohfi opened 5 hours ago

azchohfi commented 5 hours ago

Describe the bug: C# version 0.5.0 broke DML models, such as microsoft--Phi-3-mini-4k-instruct-onnx directml-int4-awq-block-128. The model loads, but the Generator's constructor throws an access violation exception.

To Reproduce: Steps to reproduce the behavior:

  1. Try running the Phi3 sample with DML
  2. An exception is thrown at line 98

Expected behavior: Works just as it did in 0.4.0.

elephantpanda commented 4 hours ago

Well, DML didn't really work before in 0.4.0. I mean, it works up to a point and then breaks. I was just about to update to 0.5 myself. Thanks for the warning. 🥲

I took a look at the closed pull requests and didn't see anything relating to any DML fixes, which is disappointing.

elephantpanda commented 3 hours ago

Just updated my code to 0.5.1 to try it out with C# and DirectML, using the same model as OP. After loading the model, it crashes. (It didn't crash with 0.4.0.)

Quadro P5000 GPU

Same line: generator = new Generator(model, generatorParams);

=================================================================
    Native Crash Reporting
=================================================================
Got a UNKNOWN while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

=================================================================
    Managed Stacktrace:
=================================================================
      at <unknown> <0xffffffff>
      at Microsoft.ML.OnnxRuntimeGenAI.NativeMethods:OgaCreateGenerator <0x00097>
      at Microsoft.ML.OnnxRuntimeGenAI.Generator:.ctor <0x0004a>
      at Main:StartGeneration <0x00612>
      at <Start>d__10:MoveNext <0x002ca>
      at MoveNextRunner:InvokeMoveNext <0x00091>
      at System.Threading.ExecutionContext:RunInternal <0x001b5>
      at System.Threading.ExecutionContext:Run <0x0002a>
      at MoveNextRunner:Run <0x000ca>
      at <>c:<.cctor>b__7_0 <0x00039>
      at WorkRequest:Invoke <0x00023>
      at UnityEngine.UnitySynchronizationContext:Exec <0x0018a>
      at UnityEngine.UnitySynchronizationContext:ExecuteTasks <0x0007a>
      at System.Object:runtime_invoke_void <0x0007c>
=================================================================
Received signal SIGSEGV
Crash!!!

skyline75489 commented 1 hour ago

@RyanUnderhill This is the one we caught with the validation pipeline. I thought it was the same error, but it turns out it wasn't. This crash is the reason why there's no log message printed. I can reproduce this locally.

skyline75489 commented 1 hour ago

Seems to be related to the newly added adapter API:

Image

skyline75489 commented 13 minutes ago

I'm seeing this with debugger attached:

Non-zero status code returned while running LayerNormalization node. Name:'LayerNorm_0' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2494)\onnxruntime.dll!00007FFE9D0A7425: (caller: 00007FFE9D0A6A8F) Exception(3) tid(3d68) 80070057 The parameter is incorrect.
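
For anyone reading that status message: the `80070057` value is a standard Windows HRESULT, and it can be split into its fields to confirm what it means. The snippet below is only a minimal sketch of that decoding; the `HResultDecoder` class and `Decode` method are hypothetical helper names, not part of onnxruntime-genai or DirectML.

```csharp
using System;

// Hypothetical helper (not part of onnxruntime-genai): splits a 32-bit
// HRESULT into the fields defined by the standard Windows layout.
public static class HResultDecoder
{
    // Bit 31 is the severity flag (1 = failure), bits 16-26 are the
    // facility, and bits 0-15 are the facility-specific error code.
    public static (bool Failed, int Facility, int Code) Decode(uint hresult)
    {
        bool failed = (hresult & 0x80000000u) != 0;
        int facility = (int)((hresult >> 16) & 0x7FFu);
        int code = (int)(hresult & 0xFFFFu);
        return (failed, facility, code);
    }
}
```

Calling `HResultDecoder.Decode(0x80070057u)` gives a failure with facility 7 (FACILITY_WIN32) and code 87 (ERROR_INVALID_PARAMETER), which is E_INVALIDARG and matches the "The parameter is incorrect." text in the message. It says an invalid argument reached the DML operator, but not which argument.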