microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

BSOD when running the sample with DML runtime #1007

Open sandrohanea opened 2 days ago

sandrohanea commented 2 days ago

**Describe the bug**
When running one of the DirectML examples for OnnxRuntimeGenAI, I get a blue screen of death on my Surface Laptop Studio 2.

**To Reproduce**
Steps to reproduce the behavior:

1. Use this sample (any would do):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

var modelPath = @"C:\Models\microsoft\Phi-3-medium-4k-instruct-onnx-directml\directml-int4-awq-block-128";
var model = new Model(modelPath);
var tokenizer = new Tokenizer(model);

var systemPrompt = "You are an AI assistant that helps people find information. Answer questions using a direct style. Do not share more information than requested by the users.";

// chat start
Console.WriteLine(@"Ask your question. Type an empty string to exit.");

// chat loop
while (true)
{
    // Get user question
    Console.WriteLine();
    Console.Write(@"Q: ");
    var userQ = Console.ReadLine();
    if (string.IsNullOrEmpty(userQ))
    {
        break;
    }

    // show phi3 response
    Console.Write("Phi3: ");
    var fullPrompt = $"<|system|>{systemPrompt}<|end|><|user|>{userQ}<|end|><|assistant|>";
    var tokens = tokenizer.Encode(fullPrompt);

    var generatorParams = new GeneratorParams(model);
    generatorParams.SetSearchOption("max_length", 2048);
    generatorParams.SetSearchOption("past_present_share_buffer", false);
    generatorParams.SetInputSequences(tokens);

    var generator = new Generator(model, generatorParams);
    while (!generator.IsDone())
    {
        generator.ComputeLogits();
        generator.GenerateNextToken();
        var outputTokens = generator.GetSequence(0);
        var newToken = outputTokens.Slice(outputTokens.Length - 1, 1);
        var output = tokenizer.Decode(newToken);
        Console.Write(output);
    }
    Console.WriteLine();
}
```


2. Have this csproj:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <PlatformTarget>x64</PlatformTarget>
    <RuntimeIdentifier>win-x64</RuntimeIdentifier>
  </PropertyGroup>
</Project>
```

3. Run the sample with the model from here: https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml/tree/main
4. Say hi
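For completeness, the model download in step 3 can be done with the Hugging Face CLI; this is an illustrative sketch (it assumes `huggingface-cli` is installed, and the local directory is a placeholder, not from the original report):

```shell
# Download only the DirectML int4 variant of the model repo (illustrative local path)
huggingface-cli download microsoft/Phi-3-medium-4k-instruct-onnx-directml \
  --include "directml-int4-awq-block-128/*" \
  --local-dir ./Phi-3-medium-4k-instruct-onnx-directml
```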

**Expected behavior**
Getting a greeting or response from the model (preferably using the NPU or at least the GPU)

**Screenshots**
I can share a video of the BSOD recorded with my phone (contact me on Teams if needed).

**Desktop (please complete the following information):**
 - Device: Surface Laptop Studio 2 
 - OS: Windows 11 Enterprise 24H2
 - Browser: n/a
 - Version: 0.4.0 (C# `Microsoft.ML.OnnxRuntimeGenAI.DirectML` and `Microsoft.ML.OnnxRuntimeGenAI`)
 - Intel NPU Driver: 31.0.100.2016 (latest available for the NPU)
 - Nvidia GeForce RTX 4060 Driver: 31.0.15.3878

**Additional context**
- CPU Runtime works as expected
- Will test the CUDA runtime as well and report the result in a comment.
sandrohanea commented 2 days ago

The error on the BSOD screen is VIDEO_SCHEDULER_INTERNAL_ERROR.

Also, the issue is reproduced consistently on my device.

sandrohanea commented 2 days ago

I confirm that CUDA also works as expected on my system; only the DML runtime causes the BSOD.

sandrohanea commented 2 days ago

Another strange thing is that even after the model should be "loaded", the RAM usage is very low (see attached screenshot).
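To put a number on that observation, the low RAM usage could be measured from inside the sample itself; a minimal diagnostic sketch (the measurement placement is an assumption on my part, not part of the original report, and with the DirectML build the weights may live in VRAM rather than system RAM):

```csharp
using System.Diagnostics;
using Microsoft.ML.OnnxRuntimeGenAI;

// Log the process working set before and after model load, to confirm
// whether the DirectML build keeps the weights out of system RAM.
var before = Process.GetCurrentProcess().WorkingSet64;
var model = new Model(@"C:\Models\microsoft\Phi-3-medium-4k-instruct-onnx-directml\directml-int4-awq-block-128");
var after = Process.GetCurrentProcess().WorkingSet64;
Console.WriteLine($"Working set delta after model load: {(after - before) / (1024 * 1024)} MB");
```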