microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

GPU suspended (887A0005) while running Phi2 example in DML #628

Open skyline75489 opened 3 weeks ago

skyline75489 commented 3 weeks ago

Running the default example doesn't work:

Namespace(verbose=True, batch_size_for_cuda_graph=1, chat_template='', model='.\\example-models\\phi2-int4-directml')
Loading model...
Model loaded
Tokenizer created
Prompt(s) encoded: ['I like walking my cute dog', 'What is the best restaurant in town?', 'Hello, how are you today?']
Args: Namespace(verbose=True, batch_size_for_cuda_graph=1, chat_template='', model='.\\example-models\\phi2-int4-directml')
Search options: {}
GeneratorParams created
Generating tokens ...

2024-06-21 11:01:47.9577714 [E:onnxruntime:onnxruntime-genai, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running DmlFusedNode_0_0 node. Name:'DmlFusedNode_0_0' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(1066)\onnxruntime.dll!00007FFA8D7BA2E1: (caller: 00007FFA8D849109) Exception(2) tid(72f0) 887A0005 GPU ?
Traceback (most recent call last):
  File "C:\Users\skyline\Projects\onnxruntime-genai\examples\python\model-generate.py", line 76, in <module>
    main(args)
  File "C:\Users\skyline\Projects\onnxruntime-genai\examples\python\model-generate.py", line 46, in main
    output_tokens = model.generate(params)
onnxruntime_genai.onnxruntime_genai.OrtException


Reducing the number of encoded prompts to one leads to a successful run:

Namespace(verbose=True, batch_size_for_cuda_graph=1, chat_template='', model='.\\example-models\\phi2-int4-directml')
Loading model...
Model loaded
Tokenizer created
Prompt(s) encoded: ['Hello, how are you today?']
Args: Namespace(verbose=True, batch_size_for_cuda_graph=1, chat_template='', model='.\\example-models\\phi2-int4-directml')
Search options: {}
GeneratorParams created
Generating tokens ...

Prompt #0: Hello, how are you today?

Hello, how are you today?

# The output of the program is

# The output of the program is
.......
Tokens: 1375 Time: 15.27 Tokens per second: 90.07
aciddelgado commented 3 weeks ago

Thank you for the issue submission. We are currently seeing crashes with the DML EP when batch_size > 1 and are working with the DML team to resolve it.