microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

[.NET] Genny chat-bot sample doesn't support DirectML and Phi-3 #569

Open asmirnov82 opened 5 months ago

asmirnov82 commented 5 months ago

I would like to update the ..examples\csharp\Genny sample to support DirectML and the Phi-3 model.

I managed to do it for Stateless mode (#568); however, I ran into an issue in Stateful mode:

Inference fails with OnnxRuntimeGenAIException: 'Non-zero status code returned while running DmlFusedNode_0_0 node. Name:'DmlFusedNode_0_0' Status Message: invalid unordered_map<K, T> key'

This happens because of the current implementation of the private void AddPastTokens(Sequences sequences) method in the Genny sample:

// Only keep (context_length - max_length) worth of history
while (_pastTokens.Count > ModelOptions.ContextLength - SearchOptions.MaxLength)
{
    _pastTokens.RemoveAt(0);
}

For Phi-3, both ModelOptions.ContextLength and SearchOptions.MaxLength are equal to 4096, so the limit in the loop condition is 4096 - 4096 = 0. The method therefore removes all tokens from the session history and passes an empty collection of tokens to generatorParams.SetInputIDs.

Should it be just while (_pastTokens.Count > ModelOptions.ContextLength) instead, so that the number of passed tokens doesn't exceed the model's ContextLength?
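To make the difference concrete, here is a minimal, self-contained sketch comparing the two loop conditions. It is not the Genny code: the class HistoryTrimSketch and the methods TrimOriginal and TrimProposed are hypothetical names used only for illustration, and a plain List<int> stands in for _pastTokens. Whether the proposed condition is what the generator actually needs is still the open question here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not part of the Genny sample.
public static class HistoryTrimSketch
{
    // Same condition as the loop quoted above:
    // keep at most (contextLength - maxLength) tokens of history.
    public static void TrimOriginal(List<int> pastTokens, int contextLength, int maxLength)
    {
        while (pastTokens.Count > contextLength - maxLength)
        {
            pastTokens.RemoveAt(0); // drop the oldest token
        }
    }

    // Proposed condition: keep at most contextLength tokens of history.
    public static void TrimProposed(List<int> pastTokens, int contextLength)
    {
        while (pastTokens.Count > contextLength)
        {
            pastTokens.RemoveAt(0); // drop the oldest token
        }
    }

    public static void Main()
    {
        // Values reported above for Phi-3.
        const int contextLength = 4096;
        const int maxLength = 4096;

        // Ten dummy token ids standing in for the session history.
        var original = Enumerable.Range(0, 10).ToList();
        TrimOriginal(original, contextLength, maxLength);
        Console.WriteLine($"original condition keeps {original.Count} tokens"); // 0

        var proposed = Enumerable.Range(0, 10).ToList();
        TrimProposed(proposed, contextLength);
        Console.WriteLine($"proposed condition keeps {proposed.Count} tokens"); // 10
    }
}
```

With both values at 4096 the original condition empties the history, while the proposed one leaves a short history untouched and only trims once it grows past the context length.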

The documentation for the GenAI C# API (https://onnxruntime.ai/docs/genai/api/csharp.html) doesn't describe what each SearchOptions field means or how SetInputIDs works, so I don't have a clear view of what this check is meant to achieve.

Could you please provide more info? If my understanding is correct, the fix can be applied. If you approve it, I'll finish the #568 PR.

asmirnov82 commented 5 months ago

@baijumeswani could you please assist?

natke commented 5 months ago

Thanks for raising this issue, @asmirnov82. We will look into it.

arafattehsin commented 3 months ago

Hey team! I am not able to make it run.

Below is my directml folder:

image

When I debug an app, I get this:

image

asmirnov82 commented 3 months ago

Hi @arafattehsin, to use a DirectML model in the Genny example, you have to build it with the Debug_DirectML or Release_DirectML solution configuration. Could you please double-check that the correct dependency is linked (it should be the Microsoft.ML.OnnxRuntimeGenAI.DirectML NuGet package)?

arafattehsin commented 3 months ago

Hey @asmirnov82. Yes, it works, but it is very, very slow; not sure why. Also, Debug_Cuda doesn't work even though I have an Nvidia P1000 4GB.

asmirnov82 commented 3 months ago

I also noticed that the DirectML implementation works much slower than CUDA (actually, overall performance is comparable to running inference on CPU). The issues with running CUDA in your case may be related to a missing or incorrect installation of the CUDA drivers or CUDA SDK; this is what I would personally check first. Here is a link that may help: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

arafattehsin commented 3 months ago

Thanks @asmirnov82, I tried a few links but still no luck. I have all the libraries installed but am still investigating. Thanks for the pointers though :)

asmirnov82 commented 3 months ago

Actually, I've got the same issue as you on my PC when trying to run all the CSharp examples in Cuda mode. I created a new issue for that: https://github.com/microsoft/onnxruntime-genai/issues/716

@natke, @PatriceVignola, @baijumeswani could you please take a look? You created a very good and useful lib, but currently it's very difficult to use, as the CSharp examples don't work correctly and there isn't enough documentation. Could you also please answer my initial question about using the Genny example in Stateful mode?

arafattehsin commented 3 months ago

Thanks @asmirnov82 - I am sure that Genny's example is going to be super useful if we can make it work. My installation is right I believe.

image

JranZu commented 2 months ago

Has anyone solved this (the original issue: 'DmlFusedNode_0_0' Status Message: invalid unordered_map<K, T> key)?