@noumanqaiser: Thanks for sharing the code. It's the .onnx model file and the GPU that have the biggest impact on performance, much more so than the API layer (e.g. .NET). What opset does the model use? DML currently supports up to opset 12 (https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html), but we have work underway to update that. If you don't already know, Netron can help: https://netron.app/.
Hi @fdwr, I checked the ONNX model properties and they show the following:
I am not sure if this format is the opset you are referring to. It is the default format that Microsoft Custom Vision produces. If performance can be improved by targeting a specific opset, is there a way to convert this model to one?
@noumanqaiser:
"...if the format is the opset you are referring to."
Yep, that's it, and this model uses ONNX operator set 10 (<= 12), which rules out one common problem we've seen recently where models exported from various frameworks with opset 13 fall back to the CPU. If this were WinML or ONNX Runtime directly, I'd recommend fixing the input tensor size in the session options (WinML LearningModelSessionOptions.OverrideNamedDimension or ORT AddFreeDimensionOverride), but I'm not seeing any familiar APIs above, so this must all be going through "Microsoft CustomVision", which I'm not familiar with and don't know what it's calling under the hood. 🤔 This will take some research and asking around...
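For reference, if the project can bypass the Custom Vision wrapper and call ONNX Runtime directly, the override mentioned above would look roughly like this in C#. This is a minimal sketch assuming the Microsoft.ML.OnnxRuntime.DirectML package; the dimension name "None" and device id 0 are assumptions, not values taken from this model:

```csharp
// Minimal sketch (not the author's code): create an ONNX Runtime session
// that uses the DirectML execution provider and pins a free dimension.
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();

// DirectML works best with sequential execution and memory pattern disabled.
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;
options.EnableMemoryPattern = false;

// Pin any free (symbolic) input dimension to a fixed value; the name "None"
// is an assumption here - check the real dimension name in Netron.
options.AddFreeDimensionOverrideByName("None", 1);

// Requires the Microsoft.ML.OnnxRuntime.DirectML package; 0 = default adapter.
options.AppendExecutionProvider_DML(0);

using var session = new InferenceSession("model.onnx", options);
```

When DirectML cannot place part of the graph on the GPU, those nodes silently fall back to the CPU, which could explain near-identical timings between the two packages.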
@fdwr Just wanted to check if there is an update regarding this. Is there something I could do to get the Custom Vision exported ONNX model to utilize the GPU via DirectML?
If it would help, I would be happy to share the project/trained model file/sample images separately.
Looking forward to hearing from you.
@noumanqaiser - that would be most useful: having the resulting .onnx model file and the inputs that could be fed directly into ONNX Runtime. I'm not familiar with it, but it sounds like there is an "Export" button on the Performance tab, glancing here: https://docs.microsoft.com/en-us/samples/azure-samples/cognitive-services-onnx-customvision-sample/cognitive-services-onnx-customvision-sample/.
I have the ONNX file, training set images, and a sample C# project to run inferencing. What would be the best way to share these with you (if possible, privately)?
@noumanqaiser - I can either send a link to a OneDrive for Business folder to your email, or you could send a link to mine (dwayner at ms).
@fdwr I have shared with you a .NET project with the actual ONNX model and sample images. The project runs mass inferencing and measures the average time for each inference.
https://drive.google.com/drive/folders/1DqnUvTaU9xp2QLuV_X9jFCjkratckMYL?usp=sharing
Looking forward to hearing from you.
[Just an update] Hi Nouman, I'm back from vacation and can hopefully look this week. TY.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Closing as stale.
Describe the bug
I trained an image classification model (single tag per image) using Microsoft Custom Vision and exported the model in ONNX format. I then created a .NET 5 console app written in C# to use the model for inferencing a large number of image samples and measure performance; my key performance metric is inferencing time (ms) per image.
I have the following packages installed:
I have tried running inferencing with both the OnnxRuntime and OnnxRuntime.DirectML packages and in both cases get very similar performance, with an average inferencing time of around 40 ms. This makes me feel that, for some reason, DirectML isn't really able to exploit the Nvidia MX330 GPU for any performance gains.
Urgency
As part of evaluating OnnxRuntime, I wanted to quantify the performance benefits from underlying Nvidia/AMD GPUs in .NET apps. This is key for our project and any support would be appreciated.
System information
To Reproduce
The following class is used to initialize the model and use it for inferencing:
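(The class body itself did not survive extraction. As a stand-in, here is a minimal sketch of such a class using the standard Microsoft.ML.OnnxRuntime C# API; the class name, input handling, and structure are illustrative assumptions, not the original code.)

```csharp
// Illustrative stand-in for the class described above (not the original code),
// assuming the standard Microsoft.ML.OnnxRuntime C# API.
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

public sealed class OnnxClassifier : IDisposable
{
    private readonly InferenceSession _session;
    private readonly string _inputName;

    public OnnxClassifier(string modelPath, bool useDirectML)
    {
        var options = new SessionOptions();
        if (useDirectML)
            options.AppendExecutionProvider_DML(0); // from Microsoft.ML.OnnxRuntime.DirectML
        _session = new InferenceSession(modelPath, options);
        _inputName = _session.InputMetadata.Keys.First();
    }

    // Runs one inference on a preprocessed input tensor and returns raw scores.
    public float[] Predict(DenseTensor<float> input)
    {
        var inputs = new[] { NamedOnnxValue.CreateFromTensor(_inputName, input) };
        using var results = _session.Run(inputs);
        return results.First().AsEnumerable<float>().ToArray();
    }

    public void Dispose() => _session.Dispose();
}
```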
To run mass inferencing, I use the following code:
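(This snippet was also lost in extraction. A minimal sketch of a measurement loop of this shape, assuming the hypothetical OnnxClassifier above; LoadAndPreprocess is a placeholder, not a real helper from the project:)

```csharp
// Illustrative measurement loop (not the original snippet), assuming the
// hypothetical OnnxClassifier sketched above.
using System;
using System.Diagnostics;
using System.IO;
using Microsoft.ML.OnnxRuntime.Tensors;

using var classifier = new OnnxClassifier("model.onnx", useDirectML: true);
var stopwatch = new Stopwatch();
int count = 0;

foreach (var imagePath in Directory.EnumerateFiles("samples", "*.jpg"))
{
    var input = LoadAndPreprocess(imagePath);
    stopwatch.Start(); // time only the inference call itself
    classifier.Predict(input);
    stopwatch.Stop();
    count++;
}

Console.WriteLine($"Average inference time: {stopwatch.ElapsedMilliseconds / (double)count:F1} ms");

// Placeholder: decode an image and convert it to the model's NCHW float
// tensor (e.g. 1x3x224x224); the real preprocessing depends on the model.
static DenseTensor<float> LoadAndPreprocess(string path) =>
    throw new NotImplementedException();
```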
Expected behavior
When utilizing the OnnxRuntime package, the average inferencing time is ~40 ms; with OnnxRuntime.DirectML I expected it to be less than 10 ms.
Screenshots
N/A
Additional context
This is a performance-oriented question about how well OnnxRuntime.DirectML allows .NET developers to exploit the benefits of faster inferencing on the GPU.