@noumanqaiser: Thanks for sharing the code. It's the .onnx model file and the GPU that have the biggest impact on performance, much more so than the API layer (e.g. .NET). What opset does the model use? DML currently supports up to opset 12 (https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html), but we have work underway to update that. If you don't already know, Netron can help: https://netron.app/.
Hi @fdwr, I checked the ONNX model properties and they show the following:
I am not sure if this format is the opset you are referring to. It is the default format that Microsoft Custom Vision produces. If performance can be improved by targeting a specific opset, is there a way to convert this model to one?
@noumanqaiser:
"...if the format is the opset you are referring to."
Yep, that's it, and this model uses ONNX operator set 10 (<= 12), which rules out one common problem we've seen recently where models exported from various frameworks with opset 13 fall back to the CPU. If this were WinML or ONNX Runtime directly, I'd recommend fixing the input tensor size in the session options (WinML LearningModelSessionOptions.OverrideNamedDimension or ORT AddFreeDimensionOverride), but I'm not seeing any familiar APIs above, so this must all be going through "Microsoft CustomVision", which I'm not familiar with and don't know what it's calling under the hood. 🤔 This will take some research and asking around...
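For reference, if the project can bypass the Custom Vision wrapper and call ONNX Runtime directly, the override mentioned above would look roughly like this in C#. This is a minimal sketch assuming the Microsoft.ML.OnnxRuntime.DirectML package; the dimension name "None" and device id 0 are assumptions, not values taken from this model:

```csharp
// Minimal sketch (not the author's code): create an ONNX Runtime session
// that uses the DirectML execution provider and pins a free dimension.
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();

// DirectML works best with sequential execution and memory pattern disabled.
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;
options.EnableMemoryPattern = false;

// Pin any free (symbolic) input dimension to a fixed value; the name "None"
// is an assumption here - check the real dimension name in Netron.
options.AddFreeDimensionOverrideByName("None", 1);

// Requires the Microsoft.ML.OnnxRuntime.DirectML package; 0 = default adapter.
options.AppendExecutionProvider_DML(0);

using var session = new InferenceSession("model.onnx", options);
```

When DirectML cannot place part of the graph on the GPU, those nodes silently fall back to the CPU, which could explain near-identical timings between the two packages.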
@fdwr Just wanted to check if there is an update regarding this. Is there something I could do to get the Custom Vision exported ONNX model to utilize the GPU via DirectML?
If it would help, I would be happy to share the project/trained model file/sample images separately.
Looking forward to hearing from you.
@noumanqaiser - that would be most useful: having the resulting .onnx model file and the inputs that could be fed directly into ONNX Runtime. I'm not familiar with it, but it sounds like there is an "Export" button on the Performance tab, glancing here: https://docs.microsoft.com/en-us/samples/azure-samples/cognitive-services-onnx-customvision-sample/cognitive-services-onnx-customvision-sample/.
I have the ONNX file, training set images, and a sample C# project to run inferencing. What would be the best way to share these with you (if possible, privately)?
@noumanqaiser - I can either send a link to a OneDrive for Business folder to your email, or you could send a link to mine (dwayner at ms).
@fdwr I have shared with you a .NET project with the actual ONNX model and sample images. The project runs mass inferencing and measures the average time for each inference.
https://drive.google.com/drive/folders/1DqnUvTaU9xp2QLuV_X9jFCjkratckMYL?usp=sharing
Looking forward to hearing from you.
[Just an update] Hi Nouman, I'm back from vacation and can hopefully look this week. TY.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Closing as stale.
Describe the bug
I trained an image classification model (single tag per image) using Microsoft Custom Vision and exported the model in ONNX format. I then created a .NET 5 console app written in C# to use the model for inferencing a large number of image samples and measure performance; my key performance metric is inferencing time (ms) per image.
I have the following packages installed:
I have tried running inferencing with both the OnnxRuntime and OnnxRuntime.DirectML packages and in both cases get very similar performance, with an average inferencing time of around 40 ms. This makes me feel that, for some reason, DirectML isn't really able to exploit the Nvidia MX330 GPU for any performance gains.
Urgency
As part of evaluating OnnxRuntime, I wanted to quantify the performance benefits from underlying Nvidia/AMD GPUs in .NET apps. This is key for our project and any support would be appreciated.
System information
To Reproduce
The following class is used to initialize the model and use it for inferencing:
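(The class body itself did not survive extraction. As a stand-in, here is a minimal sketch of such a class using the standard Microsoft.ML.OnnxRuntime C# API; the class name, input handling, and structure are illustrative assumptions, not the original code.)

```csharp
// Illustrative stand-in for the class described above (not the original code),
// assuming the standard Microsoft.ML.OnnxRuntime C# API.
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

public sealed class OnnxClassifier : IDisposable
{
    private readonly InferenceSession _session;
    private readonly string _inputName;

    public OnnxClassifier(string modelPath, bool useDirectML)
    {
        var options = new SessionOptions();
        if (useDirectML)
            options.AppendExecutionProvider_DML(0); // from Microsoft.ML.OnnxRuntime.DirectML
        _session = new InferenceSession(modelPath, options);
        _inputName = _session.InputMetadata.Keys.First();
    }

    // Runs one inference on a preprocessed input tensor and returns raw scores.
    public float[] Predict(DenseTensor<float> input)
    {
        var inputs = new[] { NamedOnnxValue.CreateFromTensor(_inputName, input) };
        using var results = _session.Run(inputs);
        return results.First().AsEnumerable<float>().ToArray();
    }

    public void Dispose() => _session.Dispose();
}
```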
To run mass inferencing, I use the following code:
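(This snippet was also lost in extraction. A minimal sketch of a measurement loop of this shape, assuming the hypothetical OnnxClassifier above; LoadAndPreprocess is a placeholder, not a real helper from the project:)

```csharp
// Illustrative measurement loop (not the original snippet), assuming the
// hypothetical OnnxClassifier sketched above.
using System;
using System.Diagnostics;
using System.IO;
using Microsoft.ML.OnnxRuntime.Tensors;

using var classifier = new OnnxClassifier("model.onnx", useDirectML: true);
var stopwatch = new Stopwatch();
int count = 0;

foreach (var imagePath in Directory.EnumerateFiles("samples", "*.jpg"))
{
    var input = LoadAndPreprocess(imagePath);
    stopwatch.Start(); // time only the inference call itself
    classifier.Predict(input);
    stopwatch.Stop();
    count++;
}

Console.WriteLine($"Average inference time: {stopwatch.ElapsedMilliseconds / (double)count:F1} ms");

// Placeholder: decode an image and convert it to the model's NCHW float
// tensor (e.g. 1x3x224x224); the real preprocessing depends on the model.
static DenseTensor<float> LoadAndPreprocess(string path) =>
    throw new NotImplementedException();
```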
Expected behavior
When utilizing the OnnxRuntime package, the average inferencing time is ~40 ms; with OnnxRuntime.DirectML I expected it to be less than 10 ms.
Screenshots
N/A
Additional context
This is a performance-oriented question about how well OnnxRuntime.DirectML allows .NET developers to exploit the benefits of faster inferencing on the GPU.