microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] How much memory it needs to load a 3.4 GB model to GPU through DirectML? #17413

Open zhanweiw opened 1 year ago

zhanweiw commented 1 year ago

Describe the issue

I'm trying to load the Stable Diffusion ONNX model to GPU through DirectML in my Window ARM device: https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx/unet

The source code:

    var sessionOptions = new SessionOptions();
    sessionOptions.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
    sessionOptions.EnableMemoryPattern = false;
    sessionOptions.AppendExecutionProvider_DML(0);
    sessionOptions.AppendExecutionProvider_CPU();

    var unetSession = new InferenceSession(UnetOnnxPath, sessionOptions);

Before I run this code, there are more than 7 GB of free memory, but while the code is running, memory is exhausted and my app is killed with an out-of-memory error. Is there any solution for this issue?

To reproduce

Run the code I've shown.

Urgency

No response

Platform

Windows

OS Version

Version 22H2(OS Build 22621.2134)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

C#

Architecture

ARM64

Execution Provider

DirectML

Execution Provider Library Version

No response

Model File

https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx/unet

Is this a quantized model?

Unknown

fdwr commented 1 year ago

there're more than 7 GB free memory

I'm curious if that's GPU VRAM (video memory) or shared memory? You can tell from DxDiag.exe under the Display tab's Device section. For example, the machine I'm typing on right now doesn't have enough for Stable Diffusion:

[screenshot: DxDiag Display tab showing this machine's display memory]
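As a back-of-the-envelope sketch of why a 3.4 GB float32 model can exhaust 7 GB: during session initialization the weights can transiently exist in several copies at once (the parsed protobuf, the runtime's unpacked tensors, and GPU upload/staging buffers). The three-copies assumption below is illustrative only, not a measured figure:

```python
# Rough, hypothetical peak-memory estimate for loading a float32 ONNX model.
# The three-copies assumption is illustrative; real overhead varies by
# runtime version, driver, and graph optimizations.
model_gb = 3.4                  # serialized unet model size on disk

protobuf_copy = model_gb        # parsed onnx protobuf in system memory
runtime_copy  = model_gb        # initializers unpacked into runtime tensors
upload_copy   = model_gb        # staging buffers while uploading to the GPU

peak_gb = protobuf_copy + runtime_copy + upload_copy
print(f"~{peak_gb:.1f} GB peak")   # ~10.2 GB, well over 7 GB free
```

A float16 model halves each of those copies, which is part of why the conversion below can make the difference between loading successfully and running out of memory.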

Either way, I generally recommend using the float16 ONNX model for stable diffusion, which should use half the memory and is often faster. You could convert the float32 model to float16 via a script like this:

# e.g. ConvertToFloat16.py "D:\ai-models\StableDiffusion\Stable-Diffusion-v1.5-unet.onnx"

import onnx
import os
import sys
from onnxconverter_common import float16

if len(sys.argv) <= 1:
    print("Pass an ONNX filename.")
    quit()
#endif

# Add a filename suffix of "float16".
filePath = sys.argv[1]
filePathSplitExtension = os.path.splitext(filePath)
filePathNoExtension = filePathSplitExtension[0]
fileNameExtension = filePathSplitExtension[1]
fileName = os.path.basename(filePathNoExtension)
fileSuffixSeparator = '-'
if ('_' in fileName) and not ('-' in fileName):
    fileSuffixSeparator = '_'
#endif
newFilePath = filePathNoExtension + fileSuffixSeparator + "float16" + fileNameExtension
newWeightsFilename = fileName + fileSuffixSeparator + "float16" + ".weights.pb"

print("Input file: ", filePath)
print("Output file:", newFilePath)

print("Applying shape inference")
onnx.shape_inference.infer_shapes_path(model_path = filePath, output_path = newFilePath)
print("Loading model with inferred shapes")
shapedModel = onnx.load(newFilePath)

print("Converting model to float16")
modelFloat16 = float16.convert_float_to_float16(shapedModel, keep_io_types=True, disable_shape_infer=False)

saveWeightsExternally = False

if saveWeightsExternally:
    print("Saving output model to " + newFilePath + " and " + newWeightsFilename)
else:
    print("Saving output model to " + newFilePath)
#endif

onnx.save_model(modelFloat16, newFilePath, save_as_external_data=saveWeightsExternally, all_tensors_to_one_file=True, location=newWeightsFilename)
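For reference, the suffix rule the script applies can be sketched as a small standalone function (the file names below are hypothetical, for illustration):

```python
import os

def float16_name(path):
    # Mirrors the suffix logic above: reuse '_' as the separator when the
    # base name uses underscores but no hyphens, otherwise use '-'.
    base, ext = os.path.splitext(path)
    name = os.path.basename(base)
    sep = '_' if ('_' in name and '-' not in name) else '-'
    return base + sep + "float16" + ext

print(float16_name("Stable-Diffusion-v1.5-unet.onnx"))  # Stable-Diffusion-v1.5-unet-float16.onnx
print(float16_name("sd_unet.onnx"))                     # sd_unet_float16.onnx
```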
zhanweiw commented 1 year ago

Thanks @fdwr for your support!

I've just tried it again, and I found that 'Display Memory' is always '0' in the DxDiag dialog while running the code I mentioned previously.

[screenshot: DxDiag dialog showing Display Memory of 0]

Before I run it, there is 10 GB of free memory, but while it is running, free memory quickly drops below 1 GB. Then the app crashes with the out-of-memory error below. It seems there is an issue in the platform; it should not cost so much memory. Is there any way to debug it? I'm testing on a Lenovo X13s device (ARM64).

StableDiffusion.exe
a fireplace in an old cabin in the woods
2023-09-06 12:43:24.0886341 [E:onnxruntime:, inference_session.cc:1533 onnxruntime::InferenceSession::Initialize::<lambda_7c4fa25391529f97c3fbc8cfdbaaaec0>::operator ()] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime.DLL!00007FFF75573D6C: (caller: 00007FFF75570248) Exception(2) tid(526c) 8007000E Not enough memory resources are available to complete this operation.

Unhandled exception. Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime.DLL!00007FFF75573D6C: (caller: 00007FFF75570248) Exception(2) tid(526c) 8007000E Not enough memory resources are available to complete this operation.

   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
   at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
   at StableDiffusion.ML.OnnxRuntime.UNet.Inference(String prompt, StableDiffusionConfig config) in C:\Source\SD\StableDiffusion\StableDiffusion.ML.OnnxRuntime\UNet.cs:line 75
   at StableDiffusion.Program.Main(String[] args) in C:\Source\SD\StableDiffusion\StableDiffusion\Program.cs:line 37
fdwr commented 1 year ago

Before I run it, there is 10G free memory

I found 'Display Memory' is always '0' in the 'DxDiag' dialog

I've never seen 0 VRAM before, but 0 bytes is not nearly enough 😉. I generally wouldn't hold out hope that Stable Diffusion will run on a laptop (unless maybe you have a gaming laptop). A discrete GPU with at least 8 GB of VRAM is your target (it won't even run on 2 of my 3 work desktops, and I had to buy a newer GPU for my personal desktop to run it).