microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] How much memory it needs to load a 3.4 GB model to GPU through DirectML? #17413

Open zhanweiw opened 1 year ago

zhanweiw commented 1 year ago

Describe the issue

I'm trying to load the Stable Diffusion ONNX model to GPU through DirectML in my Window ARM device: https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx/unet

The source code:

    var sessionOptions = new SessionOptions();
    sessionOptions.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
    sessionOptions.EnableMemoryPattern = false;
    sessionOptions.AppendExecutionProvider_DML(0);
    sessionOptions.AppendExecutionProvider_CPU();

    var unetSession = new InferenceSession(UnetOnnxPath, sessionOptions);

Before I run this code, there are more than 7 GB of free memory, but while the code is running, memory is exhausted and my app is killed with an out-of-memory error. Is there any solution for this issue?

To reproduce

Run the code I've shown.

Urgency

No response

Platform

Windows

OS Version

Version 22H2(OS Build 22621.2134)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

C#

Architecture

ARM64

Execution Provider

DirectML

Execution Provider Library Version

No response

Model File

https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx/unet

Is this a quantized model?

Unknown

fdwr commented 1 year ago

there're more than 7 GB free memory

I'm curious if that's GPU VRAM (video memory) or shared memory? You can tell from DxDiag.exe under the Display tab's Device section. For example, the machine I'm typing on right now doesn't have enough for Stable Diffusion:

[screenshot: DxDiag Display tab showing this machine's display memory]
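As a back-of-the-envelope sketch of why a 3.4 GB float32 model can exhaust 7 GB: during session initialization the weights can transiently exist in several copies at once (the parsed protobuf, the runtime's unpacked tensors, and GPU upload/staging buffers). The three-copies assumption below is illustrative only, not a measured figure:

```python
# Rough, hypothetical peak-memory estimate for loading a float32 ONNX model.
# The three-copies assumption is illustrative; real overhead varies by
# runtime version, driver, and graph optimizations.
model_gb = 3.4                  # serialized unet model size on disk

protobuf_copy = model_gb        # parsed onnx protobuf in system memory
runtime_copy  = model_gb        # initializers unpacked into runtime tensors
upload_copy   = model_gb        # staging buffers while uploading to the GPU

peak_gb = protobuf_copy + runtime_copy + upload_copy
print(f"~{peak_gb:.1f} GB peak")   # ~10.2 GB, well over 7 GB free
```

A float16 model halves each of those copies, which is part of why the conversion below can make the difference between loading successfully and running out of memory.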

Either way, I generally recommend using the float16 ONNX model for stable diffusion, which should use half the memory and is often faster. You could convert the float32 model to float16 via a script like this:

# e.g. ConvertToFloat16.py "D:\ai-models\StableDiffusion\Stable-Diffusion-v1.5-unet.onnx"

import onnx
import os
import sys
from onnxconverter_common import float16

if len(sys.argv) <= 1:
    print("Pass an ONNX filename.")
    quit()
#endif

# Add a filename suffix of "float16".
filePath = sys.argv[1]
filePathSplitExtension = os.path.splitext(filePath)
filePathNoExtension = filePathSplitExtension[0]
fileNameExtension = filePathSplitExtension[1]
fileName = os.path.basename(filePathNoExtension)
fileSuffixSeparator = '-'
if ('_' in fileName) and not ('-' in fileName):
    fileSuffixSeparator = '_'
#endif
newFilePath = filePathNoExtension + fileSuffixSeparator + "float16" + fileNameExtension
newWeightsFilename = fileName + fileSuffixSeparator + "float16" + ".weights.pb"

print("Input file: ", filePath)
print("Output file:", newFilePath)

print("Applying shape inference")
onnx.shape_inference.infer_shapes_path(model_path = filePath, output_path = newFilePath)
print("Loading model with inferred shapes")
shapedModel = onnx.load(newFilePath)

print("Converting model to float16")
modelFloat16 = float16.convert_float_to_float16(shapedModel, keep_io_types=True, disable_shape_infer=False)

saveWeightsExternally = False

if saveWeightsExternally:
    print("Saving output model to " + newFilePath + " and " + newWeightsFilename)
else:
    print("Saving output model to " + newFilePath)
#endif

onnx.save_model(modelFloat16, newFilePath, save_as_external_data=saveWeightsExternally, all_tensors_to_one_file=True, location=newWeightsFilename)
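For reference, the suffix rule the script applies can be sketched as a small standalone function (the file names below are hypothetical, for illustration):

```python
import os

def float16_name(path):
    # Mirrors the suffix logic above: reuse '_' as the separator when the
    # base name uses underscores but no hyphens, otherwise use '-'.
    base, ext = os.path.splitext(path)
    name = os.path.basename(base)
    sep = '_' if ('_' in name and '-' not in name) else '-'
    return base + sep + "float16" + ext

print(float16_name("Stable-Diffusion-v1.5-unet.onnx"))  # Stable-Diffusion-v1.5-unet-float16.onnx
print(float16_name("sd_unet.onnx"))                     # sd_unet_float16.onnx
```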
zhanweiw commented 1 year ago

Thanks @fdwr for your support!

I've just tried it again, and I found that 'Display Memory' is always '0' in the DxDiag dialog while running the code I mentioned previously.

[screenshot: DxDiag dialog showing Display Memory of 0]

Before I run it, there is 10 GB of free memory, but while it is running, free memory quickly drops below 1 GB. Then the app crashes with the out-of-memory error below. It seems there is an issue in the platform; it should not cost so much memory. Is there any way to debug it? I'm testing on a Lenovo X13s device (ARM64).

StableDiffusion.exe
a fireplace in an old cabin in the woods
2023-09-06 12:43:24.0886341 [E:onnxruntime:, inference_session.cc:1533 onnxruntime::InferenceSession::Initialize::<lambda_7c4fa25391529f97c3fbc8cfdbaaaec0>::operator ()] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime.DLL!00007FFF75573D6C: (caller: 00007FFF75570248) Exception(2) tid(526c) 8007000E Not enough memory resources are available to complete this operation.

Unhandled exception. Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime.DLL!00007FFF75573D6C: (caller: 00007FFF75570248) Exception(2) tid(526c) 8007000E Not enough memory resources are available to complete this operation.

   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
   at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
   at StableDiffusion.ML.OnnxRuntime.UNet.Inference(String prompt, StableDiffusionConfig config) in C:\Source\SD\StableDiffusion\StableDiffusion.ML.OnnxRuntime\UNet.cs:line 75
   at StableDiffusion.Program.Main(String[] args) in C:\Source\SD\StableDiffusion\StableDiffusion\Program.cs:line 37
fdwr commented 1 year ago

Before I run it, there is 10G free memory

I found 'Display Memory' is always '0' in the 'DxDiag' dialog

I've never seen 0 VRAM before, but 0 bytes is not nearly enough 😉. I generally wouldn't hold out hope that Stable Diffusion will run on a laptop (unless maybe you have a gaming laptop). A discrete GPU with at least 8 GB of VRAM is your target (it won't even run on 2 of my 3 work desktops, and I had to buy a newer GPU for my personal desktop to run it).