zhanweiw opened 1 year ago
there're more than 7 GB free memory
I'm curious if that's GPU VRAM (video memory) or shared memory? You can tell from DxDiag.exe under the Display / Device. For example, the machine I'm typing from right now doesn't have enough for stable diffusion:
Either way, I generally recommend using the float16 ONNX model for stable diffusion, which should use half the memory and is often faster. You could convert the float32 model to float16 via a script like this:
# e.g. ConvertToFloat16.py "D:\ai-models\StableDiffusion\Stable-Diffusion-v1.5-unet.onnx"
import onnx
import os
import sys
from onnxconverter_common import float16

if len(sys.argv) <= 1:
    print("Pass an ONNX filename.")
    sys.exit(1)
#endif

# Add a filename suffix of "float16", reusing the input's separator style.
filePath = sys.argv[1]
filePathNoExtension, fileNameExtension = os.path.splitext(filePath)
fileName = os.path.basename(filePathNoExtension)
fileSuffixSeparator = '-'
if ('_' in fileName) and not ('-' in fileName):
    fileSuffixSeparator = '_'
#endif
newFilePath = filePathNoExtension + fileSuffixSeparator + "float16" + fileNameExtension
newWeightsFilename = fileName + fileSuffixSeparator + "float16" + ".weights.pb"

print("Input file: ", filePath)
print("Output file:", newFilePath)

# Run shape inference path-to-path, which avoids loading the whole model
# into memory just for this step.
print("Applying shape inference")
onnx.shape_inference.infer_shapes_path(model_path = filePath, output_path = newFilePath)

print("Loading model with inferred shapes")
shapedModel = onnx.load(newFilePath)

print("Converting model to float16")
modelFloat16 = float16.convert_float_to_float16(shapedModel, keep_io_types=True, disable_shape_infer=False)

# Set to True for models whose weights exceed the 2 GB protobuf limit.
saveWeightsExternally = False
if saveWeightsExternally:
    print("Saving output model to " + newFilePath + " and " + newWeightsFilename)
else:
    print("Saving output model to " + newFilePath)
#endif
onnx.save_model(modelFloat16, newFilePath, save_as_external_data=saveWeightsExternally, all_tensors_to_one_file=True, location=newWeightsFilename)
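For reference, the output-naming rules above can be sketched as a standalone function (`float16_output_path` is just an illustrative name, not part of the script): keep the input's extension, and switch the suffix separator to `_` when the base name uses underscores but no hyphens.

```python
import os

def float16_output_path(file_path):
    # Split off the extension, inspect the bare file name, pick a separator
    # matching the input's style, then rebuild the path with a "float16" suffix.
    base, ext = os.path.splitext(file_path)
    name = os.path.basename(base)
    sep = '_' if ('_' in name and '-' not in name) else '-'
    return base + sep + "float16" + ext

print(float16_output_path("Stable-Diffusion-v1.5-unet.onnx"))
# Stable-Diffusion-v1.5-unet-float16.onnx
print(float16_output_path("sd_v1_5_unet.onnx"))
# sd_v1_5_unet_float16.onnx
```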
Thanks @fdwr for your support!
I've just tried it again. I found that 'Display Memory' is always '0' in the 'DxDiag' dialog while running the code I mentioned previously.
Before I run it, there is 10 GB of free memory, but while it runs, free memory quickly drops below 1 GB, and then the app crashes due to out-of-memory with the info below. It seems there are some issues in the platform; it should not cost so much memory! Is there any way to debug it? I'm testing on a Lenovo X13s device (ARM64).
StableDiffusion.exe
a fireplace in an old cabin in the woods
2023-09-06 12:43:24.0886341 [E:onnxruntime:, inference_session.cc:1533 onnxruntime::InferenceSession::Initialize::<lambda_7c4fa25391529f97c3fbc8cfdbaaaec0>::operator ()] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime.DLL!00007FFF75573D6C: (caller: 00007FFF75570248) Exception(2) tid(526c) 8007000E Not enough memory resources are available to complete this operation.
Unhandled exception. Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime.DLL!00007FFF75573D6C: (caller: 00007FFF75570248) Exception(2) tid(526c) 8007000E Not enough memory resources are available to complete this operation.
at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
at StableDiffusion.ML.OnnxRuntime.UNet.Inference(String prompt, StableDiffusionConfig config) in C:\Source\SD\StableDiffusion\StableDiffusion.ML.OnnxRuntime\UNet.cs:line 75
at StableDiffusion.Program.Main(String[] args) in C:\Source\SD\StableDiffusion\StableDiffusion\Program.cs:line 37
Before I run it, there is 10G free memory
I found 'Display Memory' is always '0' in the 'DxDiag' dialog
I've never seen 0 VRAM before, but 0 bytes is not nearly enough 😉. I generally wouldn't hold out hope that Stable Diffusion is going to run on a laptop anyway (unless maybe you have a gaming laptop). A discrete GPU with at least 8GB VRAM is your target (it won't even run on 2 of my 3 work desktops, and I had to buy a newer GPU for my personal desktop to run it).
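Beyond switching to float16, ONNX Runtime exposes session options that can lower peak memory. A hedged sketch in Python (the C# `SessionOptions` API has equivalent properties; the model path below is a placeholder, and whether this is enough headroom on this device is untested):

```python
import onnxruntime as ort

opts = ort.SessionOptions()
# Memory-pattern planning preallocates buffers up front; the DirectML
# execution provider requires it disabled.
opts.enable_mem_pattern = False
# Skip the growable CPU arena, which is never returned to the OS.
opts.enable_cpu_mem_arena = False

session = ort.InferenceSession(
    "stable-diffusion-unet-float16.onnx",  # placeholder path
    sess_options=opts,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```

The provider list falls back to CPU if DirectML initialization fails, which at least distinguishes a DML-specific memory problem from a model that is simply too large for the machine.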
Describe the issue
I'm trying to load the Stable Diffusion ONNX model to the GPU through DirectML on my Windows on ARM device: https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx/unet
The source code:
Before I run this code, there are more than 7 GB of free memory, but while the code runs, memory is exhausted and my app is killed by out-of-memory. Is there any solution for this issue?
To reproduce
Run the code I've shown.
Urgency
No response
Platform
Windows
OS Version
Version 22H2 (OS Build 22621.2134)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.14.1
ONNX Runtime API
C#
Architecture
ARM64
Execution Provider
DirectML
Execution Provider Library Version
No response
Model File
https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx/unet
Is this a quantized model?
Unknown