lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Olive-optimized model runs very slowly with batch_count 30 #361

Closed. Jay19751103 closed this issue 7 months ago.

Jay19751103 commented 8 months ago

Checklist

What happened?

When using DirectML with an Olive-optimized model and changing batch_count from 1 to 30, the model is reloaded every time. With Olive directly the run takes around 41 s, but this web UI takes around 128 to 129 seconds with a batch count of 30 (I tried a 3080 Ti and a 7900 XTX; the behavior is the same). The reason is that the model is reloaded on every batch. Could you change it to work like Olive? Olive runs inference in a loop controlled only by the num_image parameter.
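For comparison, the Olive workflow loads the ONNX pipeline once and then loops over inference. A minimal sketch of that pattern with diffusers, assuming a hypothetical optimized model directory `./olive-optimized-model` and the DirectML execution provider, could look like this:

```python
# Sketch only: load the Olive-optimized ONNX pipeline once, then reuse it for
# every image. The model path and batch count below are placeholders.
from diffusers import OnnxStableDiffusionPipeline

pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "./olive-optimized-model",        # hypothetical output directory of the Olive pass
    provider="DmlExecutionProvider",  # DirectML execution provider
    safety_checker=None,
)

prompt = "a photo of an astronaut riding a horse"
for i in range(30):  # batch_count = 30, but the model is loaded only once
    image = pipe(prompt, num_inference_steps=20).images[0]
    image.save(f"output_{i:02d}.png")
```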

Steps to reproduce the problem

  1. Use this UI to Olive-optimize a model.
  2. Set batch count to 30.
  3. Generate images; generation is slow.

What should have happened?

Inference time should be shorter when the batch count is increased; the model should not be reloaded for every batch.

What browsers do you use to access the UI?

No response

Sysinfo

GPU: 3080 Ti or 7900 XTX (24 GB VRAM); CPU: AMD 7700X; System memory: 32 GB

Console logs

The console log does not show anything abnormal; the issue is only the running speed.
I modified the code for testing as follows:

D:\DirectML\stable-diffusion-webui-directml>git diff
diff --git a/models/OliveCache/.placeholder b/models/OliveCache/.placeholder
deleted file mode 100644
index e69de29b..00000000
diff --git a/modules/sd_onnx.py b/modules/sd_onnx.py
index fc618ec7..5adbcf14 100644
--- a/modules/sd_onnx.py
+++ b/modules/sd_onnx.py
@@ -93,6 +93,8 @@ class BaseONNXModel(Generic[T2I, I2I, INP], WebuiSdModel, metaclass=ABCMeta):
         }

     def load_orm(self, submodel: str) -> Union[diffusers.OnnxRuntimeModel, None]:
+        print("load model")
+        ort.set_default_logger_severity(3)
         path = self.path / submodel
         if not self.sd_checkpoint_info.is_optimized and self.sd_checkpoint_info.optimized_model_info is not None:
             path = self.sd_checkpoint_info.optimized_model_info[submodel]["optimized"]["path"].parent
diff --git a/modules/sd_onnx_models.py b/modules/sd_onnx_models.py
index 9751a0de..f02e5348 100644
--- a/modules/sd_onnx_models.py
+++ b/modules/sd_onnx_models.py
@@ -26,23 +26,27 @@ class ONNXStableDiffusionModel(
         self.add_free_dimension_override_by_name(
             "unet_hidden_sequence", 77
         )
+        self.init_flag = 0

     def create_txt2img_pipeline(
         self, sampler: SamplerData
     ) -> OnnxStableDiffusionPipeline:
-        return OnnxStableDiffusionPipeline(
-            safety_checker=None,
-            text_encoder=self.load_orm("text_encoder"),
-            unet=self.load_orm("unet"),
-            vae_decoder=self.load_orm("vae_decoder"),
-            vae_encoder=self.load_orm("vae_encoder"),
-            tokenizer=self.load_tokenizer("tokenizer"),
-            scheduler=sampler.constructor.from_pretrained(
-                self.path, subfolder="scheduler"
-            ),
-            feature_extractor=self.load_image_processor("feature_extractor"),
-            requires_safety_checker=False,
-        )
+        if self.init_flag == 0 :
+            self.pipeline = OnnxStableDiffusionPipeline(
+                safety_checker=None,
+                text_encoder=self.load_orm("text_encoder"),
+                unet=self.load_orm("unet"),
+                vae_decoder=self.load_orm("vae_decoder"),
+                vae_encoder=self.load_orm("vae_encoder"),
+                tokenizer=self.load_tokenizer("tokenizer"),
+                scheduler=sampler.constructor.from_pretrained(
+                    self.path, subfolder="scheduler"
+                ),
+                feature_extractor=self.load_image_processor("feature_extractor"),
+                requires_safety_checker=False,
+            )
+            self.init_flag = 1
+        return self.pipeline

     def create_img2img_pipeline(
         self, sampler: SamplerData

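The patch above caches the pipeline behind an init_flag. A slightly more idiomatic sketch of the same idea, caching lazily on an attribute instead of a flag and assuming the same class and helper methods (load_orm, load_tokenizer, load_image_processor) from modules/sd_onnx_models.py, could look like:

```python
def create_txt2img_pipeline(self, sampler: SamplerData) -> OnnxStableDiffusionPipeline:
    # Build the pipeline once and reuse it on later calls; mirrors the patch above
    # but without a separate init_flag.
    # Caveat: a sampler changed between calls is not picked up while the cache is held.
    if getattr(self, "_txt2img_pipeline", None) is None:
        self._txt2img_pipeline = OnnxStableDiffusionPipeline(
            safety_checker=None,
            text_encoder=self.load_orm("text_encoder"),
            unet=self.load_orm("unet"),
            vae_decoder=self.load_orm("vae_decoder"),
            vae_encoder=self.load_orm("vae_encoder"),
            tokenizer=self.load_tokenizer("tokenizer"),
            scheduler=sampler.constructor.from_pretrained(self.path, subfolder="scheduler"),
            feature_extractor=self.load_image_processor("feature_extractor"),
            requires_safety_checker=False,
        )
    return self._txt2img_pipeline
```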
Additional information

No response

Jay19751103 commented 8 months ago

Hi lshqqytiger

Is it possible to avoid reloading the model for every batch inference?

lshqqytiger commented 8 months ago

Yes, but it will use more VRAM. I'll refactor the current codebase of this repository based on my new implementation. Until then, please consider using the olive branch of vladmandic/automatic, which is more developed.
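If the extra VRAM from keeping the pipeline resident becomes a problem, the cache can be dropped explicitly when the checkpoint is unloaded or switched. A minimal sketch; the unload hook below is hypothetical and not part of this repository's API:

```python
import gc

def unload(self):
    # Drop the cached pipeline so the ONNX Runtime sessions (and their VRAM)
    # can be released; hypothetical hook, named here only for illustration.
    self._txt2img_pipeline = None
    gc.collect()
```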