vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: Can't run any generation 2 times bigger than base model native resolution. Artifact line. Long generation initialization. ARC A770 #3337

Closed Magenta-Flutist closed 2 months ago

Magenta-Flutist commented 2 months ago

Issue Description

My problem is that I can't run any generation at twice the base model's native resolution: 1024 for SD1.5 (928, for example, fails too) and 2048 for SDXL. Upscalers work, but I'd rather avoid them because of the additional artifacts they introduce. This is my first use of an Intel ARC GPU with OpenVINO, and these are my first test runs. I don't understand what the cause is, since I have no such problems on an NVIDIA GPU. Maybe it's not a problem at all, or a rather trivial one, but I lack the competence to solve it either way.
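For reference, a minimal reproduction sketch of the failing case, assuming plain diffusers plus the OpenVINO torch.compile backend ("openvino_fx", which the log below lists as available); the checkpoint path and prompt are illustrative:

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the same SD1.5 checkpoint from a single safetensors file
    pipe = StableDiffusionPipeline.from_single_file(
        "models/Stable-diffusion/dreamshaper_8.safetensors",
        torch_dtype=torch.float32,
    )
    # Compile the UNet with the OpenVINO backend, as SD.Next's model compile does
    pipe.unet = torch.compile(pipe.unet, backend="openvino_fx")

    # 512x512 (native) completes; 1024x1024 (2x native) raises the USM
    # allocation error shown in the log below
    image = pipe("test prompt", width=1024, height=1024,
                 num_inference_steps=20).images[0]
    image.save("out.png")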

Other problems: 1. the long initialization time before every generation, which makes the total generation speed absolutely lame; 2. a line of artifacts on images. I know a fix for this exists, but I can't find the link to it.

[Screenshot: generated image showing the artifact line]

Version Platform Description

Win10 latest, ARC A770 with latest stable driver 101.5762, SD.Next latest, Python 3.11, Brave Browser

Relevant log output

10:34:44-878092 INFO     Launching browser
10:34:47-549993 INFO     MOTD: N/A
10:34:50-225669 DEBUG    UI themes available: type=Standard themes=12
10:34:51-955584 INFO     Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
                         AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36
10:35:15-172865 INFO     Select: model="dreamshaper_8 [879db523c3]"
10:35:15-174859 DEBUG    Load model: existing=False
                         target=C:\Users\asdas\Desktop\automatic\models\Stable-diffusion\dreamshaper_8.safetensors
                         info=None
10:35:15-377041 DEBUG    GC: utilization={'gpu': 0, 'ram': 43, 'threshold': 80} gc={'collected': 1466, 'saved': 0}
                         before={'gpu': 0, 'ram': 13.71} after={'gpu': 0, 'ram': 13.71, 'retries': 0, 'oom': 0}
                         device=cpu fn=unload_model_weights time=0.2
10:35:15-383119 DEBUG    Unload weights model: {'ram': {'used': 13.71, 'total': 31.93}}
10:35:15-844536 DEBUG    Diffusers loading:
                         path="C:\Users\asdas\Desktop\automatic\models\Stable-diffusion\dreamshaper_8.safetensors"
10:35:15-846538 INFO     Autodetect: model="Stable Diffusion" class=StableDiffusionPipeline
                         file="C:\Users\asdas\Desktop\automatic\models\Stable-diffusion\dreamshaper_8.safetensors"
                         size=2034MB
Loading pipeline components... 100% ------------------------------------------------ 6/6  [ 0:00:02 < 0:00:00 , 1 C/s ]
10:35:18-850456 DEBUG    Setting model: pipeline=StableDiffusionPipeline config={'low_cpu_mem_usage': True,
                         'torch_dtype': torch.float32, 'load_connected_pipeline': True, 'extract_ema': False, 'config':
                         'configs/sd15', 'use_safetensors': True, 'cache_dir':
                         'C:\\Users\\asdas\\.cache\\huggingface\\hub'}
10:35:18-857461 INFO     Load embeddings: loaded=0 skipped=0 time=0.00
10:35:18-859463 DEBUG    Setting model: enable VAE slicing
10:35:18-880481 INFO     Model compile: pipeline=StableDiffusionPipeline mode=default backend=openvino_fx
                         fullgraph=False compile=['Model', 'VAE']
10:35:18-883485 DEBUG    Model compile available backends: ['cudagraphs', 'inductor', 'onnxrt', 'openvino_fx',
                         'openxla', 'openxla_eval', 'tvm']
10:35:18-900501 INFO     Model compile: time=0.02
10:35:19-104684 DEBUG    GC: utilization={'gpu': 0, 'ram': 33, 'threshold': 80} gc={'collected': 104, 'saved': 0}
                         before={'gpu': 0, 'ram': 10.61} after={'gpu': 0, 'ram': 10.61, 'retries': 0, 'oom': 0}
                         device=cpu fn=load_diffuser time=0.2
10:35:19-113693 INFO     Load model: time=3.06 load=3.01 native=512 {'ram': {'used': 10.61, 'total': 31.93}}
10:35:19-116695 DEBUG    Setting changed: sd_model_checkpoint=dreamshaper_8 [879db523c3] progress=True
10:35:19-117697 DEBUG    Save: file="config.json" json=32 bytes=1377 time=0.001
10:35:27-547323 INFO     Base: class=StableDiffusionPipeline
10:35:27-550326 DEBUG    Sampler: sampler="Euler a" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
                         'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
                         'rescale_betas_zero_snr': False, 'timestep_spacing': 'linspace'}
10:35:28-485705 DEBUG    Torch generator: device=cpu seeds=[1758501461]
10:35:28-487708 DEBUG    Diffuser pipeline: StableDiffusionPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1
                         set={'prompt_embeds': torch.Size([1, 77, 768]), 'negative_prompt_embeds': torch.Size([1, 77,
                         768]), 'guidance_scale': 6, 'num_inference_steps': 20, 'eta': 1.0, 'guidance_rescale': 0.7,
                         'output_type': 'latent', 'width': 1024, 'height': 1024, 'parser': 'Full parser'}
Progress ?it/s                                              0% 0/20 00:00 ? Base
10:36:00-139138 DEBUG    Server: alive=True jobs=1 requests=356 uptime=91 memory=13.1/31.93 backend=Backend.DIFFUSERS
                         state=idle
Progress ?it/s                                              0% 0/20 01:05 ? Base
10:36:34-099404 ERROR    Processing: args={'prompt_embeds': tensor([[[-0.3826,  0.0188, -0.0631,  ..., -0.4869, -0.2943,  0.0626],
                                  [-0.0952, -1.3683,  0.3406,  ..., -0.6477,  1.1441,  0.6530],
                                  [ 0.5831, -0.6960, -0.4823,  ..., -0.8055, -1.0261, -1.4348],
                                  ...,
                                  [-0.5630,  0.4543,  0.0439,  ...,  0.5765, -0.7594, -0.4750],
                                  [-0.5772,  0.4717,  0.0423,  ...,  0.5898, -0.7722, -0.4663],
                                  [-0.5705,  0.4813,  0.1211,  ...,  0.6016, -0.7849, -0.4757]]]), 'negative_prompt_embeds': tensor([[[-0.3826,  0.0188, -0.0631,  ..., -0.4869, -0.2943,  0.0626],
                                  [-0.3221, -1.4899, -0.4587,  ...,  1.0639,  0.1340, -0.6948],
                                  [-0.3292, -1.4357, -0.4369,  ...,  1.1487, -0.0030, -0.5361],
                                  ...,
                                  [ 1.3220, -0.5007, -0.4426,  ...,  0.7232, -1.3899,  0.7433],
                                  [ 1.3292, -0.4887, -0.4393,  ...,  0.7425, -1.4021,  0.7457],
                                  [ 1.3397, -0.4578, -0.3970,  ...,  0.7448, -1.3316,  0.7183]]]), 'guidance_scale': 6, 'generator': [<torch._C.Generator object at 0x000002214F30EE10>],
                         'callback_on_step_end': <function diffusers_callback at 0x0000022144DC6DE0>, 'callback_on_step_end_tensor_inputs': ['latents', 'prompt_embeds', 'negative_prompt_embeds'],
                         'num_inference_steps': 20, 'eta': 1.0, 'guidance_rescale': 0.7, 'output_type': 'latent', 'width': 1024, 'height': 1024} Exception from src\inference\src\core.cpp:102:
                         [ GENERAL_ERROR ] [CL ext] Can not allocate nullptr for USM type.

10:36:34-117420 ERROR    Processing: RuntimeError
┌──────────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────────────────────────────────────────────────────┐
│ C:\Users\asdas\Desktop\automatic\modules\processing_diffusers.py:122 in process_diffusers                                                                                                                  │
│                                                                                                                                                                                                            │
│   121 │   │   else:                                                                                                                                                                                        │
│ > 122 │   │   │   output = shared.sd_model(**base_args)                                                                                                                                                    │
│   123 │   │   if isinstance(output, dict):                                                                                                                                                                 │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\venv\Lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context                                                                                                 │
│                                                                                                                                                                                                            │
│   114 │   │   with ctx_factory():                                                                                                                                                                          │
│ > 115 │   │   │   return func(*args, **kwargs)                                                                                                                                                             │
│   116                                                                                                                                                                                                      │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\venv\Lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:1006 in __call__                                                                 │
│                                                                                                                                                                                                            │
│   1005 │   │   │   │   # predict the noise residual                                                                                                                                                        │
│ > 1006 │   │   │   │   noise_pred = self.unet(                                                                                                                                                             │
│   1007 │   │   │   │   │   latent_model_input,                                                                                                                                                             │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\venv\Lib\site-packages\torch\nn\modules\module.py:1511 in _wrapped_call_impl                                                                                              │
│                                                                                                                                                                                                            │
│   1510 │   │   else:                                                                                                                                                                                       │
│ > 1511 │   │   │   return self._call_impl(*args, **kwargs)                                                                                                                                                 │
│   1512                                                                                                                                                                                                     │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\venv\Lib\site-packages\torch\nn\modules\module.py:1520 in _call_impl                                                                                                      │
│                                                                                                                                                                                                            │
│   1519 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                                                                                                                             │
│ > 1520 │   │   │   return forward_call(*args, **kwargs)                                                                                                                                                    │
│   1521                                                                                                                                                                                                     │
│                                                                                                                                                                                                            │
│                                                                                          ... 19 frames hidden ...                                                                                          │
│ in forward:5                                                                                                                                                                                               │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\modules\intel\openvino\__init__.py:61 in __call__                                                                                                                         │
│                                                                                                                                                                                                            │
│    60 │   def __call__(self, *args):                                                                                                                                                                       │
│ >  61 │   │   result = openvino_execute(self.gm, *args, executor_parameters=self.executor_parameters, partition_id=self.partition_id, file_name=self.file_name)                                            │
│    62 │   │   return result                                                                                                                                                                                │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\modules\intel\openvino\__init__.py:328 in openvino_execute                                                                                                                │
│                                                                                                                                                                                                            │
│   327 │   │   else:                                                                                                                                                                                        │
│ > 328 │   │   │   compiled = openvino_compile(gm, *args, model_hash_str=model_hash_str, file_name=file_name)                                                                                               │
│   329 │   │   shared.compiled_model_state.compiled_cache[partition_id] = compiled                                                                                                                          │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\modules\intel\openvino\__init__.py:256 in openvino_compile                                                                                                                │
│                                                                                                                                                                                                            │
│   255 │                                                                                                                                                                                                    │
│ > 256 │   compiled_model = core.compile_model(om, device)                                                                                                                                                  │
│   257 │   return compiled_model                                                                                                                                                                            │
│                                                                                                                                                                                                            │
│ C:\Users\asdas\Desktop\automatic\venv\Lib\site-packages\openvino\runtime\ie_api.py:547 in compile_model                                                                                                    │
│                                                                                                                                                                                                            │
│   546 │   │   return CompiledModel(                                                                                                                                                                        │
│ > 547 │   │   │   super().compile_model(model, device_name, {} if config is None else config),                                                                                                             │
│   548 │   │   )                                                                                                                                                                                            │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
RuntimeError: Exception from src\inference\src\core.cpp:102:
[ GENERAL_ERROR ] [CL ext] Can not allocate nullptr for USM type.

10:36:34-711471 INFO     Processed: images=0 time=67.18 its=0.00 memory={'ram': {'used': 6.02, 'total': 31.93}}
10:38:00-197861 DEBUG    Server: alive=True jobs=1 requests=418 uptime=211 memory=6.01/31.93 backend=Backend.DIFFUSERS state=idle
10:40:00-236173 DEBUG    Server: alive=True jobs=1 requests=420 uptime=331 memory=6.01/31.93 backend=Backend.DIFFUSERS state=idle
10:42:00-271589 DEBUG    Server: alive=True jobs=1 requests=429 uptime=451 memory=6.01/31.93 backend=Backend.DIFFUSERS state=idle

Backend

Diffusers

UI

Standard

Branch

Master

Model

StableDiffusion 1.5

Acknowledgements

Magenta-Flutist commented 2 months ago

A complete reinstall, plus removing all cached data from previous webui installs and the NVIDIA drivers, solved a bunch of problems, sorry. But the defect stripe is still here.

Disty0 commented 2 months ago

1024x1024 is cursed on ARC with Windows. Use 1080x1080 or use Linux. Also, there is pretty much no reason to use OpenVINO with ARC; use IPEX instead.
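A quick sanity check before switching backends, as a sketch: assuming IPEX (intel-extension-for-pytorch) is installed in the venv, PyTorch's XPU device should report the ARC card:

    import torch
    import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the XPU device

    print(torch.xpu.is_available())       # True if the ARC GPU is usable
    print(torch.xpu.get_device_name(0))   # e.g. "Intel(R) Arc(TM) A770 Graphics"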

1. the long initialization time before every generation, which makes the total generation speed absolutely lame.

That's what you get with hard compile backends like OpenVINO and TensorRT. Use IPEX (PyTorch) instead.
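To illustrate the point (a sketch, not SD.Next code): static-shape compile backends retrace and recompile whenever the input shape changes, so every new width/height pays the full compile cost again, which shows up as long per-generation initialization. The example below uses the default inductor backend for portability; the same behavior applies to openvino_fx:

    import torch

    model = torch.nn.Conv2d(4, 4, 3, padding=1)
    # dynamic=False forces a static-shape graph, as OpenVINO/TensorRT-style backends do
    compiled = torch.compile(model, backend="inductor", dynamic=False)

    compiled(torch.randn(1, 4, 64, 64))    # first call at this shape: full compile
    compiled(torch.randn(1, 4, 128, 128))  # new shape: another full compile, not a cache hit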