vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: Hangs on load Starting [AMD/ROCM] #2544

Closed philippludwig closed 11 months ago

philippludwig commented 11 months ago

Issue Description

While SD.Next works fine when running on the CPU, I cannot use it with my AMD card. It gets stuck at this point:

[Screenshot: screenshot-1700771092]

It stays like this for several minutes; within that timespan, the CPU can generate several images.

Sadly there is no useful info in the logfile. Here it is anyway: sdnext.log

I have tried --reinstall, and I have also tried deleting the folder, cloning the repository again and reinstalling. Same issue, sadly.

Version Platform Description

Gentoo, Python 3.11, AMD Radeon 8600 XT, Firefox

Relevant log output

2023-11-23 21:20:52,875 | sd | INFO | launch | Starting SD.Next
2023-11-23 21:20:52,878 | sd | INFO | installer | Python 3.11.5 on Linux
2023-11-23 21:20:52,971 | sd | INFO | installer | Version: app=sd.next updated=2023-11-23 hash=88aabcb3 url=https://github.com/vladmandic/automatic/tree/master
2023-11-23 21:20:53,387 | sd | INFO | installer | Latest published version: 36562509f0c2bceee386ccd1ea14ada51e89e2fa 2023-11-23T19:38:46Z
2023-11-23 21:20:53,394 | sd | INFO | launch | Platform: arch=x86_64 cpu=Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz system=Linux release=6.1.57-gentoo python=3.11.5
2023-11-23 21:20:53,395 | sd | DEBUG | installer | Setting environment tuning
2023-11-23 21:20:53,396 | sd | DEBUG | installer | Cache folder: /home/philipp/.cache/huggingface/hub
2023-11-23 21:20:53,396 | sd | DEBUG | installer | Torch overrides: cuda=False rocm=True ipex=False diml=False openvino=False
2023-11-23 21:20:53,397 | sd | DEBUG | installer | Torch allowed: cuda=False rocm=True ipex=False diml=False openvino=False
2023-11-23 21:20:53,399 | sd | INFO | installer | AMD ROCm toolkit detected
2023-11-23 21:20:53,425 | sd | DEBUG | installer | ROCm agents detected: ['gfx1101']
2023-11-23 21:20:53,426 | sd | DEBUG | installer | ROCm agent used by default: idx=0 gpu=gfx1101 arch=navi3x
2023-11-23 21:20:53,598 | sd | DEBUG | installer | ROCm version detected: 5.7
2023-11-23 21:20:53,866 | sd | DEBUG | installer | Repository update time: Thu Nov 23 16:45:40 2023
2023-11-23 21:20:53,870 | sd | INFO | launch | Startup: standard
2023-11-23 21:20:53,874 | sd | INFO | installer | Verifying requirements
2023-11-23 21:20:53,912 | sd | INFO | installer | Verifying packages
2023-11-23 21:20:53,913 | sd | INFO | installer | Verifying submodules
2023-11-23 21:20:54,609 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-extension-chainner / main
2023-11-23 21:20:54,634 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-extension-system-info / main
2023-11-23 21:20:54,657 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-agent-scheduler / main
2023-11-23 21:20:54,666 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-controlnet / main
2023-11-23 21:20:54,673 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-images-browser / main
2023-11-23 21:20:54,678 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-rembg / master
2023-11-23 21:20:54,682 | sd | DEBUG | installer | Submodule: modules/k-diffusion / master
2023-11-23 21:20:54,686 | sd | DEBUG | installer | Submodule: modules/lora / main
2023-11-23 21:20:54,691 | sd | DEBUG | installer | Submodule: wiki / master
2023-11-23 21:20:54,694 | sd | DEBUG | paths | Register paths
2023-11-23 21:20:54,724 | sd | DEBUG | installer | Installed packages: 213
2023-11-23 21:20:54,725 | sd | DEBUG | installer | Extensions all: ['stable-diffusion-webui-images-browser', 'sd-webui-agent-scheduler', 'sd-extension-system-info', 'sd-webui-controlnet', 'Lora', 'sd-extension-chainner', 'stable-diffusion-webui-rembg']
2023-11-23 21:20:54,726 | sd | DEBUG | installer | Running extension installer: /mnt/data/install/automatic/extensions-builtin/stable-diffusion-webui-images-browser/install.py
2023-11-23 21:20:54,878 | sd | DEBUG | installer | Running extension installer: /mnt/data/install/automatic/extensions-builtin/sd-webui-agent-scheduler/install.py
2023-11-23 21:20:55,032 | sd | DEBUG | installer | Running extension installer: /mnt/data/install/automatic/extensions-builtin/sd-extension-system-info/install.py
2023-11-23 21:20:55,171 | sd | DEBUG | installer | Running extension installer: /mnt/data/install/automatic/extensions-builtin/sd-webui-controlnet/install.py
2023-11-23 21:20:55,359 | sd | DEBUG | installer | Running extension installer: /mnt/data/install/automatic/extensions-builtin/stable-diffusion-webui-rembg/install.py
2023-11-23 21:20:55,495 | sd | DEBUG | installer | Extensions all: []
2023-11-23 21:20:55,496 | sd | INFO | installer | Extensions enabled: ['stable-diffusion-webui-images-browser', 'sd-webui-agent-scheduler', 'sd-extension-system-info', 'sd-webui-controlnet', 'Lora', 'sd-extension-chainner', 'stable-diffusion-webui-rembg']
2023-11-23 21:20:55,496 | sd | INFO | installer | Verifying requirements
2023-11-23 21:20:55,500 | sd | DEBUG | launch | Setup complete without errors: 1700770856
2023-11-23 21:20:55,532 | sd | INFO | installer | Extension preload: {'extensions-builtin': 0.01, 'extensions': 0.0}
2023-11-23 21:20:55,533 | sd | DEBUG | launch | Starting module: <module 'webui' from '/mnt/data/install/automatic/webui.py'>
2023-11-23 21:20:55,534 | sd | INFO | launch | Command line args: ['--use-rocm', '--debug'] debug=True use_rocm=True
2023-11-23 21:21:41,298 | sd | INFO | loader | Load packages: torch=2.2.0.dev20231123+rocm5.7 diffusers=0.23.1 gradio=3.43.2
2023-11-23 21:21:42,916 | sd | DEBUG | shared | Read: file="config.json" json=14 bytes=539
2023-11-23 21:21:42,920 | sd | INFO | shared | Engine: backend=Backend.ORIGINAL compute=rocm mode=no_grad device=cuda cross-optimization="Sub-quadratic"
2023-11-23 21:21:42,964 | sd | INFO | shared | Device: device=AMD Radeon Graphics n=1 hip=5.7.31921-d1770ee1b
2023-11-23 21:22:04,929 | sd | DEBUG | webui | Entering start sequence
2023-11-23 21:22:04,933 | sd | DEBUG | webui | Initializing
2023-11-23 21:22:04,936 | sd | INFO | sd_vae | Available VAEs: path="models/VAE" items=0
2023-11-23 21:22:04,938 | sd | INFO | shared | Disabling uncompatible extensions: backend=Backend.ORIGINAL []
2023-11-23 21:22:04,946 | sd | DEBUG | shared | Read: file="cache.json" json=1 bytes=184
2023-11-23 21:22:04,954 | sd | DEBUG | shared | Read: file="metadata.json" json=1 bytes=95
2023-11-23 21:22:04,956 | sd | INFO | sd_models | Available models: path="models/Stable-diffusion" items=1 time=0.02
2023-11-23 21:22:06,685 | sd | DEBUG | webui | Load extensions
2023-11-23 21:22:09,322 | sd | INFO | script_loading | Extension: script='extensions-builtin/sd-webui-agent-scheduler/scripts/task_scheduler.py' Using sqlite file: extensions-builtin/sd-webui-agent-scheduler/task_scheduler.sqlite3
2023-11-23 21:22:10,305 | sd | INFO | script_loading | Extension: script='extensions-builtin/sd-webui-controlnet/scripts/controlnet.py' Warning: ControlNet failed to load SGM - will use LDM instead.
2023-11-23 21:22:10,306 | sd | INFO | script_loading | Extension: script='extensions-builtin/sd-webui-controlnet/scripts/controlnet.py' ControlNet preprocessor location: /mnt/data/install/automatic/extensions-builtin/sd-webui-controlnet/annotator/downloads
2023-11-23 21:22:10,311 | sd | INFO | script_loading | Extension: script='extensions-builtin/sd-webui-controlnet/scripts/hook.py' Warning: ControlNet failed to load SGM - will use LDM instead.
2023-11-23 21:22:15,890 | sd | INFO | webui | Extensions time: 9.20 { Lora=0.41 sd-extension-chainner=0.24 sd-extension-system-info=0.06 sd-webui-agent-scheduler=1.90 sd-webui-controlnet=1.03 stable-diffusion-webui-images-browser=0.13 stable-diffusion-webui-rembg=5.41 }
2023-11-23 21:22:16,031 | sd | DEBUG | shared | Read: file="html/upscalers.json" json=4 bytes=2640
2023-11-23 21:22:16,045 | sd | DEBUG | shared | Read: file="extensions-builtin/sd-extension-chainner/models.json" json=24 bytes=2693
2023-11-23 21:22:16,047 | sd | DEBUG | chainner_model | chaiNNer models: path="models/chaiNNer" defined=24 discovered=0 downloaded=0
2023-11-23 21:22:16,048 | sd | DEBUG | modelloader | Load upscalers: total=50 downloaded=0 user=0 ['None', 'Lanczos', 'Nearest', 'ChaiNNer', 'RealESRGAN', 'SD', 'ESRGAN', 'SwinIR', 'SCUNet', 'LDSR']
2023-11-23 21:22:16,072 | sd | DEBUG | styles | Load styles: folder="models/styles" items=288
2023-11-23 21:22:16,075 | sd | DEBUG | webui | Creating UI
2023-11-23 21:22:16,201 | sd | INFO | theme | Load UI theme: name="black-teal" style=Auto base=sdnext.css
2023-11-23 21:22:16,263 | sd | DEBUG | ui_extra_networks | Extra networks: page='model' items=1 subfolders=1 tab=txt2img folders=['models/Stable-diffusion', 'models/Diffusers', 'models/Reference', '/mnt/data/install/automatic/models/Stable-diffusion'] list=0.00 desc=0.00 info=0.00
2023-11-23 21:22:16,272 | sd | DEBUG | ui_extra_networks | Extra networks: page='style' items=288 subfolders=2 tab=txt2img folders=['models/styles', 'html'] list=0.01 desc=0.00 info=0.00
2023-11-23 21:22:16,273 | sd | DEBUG | ui_extra_networks | Extra networks: page='embedding' items=0 subfolders=1 tab=txt2img folders=['models/embeddings'] list=0.00 desc=0.00 info=0.00
2023-11-23 21:22:16,274 | sd | DEBUG | ui_extra_networks | Extra networks: page='hypernetwork' items=0 subfolders=1 tab=txt2img folders=['models/hypernetworks'] list=0.00 desc=0.00 info=0.00
2023-11-23 21:22:16,276 | sd | DEBUG | ui_extra_networks | Extra networks: page='vae' items=0 subfolders=1 tab=txt2img folders=['models/VAE'] list=0.00 desc=0.00 info=0.00
2023-11-23 21:22:16,277 | sd | DEBUG | ui_extra_networks | Extra networks: page='lora' items=0 subfolders=1 tab=txt2img folders=['models/Lora', 'models/LyCORIS'] list=0.00 desc=0.00 info=0.00
2023-11-23 21:22:16,748 | sd | DEBUG | shared | Read: file="ui-config.json" json=0 bytes=2
2023-11-23 21:22:17,387 | sd | DEBUG | theme | Themes: builtin=6 default=5 external=55
2023-11-23 21:22:17,758 | sd | DEBUG | script_callbacks | Script: 0.25 ui_tabs /mnt/data/install/automatic/extensions-builtin/stable-diffusion-webui-images-browser/scripts/image_browser.py
2023-11-23 21:22:17,762 | sd | ERROR | extensions | Failed reading extension data from Git repository: sd-extension-chainner: [Errno 2] No such file or directory: '/mnt/data/install/automatic/.git/modules/extensions-builtin/sd-extension-chainner/description'
2023-11-23 21:22:17,764 | sd | ERROR | extensions | Failed reading extension data from Git repository: sd-extension-system-info: [Errno 2] No such file or directory: '/mnt/data/install/automatic/.git/modules/extensions-builtin/sd-extension-system-info/description'
2023-11-23 21:22:17,765 | sd | ERROR | extensions | Failed reading extension data from Git repository: sd-webui-agent-scheduler: [Errno 2] No such file or directory: '/mnt/data/install/automatic/.git/modules/extensions-builtin/sd-webui-agent-scheduler/description'
2023-11-23 21:22:17,767 | sd | ERROR | extensions | Failed reading extension data from Git repository: sd-webui-controlnet: [Errno 2] No such file or directory: '/mnt/data/install/automatic/.git/modules/extensions-builtin/sd-webui-controlnet/description'
2023-11-23 21:22:17,768 | sd | ERROR | extensions | Failed reading extension data from Git repository: stable-diffusion-webui-images-browser: [Errno 2] No such file or directory: '/mnt/data/install/automatic/.git/modules/extensions-builtin/stable-diffusion-webui-images-browser/description'
2023-11-23 21:22:17,770 | sd | ERROR | extensions | Failed reading extension data from Git repository: stable-diffusion-webui-rembg: [Errno 2] No such file or directory: '/mnt/data/install/automatic/.git/modules/extensions-builtin/stable-diffusion-webui-rembg/description'
2023-11-23 21:22:17,771 | sd | DEBUG | ui_extensions | Extension list: processed=7 installed=7 enabled=7 disabled=0 visible=7 hidden=0
2023-11-23 21:22:18,240 | sd | INFO | webui | Local URL: http://127.0.0.1:7860/
2023-11-23 21:22:18,241 | sd | DEBUG | webui | Gradio functions: registered=2079
2023-11-23 21:22:18,241 | sd | INFO | middleware | Initializing middleware
2023-11-23 21:22:18,252 | sd | DEBUG | webui | Creating API
2023-11-23 21:22:18,472 | sd | INFO | task_runner | [AgentScheduler] Task queue is empty
2023-11-23 21:22:18,473 | sd | INFO | api | [AgentScheduler] Registering APIs
2023-11-23 21:22:18,793 | sd | DEBUG | script_callbacks | Script: 0.3 app_started /mnt/data/install/automatic/extensions-builtin/sd-webui-agent-scheduler/scripts/task_scheduler.py
2023-11-23 21:22:18,847 | sd | DEBUG | webui | Scripts setup: ['X/Y/Z Grid:0.006', 'ControlNet:0.068']
2023-11-23 21:22:18,848 | sd | DEBUG | sd_models | Model metadata: file="metadata.json" no changes
2023-11-23 21:22:18,849 | sd | DEBUG | webui | Model auto load disabled
2023-11-23 21:22:18,850 | sd | DEBUG | shared | Save: file="config.json" json=14 bytes=539
2023-11-23 21:22:18,850 | sd | INFO | webui | Startup time: 83.26 { torch=42.26 gradio=3.29 diffusers=0.15 libraries=23.63 extensions=9.20 face-restore=1.73 upscalers=0.16 ui-extra-networks=0.20 ui-txt2img=0.07 ui-img2img=0.10 ui-train=0.23 ui-settings=0.78 ui-extensions=0.29 ui-defaults=0.06 launch=0.41 api=0.10 app-started=0.50 }
2023-11-23 21:22:24,711 | sd | INFO | api | MOTD: N/A
2023-11-23 21:22:33,584 | sd | DEBUG | theme | Themes: builtin=6 default=5 external=55
2023-11-23 21:22:37,889 | sd | INFO | api | Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
2023-11-23 21:22:59,639 | sd | DEBUG | txt2img | txt2img: id_task=task(hkdpgpp3musrpef)|prompt=dog|negative_prompt=|prompt_styles=[]|steps=20|sampler_index=None|latent_index=None|full_quality=True|restore_faces=False|tiling=False|n_iter=1|batch_size=1|cfg_scale=6|clip_skip=1|seed=-1.0|subseed=-1.0|subseed_strength=0|seed_resize_from_h=0|seed_resize_from_w=0||height=512|width=512|enable_hr=False|denoising_strength=0.5|hr_scale=2|hr_upscaler=None|hr_force=False|hr_second_pass_steps=20|hr_resize_x=0|hr_resize_y=0|image_cfg_scale=6|diffusers_guidance_rescale=0.7|refiner_steps=5|refiner_start=0.8|refiner_prompt=|refiner_negative=|override_settings_texts=[]
2023-11-23 21:22:59,642 | sd | WARNING | sd_models | Selected checkpoint not found: v1-5-pruned-emaonly.safetensors
2023-11-23 21:22:59,642 | sd | INFO | sd_models | Select: model="v1-5-pruned-emaonly [6ce0161689]"
2023-11-23 21:22:59,644 | sd | DEBUG | sd_models | Load model weights: existing=False target=/mnt/data/install/automatic/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors info=None
2023-11-23 21:23:05,267 | sd | DEBUG | sd_models | Load model: name=/mnt/data/install/automatic/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors dict=True
2023-11-23 21:23:08,600 | sd | DEBUG | devices | Desired Torch parameters: dtype=FP16 no-half=False no-half-vae=False upscast=False
2023-11-23 21:23:08,607 | sd | INFO | devices | Setting Torch parameters: device=cuda dtype=torch.float16 vae=torch.float16 unet=torch.float16 context=no_grad fp16=True bf16=False
2023-11-23 21:23:08,611 | sd | DEBUG | sd_models | Model dict loaded: {'ram': {'used': 1.58, 'total': 31.28}, 'gpu': {'used': 0.34, 'total': 15.98}, 'retries': 0, 'oom': 0}
2023-11-23 21:23:08,661 | sd | DEBUG | sd_models | Model config loaded: {'ram': {'used': 1.58, 'total': 31.28}, 'gpu': {'used': 0.34, 'total': 15.98}, 'retries': 0, 'oom': 0}
2023-11-23 21:23:16,090 | sd | INFO | sd_models | LDM: LatentDiffusion: Running in eps-prediction mode
2023-11-23 21:23:16,091 | sd | INFO | sd_models | LDM: DiffusionWrapper has 859.52 M params.
2023-11-23 21:23:16,092 | sd | INFO | sd_models | LDM: LatentDiffusion: Running in eps-prediction mode
2023-11-23 21:23:16,093 | sd | INFO | sd_models | LDM: DiffusionWrapper has 859.52 M params.
2023-11-23 21:23:16,094 | sd | DEBUG | sd_models | Model created from config: /mnt/data/install/automatic/configs/v1-inference.yaml
2023-11-23 21:23:16,094 | sd | INFO | sd_models | Autodetect: model="Stable Diffusion" class=StableDiffusionPipeline file="/mnt/data/install/automatic/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors" size=4068MB
2023-11-23 21:23:16,095 | sd | DEBUG | sd_models | Model weights loading: {'ram': {'used': 6.4, 'total': 31.28}, 'gpu': {'used': 0.34, 'total': 15.98}, 'retries': 0, 'oom': 0}
2023-11-23 21:23:59,945 | sd | DEBUG | launch | Server: alive=True jobs=1 requests=254 uptime=137 memory=8.31/31.28 backend=Backend.ORIGINAL state=job="txt2img" 0/-1

Backend

Original

Branch

Master

Model

SD 1.5

vladmandic commented 11 months ago

Enter an appropriate title and quote the relevant section of the log, not just a link to it.

philippludwig commented 11 months ago

There you go.

tornado73 commented 11 months ago

"AMD Radeon 8600 XT"? :-) The log says ROCm agents detected: ['gfx1101'], so this is an RX 7800 XT.

What version of PyTorch do you have? Check with pip list.
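For anyone following along, here is a minimal sketch of an alternative to scanning the full pip list output. It reads the installed version from package metadata, so it does not have to import torch (which took over 40 seconds in the log above). The function name installed_versions is made up for illustration; the package names are the ones from the "Load packages" log line.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} without importing the packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Package names taken from the "Load packages" line of the log.
    found = installed_versions(["torch", "diffusers", "gradio"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same Python interpreter (the same venv) that SD.Next uses, otherwise it reports a different environment.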

vladmandic commented 11 months ago

the first operation that executes after the last message in your log is torch.load_state_dict - if that hangs, it is deep inside torch and there is not much i can do.
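One way to see which step stalls is to look for the longest silences between consecutive log lines. The sketch below assumes only the line format visible in the log quoted above (a timestamp, then fields separated by " | "). The helper names parse_timestamp and largest_gaps are hypothetical and not part of SD.Next.

```python
from datetime import datetime


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' field of a log line."""
    stamp = line.split(" | ", 1)[0].strip()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def largest_gaps(lines, top=3):
    """Return (seconds, line_before_gap) pairs, longest silence first."""
    stamped = []
    for line in lines:
        try:
            stamped.append((parse_timestamp(line), line.rstrip()))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    gaps = [
        ((later[0] - earlier[0]).total_seconds(), earlier[1])
        for earlier, later in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


if __name__ == "__main__":
    # Two lines shortened from the log in this issue.
    sample = [
        "2023-11-23 21:23:16,095 | sd | DEBUG | sd_models | Model weights loading",
        "2023-11-23 21:23:59,945 | sd | DEBUG | launch | Server: alive=True jobs=1",
    ]
    for seconds, before in largest_gaps(sample):
        print(f"{seconds:.2f}s of silence after: {before}")
```

The line printed before the longest gap is the last thing that completed, so the stalled operation is whatever the code does next.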

i'd suggest trying a fresh reinstall at least once, to get a fresh torch install on your system.

and you may look into possibly setting environment variable HSA_OVERRIDE_GFX_VERSION to appropriate version for your gpu (you'll need to search to find some suggestions, i don't have one)
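A rough sketch of how the override could be applied to the SD.Next process only, rather than exported for the whole shell session. The value "11.0.0" is an assumption: it is often tried for gfx11xx (RDNA3) cards but is not confirmed anywhere in this thread. The helper names rocm_env and launch are made up for illustration; the command line is the one shown in the log.

```python
import os
import subprocess


def rocm_env(gfx_version, base_env=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version
    return env


def launch(gfx_version, command=("sh", "webui.sh", "--use-rocm", "--debug")):
    """Start SD.Next with the override visible to the child process only."""
    return subprocess.run(list(command), env=rocm_env(gfx_version), check=False)


if __name__ == "__main__":
    # ASSUMPTION: "11.0.0" is only a commonly suggested starting point.
    env = rocm_env("11.0.0", base_env={})
    print(env["HSA_OVERRIDE_GFX_VERSION"])
    # To actually start the server from the install folder, call:
    # launch("11.0.0")
```

The variable has to be set before torch initializes the GPU, which is why it goes into the environment of the launched process instead of being set after startup.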

other than that, i don't have much to suggest.

tornado73 commented 11 months ago

Wait... it's Gentoo. It needs its own dance with a tambourine :-) https://wiki.gentoo.org/wiki/ROCm

Pick any of the supported ones: [Screenshot from 2023-11-24 19-23-40]

philippludwig commented 11 months ago

Thank you. I will fiddle around a bit and see if I can find a solution.

philippludwig commented 11 months ago

All right, I reinstalled it again and updated the drivers, and now I only get

'sh webui.sh --debug --use-rocm…' terminated by signal SIGSEGV (Address boundary error)

So that’s the end of the debugging for me, because now I don’t even have logfiles. Anyway, thanks for the help.

vladmandic commented 11 months ago

That basically means your driver crashed, and that happens at a much lower level than sdnext can capture. Any logs would be in the Linux system logs, most likely under /var/log.
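As a starting point for digging through those system logs, the sketch below filters kernel messages for GPU-related lines. It assumes dmesg is available (it may need root on some systems). The search terms in KEYWORDS are illustrative guesses, not an exhaustive list, and the helper names are hypothetical.

```python
import subprocess

# ASSUMPTION: illustrative search terms only; extend them for your system.
KEYWORDS = ("amdgpu", "segfault", "kfd")


def gpu_related(lines, keywords=KEYWORDS):
    """Keep only the lines that mention one of the keywords (case-insensitive)."""
    return [line for line in lines if any(k in line.lower() for k in keywords)]


def kernel_log():
    """Return kernel messages from `dmesg`, or [] if it cannot be run."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except OSError:
        return []  # dmesg is missing or not executable
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in gpu_related(kernel_log()):
        print(line)
```

Running it right after the crash keeps the relevant messages near the end of the output. The same filter can be applied to the lines of any text log under /var/log.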