vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: segmentation fault error on brand new install launch #2672

Closed GoDJr closed 9 months ago

GoDJr commented 9 months ago

Issue Description

My system is currently running Ubuntu 22.04 with kernel 6.2.12-t2-jammy. I have two graphics cards, an AMD 560X and an AMD 6900 XT. This is my log.

Create and activate python venv
Launching launch.py...
23:40:11-952648 INFO Starting SD.Next
23:40:11-955027 INFO Logger: file="/home/sagar/automatic/sdnext.log" level=DEBUG size=64 mode=create
23:40:11-956045 INFO Python 3.11.5 on Linux
23:40:11-971319 INFO Version: app=sd.next updated=2023-12-30 hash=ab7b78cc url=https://github.com/vladmandic/automatic/tree/master
23:40:12-281346 INFO Platform: arch=x86_64 cpu=x86_64 system=Linux release=6.2.12-t2-jammy python=3.11.5
23:40:12-283592 DEBUG Setting environment tuning
23:40:12-285175 DEBUG Cache folder: /home/sagar/.cache/huggingface/hub
23:40:12-286685 DEBUG Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False
23:40:12-288517 DEBUG Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True
23:40:12-289832 INFO AMD ROCm toolkit detected
23:40:12-318755 DEBUG ROCm agents detected: ['gfx803', 'gfx1030']
23:40:12-319607 DEBUG ROCm agent used by default: idx=1 gpu=gfx1030 arch=navi2x
23:40:12-372204 DEBUG ROCm version detected: 5.7
23:40:12-385208 DEBUG Repository update time: Sat Dec 30 09:07:04 2023
23:40:12-387295 INFO Startup: standard
23:40:12-388690 INFO Verifying requirements
23:40:12-403959 INFO Verifying packages
23:40:12-406084 INFO Verifying submodules
23:40:12-649398 DEBUG Submodule: extensions-builtin/sd-extension-chainner / main
23:40:12-660450 DEBUG Submodule: extensions-builtin/sd-extension-system-info / main
23:40:12-667678 DEBUG Submodule: extensions-builtin/sd-webui-agent-scheduler / main
23:40:12-673635 DEBUG Submodule: extensions-builtin/sd-webui-controlnet / main
23:40:12-685607 DEBUG Submodule: extensions-builtin/stable-diffusion-webui-images-browser / main
23:40:12-690879 DEBUG Submodule: extensions-builtin/stable-diffusion-webui-rembg / master
23:40:12-695977 DEBUG Submodule: modules/k-diffusion / master
23:40:12-700796 DEBUG Submodule: modules/lora / main
23:40:12-705774 DEBUG Submodule: wiki / master
23:40:12-709348 DEBUG Register paths
23:40:12-746978 DEBUG Installed packages: 214
23:40:12-747678 DEBUG Extensions all: ['stable-diffusion-webui-rembg', 'sd-extension-system-info', 'sd-extension-chainner', 'Lora', 'stable-diffusion-webui-images-browser', 'sd-webui-agent-scheduler', 'sd-webui-controlnet']
23:40:12-748599 DEBUG Running extension installer: /home/sagar/automatic/extensions-builtin/stable-diffusion-webui-rembg/install.py
23:40:12-914853 DEBUG Running extension installer: /home/sagar/automatic/extensions-builtin/sd-extension-system-info/install.py
23:40:13-148586 DEBUG Running extension installer: /home/sagar/automatic/extensions-builtin/stable-diffusion-webui-images-browser/install.py
23:40:13-316049 DEBUG Running extension installer: /home/sagar/automatic/extensions-builtin/sd-webui-agent-scheduler/install.py
23:40:13-483715 DEBUG Running extension installer: /home/sagar/automatic/extensions-builtin/sd-webui-controlnet/install.py
23:40:13-657081 DEBUG Extensions all: []
23:40:13-657917 INFO Extensions enabled: ['stable-diffusion-webui-rembg', 'sd-extension-system-info', 'sd-extension-chainner', 'Lora', 'stable-diffusion-webui-images-browser', 'sd-webui-agent-scheduler', 'sd-webui-controlnet']
23:40:13-658814 INFO Verifying requirements
23:40:13-673432 DEBUG Setup complete without errors: 1703997614
23:40:13-675928 INFO Extension preload: {'extensions-builtin': 0.0, 'extensions': 0.0}
23:40:13-676952 DEBUG Starting module: <module 'webui' from '/home/sagar/automatic/webui.py'>
23:40:13-677886 INFO Command line args: ['--debug'] debug=True
23:40:13-678638 DEBUG Env flags: []
23:40:16-595238 INFO Load packages: torch=2.3.0.dev20231230+rocm5.7 diffusers=0.25.0 gradio=3.43.2
Segmentation fault (core dumped)

Version Platform Description

No response

Relevant log output

No response

Backend

Original

Branch

Master

Model

SD 1.5

Acknowledgements

GoDJr commented 9 months ago

This is what I see in syslog. If I'm interpreting this right, is it a problem with libhsa-runtime64.so?

Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.069525] [drm] PCIE GART of 512M enabled (table at 0x00000083FEB00000).
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.069549] [drm] PSP is resuming...
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.140231] [drm] reserve 0xa00000 from 0x83fd000000 for PSP TMR
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.280452] amdgpu 0000:13:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.280457] amdgpu 0000:13:00.0: amdgpu: SMU is resuming...
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.280461] amdgpu 0000:13:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5800 (58.88.0)
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.280464] amdgpu 0000:13:00.0: amdgpu: SMU driver if version not matched
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.280480] amdgpu 0000:13:00.0: amdgpu: dpm has been enabled
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.285233] amdgpu 0000:13:00.0: amdgpu: SMU is resumed successfully!
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.286732] [drm] DMUB hardware initialized: version=0x0202001F
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.309653] [drm] kiq ring mec 2 pipe 1 q 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.316779] [drm] VCN decode and encode initialized successfully(under DPG Mode).
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317064] [drm] JPEG decode initialized successfully.
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317087] amdgpu 0000:13:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317089] amdgpu 0000:13:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317090] amdgpu 0000:13:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317091] amdgpu 0000:13:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317092] amdgpu 0000:13:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317093] amdgpu 0000:13:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317094] amdgpu 0000:13:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317095] amdgpu 0000:13:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317096] amdgpu 0000:13:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317097] amdgpu 0000:13:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317098] amdgpu 0000:13:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317099] amdgpu 0000:13:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317100] amdgpu 0000:13:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317101] amdgpu 0000:13:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317102] amdgpu 0000:13:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317103] amdgpu 0000:13:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317105] amdgpu 0000:13:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317106] amdgpu 0000:13:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 8
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317107] amdgpu 0000:13:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 8
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317108] amdgpu 0000:13:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 8
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.317109] amdgpu 0000:13:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 8
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.324479] amdgpu 0000:13:00.0: [drm] Cannot find any crtc or sizes
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.357829] python3[5805]: segfault at 10 ip 00007f091f4d4018 sp 00007ffe965a8420 error 4 in libhsa-runtime64.so[7f091f400000+162000] likely on CPU 0 (core 8, socket 0)
Dec 31 16:44:57 sagar-MacPro kernel: [ 1210.357839] Code: 45 31 e4 e9 ca fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 c7 44 24 0c 04 00 00 00 45 31 ff 45 31 f6 e9 ad fe ff ff 0f 1f 44 00 00 <49> 8b 45 10 49 8d 4d 08 48 85 c0 0f 84 87 00 00 00 48 89 ca eb 0e

vladmandic commented 9 months ago

> This is what I see in syslog. If I'm interpreting this right, is it a problem with libhsa-runtime64.so?

yes, it crashes inside that lib.
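
For anyone reading the kernel line later, here is a rough sketch of how the segfault entry can be decoded to confirm that. It only uses the line quoted from syslog; the function name and the regular expression are made up for illustration, and the bit meanings assume the usual x86 page-fault error code layout.

```python
import re

# Kernel segfault line copied from the syslog excerpt in this thread.
LINE = (
    "python3[5805]: segfault at 10 ip 00007f091f4d4018 sp 00007ffe965a8420 "
    "error 4 in libhsa-runtime64.so[7f091f400000+162000]"
)

# Hypothetical pattern for the "segfault at ... in lib[base+size]" format.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the library, the offset inside it, and the decoded error bits."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    return {
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction relative to the mapping base.
        "offset_in_library": ip - base,
        "inside_mapping": base <= ip < base + size,
        # Assumed x86 page-fault error code bits: 1=present, 2=write, 4=user.
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["library"], hex(info["offset_in_library"]), info["inside_mapping"])
```

The instruction pointer falls inside the libhsa-runtime64.so mapping, and the faulting address 0x10 is a user-mode read of an unmapped page, which looks like a near-null pointer dereference inside the library.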

one possibility: rocm is notoriously bad at detecting the capabilities of its own amd gpus,
it primarily relies on the env variable HSA_OVERRIDE_GFX_VERSION

and if it attempts to access gpu items that are not compatible, it will not say that, it will just crash. you'll have to google to see what value to use for that env variable. and that's especially tricky on your system since you have two distinct amd gpus from different generations.
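
A minimal sketch of what setting that variable could look like, assuming the usual convention that a target such as gfx1030 maps to the version string 10.3.0 (last two characters are minor and stepping). The helper names are hypothetical, the device index is a placeholder, and using HIP_VISIBLE_DEVICES to hide the second gpu is a guess that was not tested in this thread.

```python
import os


def gfx_to_override(target):
    """Convert a target name such as 'gfx1030' to 'major.minor.step'.

    Assumption: the last two characters are minor and stepping (hex digits),
    everything between 'gfx' and them is the major version.
    """
    if not target.startswith("gfx") or len(target) < 6:
        raise ValueError(f"unexpected target name: {target!r}")
    digits = target[3:]
    major, minor, step = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(step, 16)}"


def rocm_env(override_target=None, visible_device=None, base=None):
    """Build an environment dict for launching the app in a child process."""
    env = dict(os.environ if base is None else base)
    if override_target is not None:
        env["HSA_OVERRIDE_GFX_VERSION"] = gfx_to_override(override_target)
    if visible_device is not None:
        # Placeholder index: which number maps to which gpu is system specific.
        env["HIP_VISIBLE_DEVICES"] = str(visible_device)
    return env


if __name__ == "__main__":
    env = rocm_env(override_target="gfx1030", visible_device=0, base={})
    print(env)
```

The resulting dict can be passed as `env=` to `subprocess.run` when starting the launcher, so the shell profile stays untouched while different values are tried.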

i have no way of reproducing this or helping any more than this.
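
For narrowing it down locally, a small sketch that runs a torch check in a child process and reports whether it died from a signal, so different environment settings can be tried without the crash taking the parent down. The `probe` helper and `TORCH_CHECK` string are illustrative names, not part of SD.Next, and this assumes it is run with the python from the app's venv.

```python
import signal
import subprocess
import sys

# Snippet to run in the child; torch.cuda is also what ROCm builds expose.
TORCH_CHECK = "import torch; print(torch.__version__, torch.cuda.is_available())"


def probe(code, env=None, timeout=120):
    """Run `code` in a fresh interpreter and summarize how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,  # None inherits the current environment
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        return signal.Signals(-proc.returncode).name
    return "ok" if proc.returncode == 0 else f"exit {proc.returncode}"


if __name__ == "__main__":
    print(probe(TORCH_CHECK))
```

A result of `SIGSEGV` from the bare torch check would suggest the crash sits in the torch/rocm stack rather than in the webui code.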

vladmandic commented 9 months ago

closing as platform specific issue.