vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: AMD GPU (RX 7600) on Ubuntu 20.04.5 with Python 3.10.12 not used #1973

Closed: ConfusedMerlin closed this issue 11 months ago

ConfusedMerlin commented 11 months ago

Issue Description

Tried to install vladmandic's automatic on Ubuntu yesterday to see if the ROCm backend performs better than automatic1111's openML on Windows.

It... kind of worked. After a lot of problems with the Python 3.8/3.10 versions, it finally started. I immediately queued a 512x512 test image (happy cat sitting on a computer), but was a bit disappointed when it claimed to need 4 minutes to do it. The image appeared after said time.

That is seven times what the Windows openML counterpart needed. But then the CPU fan gave away that it was the CPU doing the thinking, not the GPU. The system monitor agreed with that observation, showing all my CPU cores above 50%. This was astounding and concerning at the same time.

Astounding, because the openML automatic1111 version estimated 40+ minutes for that test image with its CPU backend and pushed every core to nearly 100%; your version kept each core around 60% with a lot of fluctuation. Concerning, because I realized that the GPU was idle the whole time. Looking at the system info page (thanks for including that!), I saw that the backend in use was called CPU.

I looked around the web a bit; somebody here posted a similar issue some time ago (https://github.com/vladmandic/automatic/issues/816), but failed to offer the required log files. That ticket did contain some instructions, like "remove venv, delete setup.log", which I followed.

Apart from a hiccup on one try, where it failed to find the CLIP package (this didn't happen the next time), that did not resolve the issue. Also, there is no setup.log, as far as I can remember.

Still, the output during startup sounds kind of promising, as it says "ROCm toolkit detected" and the like. But even with the --use-rocm switch, it falls back to CPU without any clearly visible error message.

As far as I can tell, the GPU should be ready to use; its kernel modules are compiled and loaded. But since this is the first time I have tried to get an AMD GPU running on Linux, I may be drawing the wrong conclusions. If you google "check if AMD GPU works on ubuntu", all the answers boil down to "run lspci" and the like, which I did after the drivers claimed to be installed. But if you have a dedicated "check whether it actually works" test at hand, I will run that one too.

Finally... I am sorry, but I cannot offer logs right now. The test system being new, I managed to forget my GitLab password yesterday evening until GitLab locked the IP... Now I am at work, where I cannot access the test system (but the password manager knows my password). I will add the logs to this ticket later today.

Version Platform Description

- Ubuntu 20.04.5 (tried 22.04 first, but the GPU driver installation failed... very hard; not your problem)
- Python 3.10.12 (from that unofficial repo, with matching pip, keeping 3.8 as the Ubuntu alternative)
- Radeon RX 7600, driver 23.10.3 for Ubuntu 20.04.5 HWE (see https://www.amd.com/en/support/linux-drivers)
- the Firefox that comes with Ubuntu 20.04.5 (not sure which version that is)

The vladmandic repo was cloned fresh (yesterday evening), and webui.sh seemed to have no problems fetching its dependencies.

Relevant log output

Create and activate python venv
Launching launch.py...
16:13:18-400774 INFO     Starting SD.Next                                       
16:13:18-404421 INFO     Python 3.10.12 on Linux                                
16:13:18-418094 INFO     Version: 417ef540 Tue Aug 8 12:05:30 2023 -0400        
16:13:18-424700 INFO     AMD ROCm toolkit detected                              
16:13:18-426475 INFO     Installing package: torch==2.0.1 torchvision==0.15.2   
                         --index-url https://download.pytorch.org/whl/rocm5.4.2 
16:14:26-590914 WARNING  Modified files: ['webui.sh']                           
16:14:26-596262 INFO     Verifying requirements                                 
16:14:26-598947 INFO     Installing package: addict                             
16:14:27-292499 INFO     Installing package: aenum                              
16:14:28-045902 INFO     Installing package: aiohttp                            
16:14:29-940152 INFO     Installing package: anyio                              
16:14:30-755362 INFO     Installing package: appdirs                            
16:14:31-403205 INFO     Installing package: astunparse                         
16:14:32-236579 INFO     Installing package: bitsandbytes                       
16:14:34-554593 INFO     Installing package: blendmodes                         
16:14:35-334493 INFO     Installing package: clean-fid                          
16:14:39-099783 INFO     Installing package: easydev                            
16:14:40-158669 INFO     Installing package: extcolors                          
16:14:41-108496 INFO     Installing package: facexlib                           
16:14:48-915052 INFO     Installing package: filetype                           
16:14:49-700933 INFO     Installing package: future                             
16:14:50-668659 INFO     Installing package: gdown                              
16:14:51-689222 INFO     Installing package: gfpgan                             
16:14:58-705860 INFO     Installing package: GitPython                          
16:14:59-907192 INFO     Installing package: httpcore                           
16:15:00-975424 INFO     Installing package: inflection                         
16:15:01-899175 INFO     Installing package: jsonmerge                          
16:15:03-331173 INFO     Installing package: kornia                             
16:15:04-671572 INFO     Installing package: lark                               
16:15:05-712136 INFO     Installing package: lmdb                               
16:15:06-702318 INFO     Installing package: lpips                              
16:15:07-729868 INFO     Installing package: omegaconf                          
16:15:08-959674 INFO     Installing package: open-clip-torch                    
16:15:12-999176 INFO     Installing package: opencv-contrib-python-headless     
16:15:24-136080 INFO     Installing package: piexif                             
16:15:25-296931 INFO     Installing package: psutil                             
16:15:26-620939 INFO     Installing package: pyyaml                             
16:15:27-757667 INFO     Installing package: realesrgan                         
16:15:29-131372 INFO     Installing package: resize-right                       
16:15:30-244588 INFO     Installing package: rich                               
16:15:31-414002 INFO     Installing package: safetensors                        
16:15:32-574226 INFO     Installing package: scipy                              
16:15:33-819556 INFO     Installing package: tb_nightly                         
16:15:35-448721 INFO     Installing package: toml                               
16:15:36-608949 INFO     Installing package: torchdiffeq                        
16:15:37-776375 INFO     Installing package: torchsde                           
16:15:39-103227 INFO     Installing package: voluptuous                         
16:15:40-262033 INFO     Installing package: yapf                               
16:15:41-396815 INFO     Installing package: scikit-image                       
16:15:42-603857 INFO     Installing package: basicsr                            
16:15:43-844599 INFO     Installing package: compel                             
16:15:49-766072 INFO     Installing package: fasteners                          
16:15:51-318559 INFO     Installing package: typing-extensions==4.7.1           
16:15:53-027547 INFO     Installing package: antlr4-python3-runtime==4.9.3      
16:15:54-537777 INFO     Installing package: requests==2.31.0                   
16:15:56-177455 INFO     Installing package: tqdm==4.65.0                       
16:15:57-818081 INFO     Installing package: accelerate==0.20.3                 
16:15:59-542304 INFO     Installing package: opencv-python-headless==4.7.0.72   
16:16:02-098483 INFO     Installing package: diffusers==0.19.3                  
16:16:03-675325 INFO     Installing package: einops==0.4.1                      
16:16:05-258442 INFO     Installing package: gradio==3.32.0                     
16:16:14-776778 INFO     Installing package: huggingface_hub==0.16.4            
16:16:16-533080 INFO     Installing package: numexpr==2.8.4                     
16:16:18-375387 INFO     Installing package: numpy==1.23.5                      
16:16:21-994259 INFO     Installing package: numba==0.57.0                      
16:16:25-601736 INFO     Installing package: pandas==1.5.3                      
16:16:30-781894 INFO     Installing package: protobuf==3.20.3                   
16:16:32-690203 INFO     Installing package: pytorch_lightning==1.9.4           
16:16:35-452447 INFO     Installing package: transformers==4.31.0               
16:16:37-357468 INFO     Installing package: tomesd==0.1.3                      
16:16:39-344207 INFO     Installing package: urllib3==1.26.15                   
16:16:41-289599 INFO     Installing package: Pillow==9.5.0                      
16:16:43-689820 INFO     Installing package: timm==0.6.13                       
16:16:45-948046 INFO     Installing package: pydantic==1.10.11                  
16:16:48-125905 INFO     Verifying packages                                     
16:16:48-127152 INFO     Installing package:                                    
                         git+https://github.com/openai/CLIP.git                 
16:16:52-070316 INFO     Installing package:                                    
                         git+https://github.com/patrickvonplaten/invisible-water
                         mark.git@remove_onnxruntime_depedency                  
16:16:58-555656 INFO     Installing package: onnxruntime==1.15.1                
16:17:01-026876 INFO     Installing package: pi-heif                            
16:17:03-407166 INFO     Installing package: tensorflow-rocm                    
16:17:22-051364 INFO     Verifying repositories                                 
16:17:23-533092 INFO     Verifying submodules                                   
16:17:29-217792 INFO     Extension installed packages:                          
                         stable-diffusion-webui-rembg ['rembg==2.0.38',         
                         'pooch==1.7.0', 'PyMatting==1.1.8']                    
16:17:31-405299 INFO     Extension installed packages:                          
                         stable-diffusion-webui-images-browser                  
                         ['Send2Trash==1.8.2']                                  
16:17:42-802950 INFO     Extension installed packages: sd-webui-controlnet      
                         ['lxml==4.9.3', 'opencv-contrib-python==4.8.0.76',     
                         'reportlab==4.0.4', 'pycparser==2.21',                 
                         'portalocker==2.7.0', 'cffi==1.15.1', 'svglib==1.5.1', 
                         'tinycss2==1.2.1', 'mediapipe==0.10.3',                
                         'tabulate==0.9.0', 'cssselect2==0.7.0',                
                         'webencodings==0.5.1', 'sounddevice==0.4.6',           
                         'iopath==0.1.9', 'yacs==0.1.8',                        
                         'fvcore==0.1.5.post20221221']                          
16:17:46-451774 INFO     Extension installed packages: sd-webui-agent-scheduler 
                         ['SQLAlchemy==2.0.19', 'greenlet==2.0.2']              
16:17:49-072610 INFO     Extension installed packages: clip-interrogator-ext    
                         ['clip-interrogator==0.6.0']                           
16:17:49-149886 INFO     Extensions enabled:                                    
                         ['multidiffusion-upscaler-for-automatic1111',          
                         'stable-diffusion-webui-rembg', 'LDSR', 'Lora',        
                         'stable-diffusion-webui-images-browser',               
                         'sd-webui-controlnet', 'ScuNET',                       
                         'sd-webui-agent-scheduler', 'sd-extension-system-info',
                         'sd-dynamic-thresholding', 'clip-interrogator-ext',    
                         'SwinIR', 'a1111-sd-webui-lycoris']                    
16:17:49-151285 INFO     Verifying packages                                     
16:17:49-152917 INFO     Installing package: tensorflow-rocm                    
16:17:51-272046 INFO     Extension preload: 0.0s                                
                         /opt/ai/automatic/extensions-builtin                   
16:17:51-273821 INFO     Extension preload: 0.0s /opt/ai/automatic/extensions   
16:17:51-290690 INFO     Server arguments: []                                   
No module 'xformers'. Proceeding without it.
16:17:56-493034 INFO     Pipeline: Backend.ORIGINAL                             
16:17:56-806918 INFO     Libraries loaded                                       
16:17:56-808190 INFO     Using data path: /opt/ai/automatic                     
16:17:56-809781 INFO     Available VAEs: /opt/ai/automatic/models/VAE 0         
16:17:56-812199 INFO     Available models:                                      
                         /opt/ai/automatic/models/Stable-diffusion 1            
16:17:58-541250 INFO     ControlNet v1.1.234                                    
ControlNet v1.1.234
ControlNet preprocessor location: /opt/ai/automatic/extensions-builtin/sd-webui-controlnet/annotator/downloads
16:17:58-736080 INFO     ControlNet v1.1.234                                    
ControlNet v1.1.234
16:18:15-376578 INFO     Loading UI theme: name=black-orange style=Auto         
Running on local URL:  http://127.0.0.1:7860
16:18:16-446889 INFO     Local URL: http://127.0.0.1:7860/                      
16:18:16-448551 INFO     Initializing middleware                                
16:18:17-020224 INFO     [AgentScheduler] Task queue is empty                   
16:18:17-021243 INFO     [AgentScheduler] Registering APIs                      
Loading weights: /opt/ai/automatic/models/Stable-diffusion/v1-5-pruned-emaonly… 
16:18:18-814744 INFO     Torch override dtype: no-half set                      
16:18:18-815960 INFO     Torch override VAE dtype: no-half set                  
16:18:18-816998 INFO     Setting Torch parameters: dtype=torch.float32          
                         vae=torch.float32 unet=torch.float32                   
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
16:18:23-435481 INFO     Applying Doggettx cross attention optimization         
16:18:23-441643 INFO     Embeddings: loaded=0 skipped=0                         
16:18:23-447903 INFO     Model loaded in 5.9s (load=1.3s create=0.5s apply=4.1s)
16:18:23-846156 INFO     Model load finished: {'ram': {'used': 9.36, 'total':   
                         31.24}} cached=0                                       
16:18:24-294232 INFO     Startup time: 33.0s (torch=3.9s gradio=0.5s            
                         libraries=1.1s scripts=18.4s onchange=0.1s             
                         ui-txt2img=0.1s ui-img2img=0.1s ui-settings=0.1s       
                         ui-extensions=0.4s ui-defaults=0.1s launch=0.2s        
                         app-started=0.7s checkpoint=7.1s) 

The changed webui.sh now contains this line (instead of plain python3, which points to Python 3.8, which was declared unsupported somewhere during my first installation attempts):

python_cmd="python3.10"

Contents of sdnext.log:

2023-08-09 16:13:18,400 | sd | INFO | launch | Starting SD.Next
2023-08-09 16:13:18,404 | sd | INFO | installer | Python 3.10.12 on Linux
2023-08-09 16:13:18,418 | sd | INFO | installer | Version: 417ef540 Tue Aug 8 12:05:30 2023 -0400
2023-08-09 16:13:18,422 | sd | DEBUG | installer | Setting environment tuning
2023-08-09 16:13:18,423 | sd | DEBUG | installer | Torch overrides: cuda=False rocm=False ipex=False diml=False
2023-08-09 16:13:18,423 | sd | DEBUG | installer | Torch allowed: cuda=True rocm=True ipex=True diml=True
2023-08-09 16:13:18,424 | sd | INFO | installer | AMD ROCm toolkit detected
2023-08-09 16:13:18,426 | sd | DEBUG | installer | Package version not found: torch
2023-08-09 16:13:18,426 | sd | DEBUG | installer | Package version not found: torchvision
2023-08-09 16:13:18,426 | sd | INFO | installer | Installing package: torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/rocm5.4.2
2023-08-09 16:13:18,427 | sd | DEBUG | installer | Running pip: install --upgrade torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/rocm5.4.2
2023-08-09 16:14:26,590 | sd | WARNING | installer | Modified files: ['webui.sh']
2023-08-09 16:14:26,596 | sd | DEBUG | installer | Repository update time: Tue Aug  8 18:05:30 2023
2023-08-09 16:14:26,596 | sd | INFO | installer | Verifying requirements
2023-08-09 16:14:26,598 | sd | DEBUG | installer | Package version not found: addict
2023-08-09 16:14:26,598 | sd | INFO | installer | Installing package: addict
2023-08-09 16:14:26,600 | sd | DEBUG | installer | Running pip: install --upgrade addict
2023-08-09 16:14:27,292 | sd | DEBUG | installer | Package version not found: aenum
2023-08-09 16:14:27,292 | sd | INFO | installer | Installing package: aenum
2023-08-09 16:14:27,294 | sd | DEBUG | installer | Running pip: install --upgrade aenum
2023-08-09 16:14:28,045 | sd | DEBUG | installer | Package version not found: aiohttp
2023-08-09 16:14:28,045 | sd | INFO | installer | Installing package: aiohttp
2023-08-09 16:14:28,047 | sd | DEBUG | installer | Running pip: install --upgrade aiohttp
2023-08-09 16:14:29,939 | sd | DEBUG | installer | Package version not found: anyio
2023-08-09 16:14:29,940 | sd | INFO | installer | Installing package: anyio
2023-08-09 16:14:29,941 | sd | DEBUG | installer | Running pip: install --upgrade anyio
2023-08-09 16:14:30,755 | sd | DEBUG | installer | Package version not found: appdirs
2023-08-09 16:14:30,755 | sd | INFO | installer | Installing package: appdirs
2023-08-09 16:14:30,757 | sd | DEBUG | installer | Running pip: install --upgrade appdirs
2023-08-09 16:14:31,402 | sd | DEBUG | installer | Package version not found: astunparse
2023-08-09 16:14:31,403 | sd | INFO | installer | Installing package: astunparse
2023-08-09 16:14:31,404 | sd | DEBUG | installer | Running pip: install --upgrade astunparse
2023-08-09 16:14:32,236 | sd | DEBUG | installer | Package version not found: bitsandbytes
2023-08-09 16:14:32,236 | sd | INFO | installer | Installing package: bitsandbytes
2023-08-09 16:14:32,238 | sd | DEBUG | installer | Running pip: install --upgrade bitsandbytes
2023-08-09 16:14:34,554 | sd | DEBUG | installer | Package version not found: blendmodes
2023-08-09 16:14:34,554 | sd | INFO | installer | Installing package: blendmodes
2023-08-09 16:14:34,556 | sd | DEBUG | installer | Running pip: install --upgrade blendmodes
2023-08-09 16:14:35,334 | sd | DEBUG | installer | Package version not found: clean-fid
2023-08-09 16:14:35,334 | sd | INFO | installer | Installing package: clean-fid
2023-08-09 16:14:35,336 | sd | DEBUG | installer | Running pip: install --upgrade clean-fid
2023-08-09 16:14:39,099 | sd | DEBUG | installer | Package version not found: easydev
2023-08-09 16:14:39,099 | sd | INFO | installer | Installing package: easydev
2023-08-09 16:14:39,101 | sd | DEBUG | installer | Running pip: install --upgrade easydev
2023-08-09 16:14:40,158 | sd | DEBUG | installer | Package version not found: extcolors
2023-08-09 16:14:40,158 | sd | INFO | installer | Installing package: extcolors
2023-08-09 16:14:40,160 | sd | DEBUG | installer | Running pip: install --upgrade extcolors
2023-08-09 16:14:41,108 | sd | DEBUG | installer | Package version not found: facexlib
2023-08-09 16:14:41,108 | sd | INFO | installer | Installing package: facexlib
2023-08-09 16:14:41,110 | sd | DEBUG | installer | Running pip: install --upgrade facexlib
2023-08-09 16:14:48,914 | sd | DEBUG | installer | Package version not found: filetype
2023-08-09 16:14:48,915 | sd | INFO | installer | Installing package: filetype
2023-08-09 16:14:48,916 | sd | DEBUG | installer | Running pip: install --upgrade filetype
2023-08-09 16:14:49,700 | sd | DEBUG | installer | Package version not found: future
2023-08-09 16:14:49,700 | sd | INFO | installer | Installing package: future
2023-08-09 16:14:49,702 | sd | DEBUG | installer | Running pip: install --upgrade future
2023-08-09 16:14:50,668 | sd | DEBUG | installer | Package version not found: gdown
2023-08-09 16:14:50,668 | sd | INFO | installer | Installing package: gdown
2023-08-09 16:14:50,670 | sd | DEBUG | installer | Running pip: install --upgrade gdown
2023-08-09 16:14:51,689 | sd | DEBUG | installer | Package version not found: gfpgan
2023-08-09 16:14:51,689 | sd | INFO | installer | Installing package: gfpgan
2023-08-09 16:14:51,690 | sd | DEBUG | installer | Running pip: install --upgrade gfpgan
2023-08-09 16:14:58,705 | sd | DEBUG | installer | Package version not found: GitPython
2023-08-09 16:14:58,705 | sd | INFO | installer | Installing package: GitPython
2023-08-09 16:14:58,707 | sd | DEBUG | installer | Running pip: install --upgrade GitPython
2023-08-09 16:14:59,906 | sd | DEBUG | installer | Package version not found: httpcore
2023-08-09 16:14:59,907 | sd | INFO | installer | Installing package: httpcore
2023-08-09 16:14:59,908 | sd | DEBUG | installer | Running pip: install --upgrade httpcore
2023-08-09 16:15:00,975 | sd | DEBUG | installer | Package version not found: inflection
2023-08-09 16:15:00,975 | sd | INFO | installer | Installing package: inflection
2023-08-09 16:15:00,977 | sd | DEBUG | installer | Running pip: install --upgrade inflection
2023-08-09 16:15:01,898 | sd | DEBUG | installer | Package version not found: jsonmerge
2023-08-09 16:15:01,899 | sd | INFO | installer | Installing package: jsonmerge
2023-08-09 16:15:01,900 | sd | DEBUG | installer | Running pip: install --upgrade jsonmerge
2023-08-09 16:15:03,330 | sd | DEBUG | installer | Package version not found: kornia
2023-08-09 16:15:03,331 | sd | INFO | installer | Installing package: kornia
2023-08-09 16:15:03,332 | sd | DEBUG | installer | Running pip: install --upgrade kornia
2023-08-09 16:15:04,671 | sd | DEBUG | installer | Package version not found: lark
2023-08-09 16:15:04,671 | sd | INFO | installer | Installing package: lark
2023-08-09 16:15:04,673 | sd | DEBUG | installer | Running pip: install --upgrade lark
2023-08-09 16:15:05,711 | sd | DEBUG | installer | Package version not found: lmdb
2023-08-09 16:15:05,712 | sd | INFO | installer | Installing package: lmdb
2023-08-09 16:15:05,713 | sd | DEBUG | installer | Running pip: install --upgrade lmdb
2023-08-09 16:15:06,702 | sd | DEBUG | installer | Package version not found: lpips
2023-08-09 16:15:06,702 | sd | INFO | installer | Installing package: lpips
2023-08-09 16:15:06,703 | sd | DEBUG | installer | Running pip: install --upgrade lpips
2023-08-09 16:15:07,729 | sd | DEBUG | installer | Package version not found: omegaconf
2023-08-09 16:15:07,729 | sd | INFO | installer | Installing package: omegaconf
2023-08-09 16:15:07,731 | sd | DEBUG | installer | Running pip: install --upgrade omegaconf
2023-08-09 16:15:08,959 | sd | DEBUG | installer | Package version not found: open-clip-torch
2023-08-09 16:15:08,959 | sd | INFO | installer | Installing package: open-clip-torch
2023-08-09 16:15:08,961 | sd | DEBUG | installer | Running pip: install --upgrade open-clip-torch
2023-08-09 16:15:12,998 | sd | DEBUG | installer | Package version not found: opencv-contrib-python-headless
2023-08-09 16:15:12,999 | sd | INFO | installer | Installing package: opencv-contrib-python-headless
2023-08-09 16:15:13,000 | sd | DEBUG | installer | Running pip: install --upgrade opencv-contrib-python-headless
2023-08-09 16:15:24,135 | sd | DEBUG | installer | Package version not found: piexif
2023-08-09 16:15:24,136 | sd | INFO | installer | Installing package: piexif
2023-08-09 16:15:24,137 | sd | DEBUG | installer | Running pip: install --upgrade piexif
2023-08-09 16:15:25,296 | sd | DEBUG | installer | Package version not found: psutil
2023-08-09 16:15:25,296 | sd | INFO | installer | Installing package: psutil
2023-08-09 16:15:25,298 | sd | DEBUG | installer | Running pip: install --upgrade psutil
2023-08-09 16:15:26,620 | sd | DEBUG | installer | Package version not found: pyyaml
2023-08-09 16:15:26,620 | sd | INFO | installer | Installing package: pyyaml
2023-08-09 16:15:26,622 | sd | DEBUG | installer | Running pip: install --upgrade pyyaml
2023-08-09 16:15:27,757 | sd | DEBUG | installer | Package version not found: realesrgan
2023-08-09 16:15:27,757 | sd | INFO | installer | Installing package: realesrgan
2023-08-09 16:15:27,759 | sd | DEBUG | installer | Running pip: install --upgrade realesrgan
2023-08-09 16:15:29,131 | sd | DEBUG | installer | Package version not found: resize-right
2023-08-09 16:15:29,131 | sd | INFO | installer | Installing package: resize-right
2023-08-09 16:15:29,132 | sd | DEBUG | installer | Running pip: install --upgrade resize-right
2023-08-09 16:15:30,244 | sd | DEBUG | installer | Package version not found: rich
2023-08-09 16:15:30,244 | sd | INFO | installer | Installing package: rich
2023-08-09 16:15:30,246 | sd | DEBUG | installer | Running pip: install --upgrade rich
2023-08-09 16:15:31,413 | sd | DEBUG | installer | Package version not found: safetensors
2023-08-09 16:15:31,414 | sd | INFO | installer | Installing package: safetensors
2023-08-09 16:15:31,415 | sd | DEBUG | installer | Running pip: install --upgrade safetensors
2023-08-09 16:15:32,573 | sd | DEBUG | installer | Package version not found: scipy
2023-08-09 16:15:32,574 | sd | INFO | installer | Installing package: scipy
2023-08-09 16:15:32,575 | sd | DEBUG | installer | Running pip: install --upgrade scipy
2023-08-09 16:15:33,819 | sd | DEBUG | installer | Package version not found: tb_nightly
2023-08-09 16:15:33,819 | sd | INFO | installer | Installing package: tb_nightly
2023-08-09 16:15:33,821 | sd | DEBUG | installer | Running pip: install --upgrade tb_nightly
2023-08-09 16:15:35,448 | sd | DEBUG | installer | Package version not found: toml
2023-08-09 16:15:35,448 | sd | INFO | installer | Installing package: toml
2023-08-09 16:15:35,450 | sd | DEBUG | installer | Running pip: install --upgrade toml
2023-08-09 16:15:36,608 | sd | DEBUG | installer | Package version not found: torchdiffeq
2023-08-09 16:15:36,608 | sd | INFO | installer | Installing package: torchdiffeq
2023-08-09 16:15:36,609 | sd | DEBUG | installer | Running pip: install --upgrade torchdiffeq
2023-08-09 16:15:37,776 | sd | DEBUG | installer | Package version not found: torchsde
2023-08-09 16:15:37,776 | sd | INFO | installer | Installing package: torchsde
2023-08-09 16:15:37,777 | sd | DEBUG | installer | Running pip: install --upgrade torchsde
2023-08-09 16:15:39,103 | sd | DEBUG | installer | Package version not found: voluptuous
2023-08-09 16:15:39,103 | sd | INFO | installer | Installing package: voluptuous
2023-08-09 16:15:39,104 | sd | DEBUG | installer | Running pip: install --upgrade voluptuous
2023-08-09 16:15:40,261 | sd | DEBUG | installer | Package version not found: yapf
2023-08-09 16:15:40,262 | sd | INFO | installer | Installing package: yapf
2023-08-09 16:15:40,263 | sd | DEBUG | installer | Running pip: install --upgrade yapf
2023-08-09 16:15:41,396 | sd | DEBUG | installer | Package version not found: scikit-image
2023-08-09 16:15:41,396 | sd | INFO | installer | Installing package: scikit-image
2023-08-09 16:15:41,398 | sd | DEBUG | installer | Running pip: install --upgrade scikit-image
2023-08-09 16:15:42,603 | sd | DEBUG | installer | Package version not found: basicsr
2023-08-09 16:15:42,603 | sd | INFO | installer | Installing package: basicsr
2023-08-09 16:15:42,605 | sd | DEBUG | installer | Running pip: install --upgrade basicsr
2023-08-09 16:15:43,844 | sd | DEBUG | installer | Package version not found: compel
2023-08-09 16:15:43,844 | sd | INFO | installer | Installing package: compel
2023-08-09 16:15:43,846 | sd | DEBUG | installer | Running pip: install --upgrade compel
2023-08-09 16:15:49,765 | sd | DEBUG | installer | Package version not found: fasteners
2023-08-09 16:15:49,766 | sd | INFO | installer | Installing package: fasteners
2023-08-09 16:15:49,767 | sd | DEBUG | installer | Running pip: install --upgrade fasteners
2023-08-09 16:15:51,318 | sd | DEBUG | installer | Package version not found: typing-extensions
2023-08-09 16:15:51,318 | sd | INFO | installer | Installing package: typing-extensions==4.7.1
2023-08-09 16:15:51,320 | sd | DEBUG | installer | Running pip: install --upgrade typing-extensions==4.7.1
2023-08-09 16:15:53,027 | sd | DEBUG | installer | Package version not found: antlr4-python3-runtime
2023-08-09 16:15:53,027 | sd | INFO | installer | Installing package: antlr4-python3-runtime==4.9.3
2023-08-09 16:15:53,029 | sd | DEBUG | installer | Running pip: install --upgrade antlr4-python3-runtime==4.9.3
2023-08-09 16:15:54,537 | sd | DEBUG | installer | Package version not found: requests
2023-08-09 16:15:54,537 | sd | INFO | installer | Installing package: requests==2.31.0
2023-08-09 16:15:54,539 | sd | DEBUG | installer | Running pip: install --upgrade requests==2.31.0
2023-08-09 16:15:56,177 | sd | DEBUG | installer | Package version not found: tqdm
2023-08-09 16:15:56,177 | sd | INFO | installer | Installing package: tqdm==4.65.0
2023-08-09 16:15:56,179 | sd | DEBUG | installer | Running pip: install --upgrade tqdm==4.65.0
2023-08-09 16:15:57,817 | sd | DEBUG | installer | Package version not found: accelerate
2023-08-09 16:15:57,818 | sd | INFO | installer | Installing package: accelerate==0.20.3
2023-08-09 16:15:57,819 | sd | DEBUG | installer | Running pip: install --upgrade accelerate==0.20.3
2023-08-09 16:15:59,542 | sd | DEBUG | installer | Package version not found: opencv-python-headless
2023-08-09 16:15:59,542 | sd | INFO | installer | Installing package: opencv-python-headless==4.7.0.72
2023-08-09 16:15:59,544 | sd | DEBUG | installer | Running pip: install --upgrade opencv-python-headless==4.7.0.72
2023-08-09 16:16:02,098 | sd | DEBUG | installer | Package version not found: diffusers
2023-08-09 16:16:02,098 | sd | INFO | installer | Installing package: diffusers==0.19.3
2023-08-09 16:16:02,101 | sd | DEBUG | installer | Running pip: install --upgrade diffusers==0.19.3
2023-08-09 16:16:03,675 | sd | DEBUG | installer | Package version not found: einops
2023-08-09 16:16:03,675 | sd | INFO | installer | Installing package: einops==0.4.1
2023-08-09 16:16:03,677 | sd | DEBUG | installer | Running pip: install --upgrade einops==0.4.1
2023-08-09 16:16:05,258 | sd | DEBUG | installer | Package version not found: gradio
2023-08-09 16:16:05,258 | sd | INFO | installer | Installing package: gradio==3.32.0
2023-08-09 16:16:05,260 | sd | DEBUG | installer | Running pip: install --upgrade gradio==3.32.0
2023-08-09 16:16:14,776 | sd | DEBUG | installer | Package version not found: huggingface_hub
2023-08-09 16:16:14,776 | sd | INFO | installer | Installing package: huggingface_hub==0.16.4
2023-08-09 16:16:14,778 | sd | DEBUG | installer | Running pip: install --upgrade huggingface_hub==0.16.4
2023-08-09 16:16:16,532 | sd | DEBUG | installer | Package version not found: numexpr
2023-08-09 16:16:16,533 | sd | INFO | installer | Installing package: numexpr==2.8.4
2023-08-09 16:16:16,534 | sd | DEBUG | installer | Running pip: install --upgrade numexpr==2.8.4
2023-08-09 16:16:18,375 | sd | DEBUG | installer | Package version not found: numpy
2023-08-09 16:16:18,375 | sd | INFO | installer | Installing package: numpy==1.23.5
2023-08-09 16:16:18,376 | sd | DEBUG | installer | Running pip: install --upgrade numpy==1.23.5
2023-08-09 16:16:21,993 | sd | DEBUG | installer | Package version not found: numba
2023-08-09 16:16:21,994 | sd | INFO | installer | Installing package: numba==0.57.0
2023-08-09 16:16:21,995 | sd | DEBUG | installer | Running pip: install --upgrade numba==0.57.0
2023-08-09 16:16:25,601 | sd | DEBUG | installer | Package version not found: pandas
2023-08-09 16:16:25,601 | sd | INFO | installer | Installing package: pandas==1.5.3
2023-08-09 16:16:25,603 | sd | DEBUG | installer | Running pip: install --upgrade pandas==1.5.3
2023-08-09 16:16:30,781 | sd | DEBUG | installer | Package version not found: protobuf
2023-08-09 16:16:30,781 | sd | INFO | installer | Installing package: protobuf==3.20.3
2023-08-09 16:16:30,783 | sd | DEBUG | installer | Running pip: install --upgrade protobuf==3.20.3
2023-08-09 16:16:32,689 | sd | DEBUG | installer | Package version not found: pytorch_lightning
2023-08-09 16:16:32,690 | sd | INFO | installer | Installing package: pytorch_lightning==1.9.4
2023-08-09 16:16:32,692 | sd | DEBUG | installer | Running pip: install --upgrade pytorch_lightning==1.9.4
2023-08-09 16:16:35,452 | sd | DEBUG | installer | Package version not found: transformers
2023-08-09 16:16:35,452 | sd | INFO | installer | Installing package: transformers==4.31.0
2023-08-09 16:16:35,454 | sd | DEBUG | installer | Running pip: install --upgrade transformers==4.31.0
2023-08-09 16:16:37,357 | sd | DEBUG | installer | Package version not found: tomesd
2023-08-09 16:16:37,357 | sd | INFO | installer | Installing package: tomesd==0.1.3
2023-08-09 16:16:37,359 | sd | DEBUG | installer | Running pip: install --upgrade tomesd==0.1.3
2023-08-09 16:16:39,344 | sd | DEBUG | installer | Package version not found: urllib3
2023-08-09 16:16:39,344 | sd | INFO | installer | Installing package: urllib3==1.26.15
2023-08-09 16:16:39,345 | sd | DEBUG | installer | Running pip: install --upgrade urllib3==1.26.15
2023-08-09 16:16:41,289 | sd | DEBUG | installer | Package version not found: Pillow
2023-08-09 16:16:41,289 | sd | INFO | installer | Installing package: Pillow==9.5.0
2023-08-09 16:16:41,291 | sd | DEBUG | installer | Running pip: install --upgrade Pillow==9.5.0
2023-08-09 16:16:43,689 | sd | DEBUG | installer | Package version not found: timm
2023-08-09 16:16:43,689 | sd | INFO | installer | Installing package: timm==0.6.13
2023-08-09 16:16:43,690 | sd | DEBUG | installer | Running pip: install --upgrade timm==0.6.13
2023-08-09 16:16:45,947 | sd | DEBUG | installer | Package version not found: pydantic
2023-08-09 16:16:45,948 | sd | INFO | installer | Installing package: pydantic==1.10.11
2023-08-09 16:16:45,949 | sd | DEBUG | installer | Running pip: install --upgrade pydantic==1.10.11
2023-08-09 16:16:48,125 | sd | INFO | installer | Verifying packages
2023-08-09 16:16:48,127 | sd | DEBUG | installer | Package version not found: clip
2023-08-09 16:16:48,127 | sd | INFO | installer | Installing package: git+https://github.com/openai/CLIP.git
2023-08-09 16:16:48,127 | sd | DEBUG | installer | Running pip: install --upgrade git+https://github.com/openai/CLIP.git
2023-08-09 16:16:52,070 | sd | DEBUG | installer | Package version not found: invisible-watermark
2023-08-09 16:16:52,070 | sd | INFO | installer | Installing package: git+https://github.com/patrickvonplaten/invisible-watermark.git@remove_onnxruntime_depedency
2023-08-09 16:16:52,072 | sd | DEBUG | installer | Running pip: install --upgrade git+https://github.com/patrickvonplaten/invisible-watermark.git@remove_onnxruntime_depedency
2023-08-09 16:16:58,555 | sd | DEBUG | installer | Package version not found: onnxruntime
2023-08-09 16:16:58,555 | sd | INFO | installer | Installing package: onnxruntime==1.15.1
2023-08-09 16:16:58,557 | sd | DEBUG | installer | Running pip: install --upgrade onnxruntime==1.15.1
2023-08-09 16:17:01,026 | sd | DEBUG | installer | Package version not found: pi_heif
2023-08-09 16:17:01,026 | sd | INFO | installer | Installing package: pi-heif
2023-08-09 16:17:01,028 | sd | DEBUG | installer | Running pip: install --upgrade pi-heif
2023-08-09 16:17:03,406 | sd | DEBUG | installer | Package version not found: tensorflow
2023-08-09 16:17:03,407 | sd | INFO | installer | Installing package: tensorflow-rocm
2023-08-09 16:17:03,408 | sd | DEBUG | installer | Running pip: install --upgrade tensorflow-rocm
2023-08-09 16:17:22,051 | sd | INFO | installer | Verifying repositories
2023-08-09 16:17:22,061 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/stable-diffusion-stability-ai / main
2023-08-09 16:17:22,340 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/taming-transformers / master
2023-08-09 16:17:22,659 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/k-diffusion / master
2023-08-09 16:17:23,201 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/BLIP / main
2023-08-09 16:17:23,533 | sd | INFO | installer | Verifying submodules
2023-08-09 16:17:24,063 | sd | DEBUG | installer | Submodule: extensions-builtin/a1111-sd-webui-lycoris / main
2023-08-09 16:17:24,071 | sd | DEBUG | installer | Submodule: extensions-builtin/clip-interrogator-ext / main
2023-08-09 16:17:24,078 | sd | DEBUG | installer | Submodule: extensions-builtin/multidiffusion-upscaler-for-automatic1111 / main
2023-08-09 16:17:24,085 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-dynamic-thresholding / master
2023-08-09 16:17:24,092 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-extension-system-info / main
2023-08-09 16:17:24,099 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-agent-scheduler / main
2023-08-09 16:17:24,117 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-controlnet / main
2023-08-09 16:17:24,129 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-images-browser / main
2023-08-09 16:17:24,135 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-rembg / master
2023-08-09 16:17:24,143 | sd | DEBUG | installer | Submodule: modules/lora / main
2023-08-09 16:17:24,151 | sd | DEBUG | installer | Submodule: modules/lycoris / main
2023-08-09 16:17:24,160 | sd | DEBUG | installer | Submodule: wiki / master
2023-08-09 16:17:24,209 | sd | DEBUG | installer | Installed packages: 197
2023-08-09 16:17:24,209 | sd | DEBUG | installer | Extensions all: ['multidiffusion-upscaler-for-automatic1111', 'stable-diffusion-webui-rembg', 'LDSR', 'Lora', 'stable-diffusion-webui-images-browser', 'sd-webui-controlnet', 'ScuNET', 'sd-webui-agent-scheduler', 'sd-extension-system-info', 'sd-dynamic-thresholding', 'clip-interrogator-ext', 'SwinIR', 'a1111-sd-webui-lycoris']
2023-08-09 16:17:24,247 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-rembg/install.py
2023-08-09 16:17:29,217 | sd | INFO | installer | Extension installed packages: stable-diffusion-webui-rembg ['rembg==2.0.38', 'pooch==1.7.0', 'PyMatting==1.1.8']
2023-08-09 16:17:29,297 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-images-browser/install.py
2023-08-09 16:17:31,405 | sd | INFO | installer | Extension installed packages: stable-diffusion-webui-images-browser ['Send2Trash==1.8.2']
2023-08-09 16:17:31,406 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-webui-controlnet/install.py
2023-08-09 16:17:42,802 | sd | INFO | installer | Extension installed packages: sd-webui-controlnet ['lxml==4.9.3', 'opencv-contrib-python==4.8.0.76', 'reportlab==4.0.4', 'pycparser==2.21', 'portalocker==2.7.0', 'cffi==1.15.1', 'svglib==1.5.1', 'tinycss2==1.2.1', 'mediapipe==0.10.3', 'tabulate==0.9.0', 'cssselect2==0.7.0', 'webencodings==0.5.1', 'sounddevice==0.4.6', 'iopath==0.1.9', 'yacs==0.1.8', 'fvcore==0.1.5.post20221221']
2023-08-09 16:17:42,842 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-webui-agent-scheduler/install.py
2023-08-09 16:17:46,451 | sd | INFO | installer | Extension installed packages: sd-webui-agent-scheduler ['SQLAlchemy==2.0.19', 'greenlet==2.0.2']
2023-08-09 16:17:46,452 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-extension-system-info/install.py
2023-08-09 16:17:46,678 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/clip-interrogator-ext/install.py
2023-08-09 16:17:49,072 | sd | INFO | installer | Extension installed packages: clip-interrogator-ext ['clip-interrogator==0.6.0']
2023-08-09 16:17:49,149 | sd | DEBUG | installer | Extensions all: []
2023-08-09 16:17:49,149 | sd | INFO | installer | Extensions enabled: ['multidiffusion-upscaler-for-automatic1111', 'stable-diffusion-webui-rembg', 'LDSR', 'Lora', 'stable-diffusion-webui-images-browser', 'sd-webui-controlnet', 'ScuNET', 'sd-webui-agent-scheduler', 'sd-extension-system-info', 'sd-dynamic-thresholding', 'clip-interrogator-ext', 'SwinIR', 'a1111-sd-webui-lycoris']
2023-08-09 16:17:49,151 | sd | INFO | installer | Verifying packages
2023-08-09 16:17:49,152 | sd | DEBUG | installer | Package version not found: tensorflow
2023-08-09 16:17:49,152 | sd | INFO | installer | Installing package: tensorflow-rocm
2023-08-09 16:17:49,153 | sd | DEBUG | installer | Running pip: install --upgrade tensorflow-rocm
2023-08-09 16:17:51,256 | sd | DEBUG | launch | Setup complete without errors: 1691590671
2023-08-09 16:17:51,272 | sd | INFO | installer | Extension preload: 0.0s /opt/ai/automatic/extensions-builtin
2023-08-09 16:17:51,273 | sd | INFO | installer | Extension preload: 0.0s /opt/ai/automatic/extensions
2023-08-09 16:17:51,290 | sd | INFO | launch | Server arguments: []
2023-08-09 16:17:56,493 | sd | INFO | shared | Pipeline: Backend.ORIGINAL
2023-08-09 16:17:56,806 | sd | INFO | webui | Libraries loaded
2023-08-09 16:17:56,808 | sd | INFO | webui | Using data path: /opt/ai/automatic
2023-08-09 16:17:56,809 | sd | INFO | sd_vae | Available VAEs: /opt/ai/automatic/models/VAE 0
2023-08-09 16:17:56,812 | sd | INFO | sd_models | Available models: /opt/ai/automatic/models/Stable-diffusion 1
2023-08-09 16:17:58,541 | ControlNet | INFO | controlnet_version | ControlNet v1.1.234
2023-08-09 16:17:58,736 | ControlNet | INFO | controlnet_version | ControlNet v1.1.234
2023-08-09 16:18:15,376 | sd | INFO | shared | Loading UI theme: name=black-orange style=Auto
2023-08-09 16:18:16,446 | sd | INFO | webui | Local URL: http://127.0.0.1:7860/
2023-08-09 16:18:16,448 | sd | INFO | middleware | Initializing middleware
2023-08-09 16:18:17,020 | sd | INFO | task_runner | [AgentScheduler] Task queue is empty
2023-08-09 16:18:17,021 | sd | INFO | api | [AgentScheduler] Registering APIs
2023-08-09 16:18:18,814 | sd | INFO | devices | Torch override dtype: no-half set
2023-08-09 16:18:18,815 | sd | INFO | devices | Torch override VAE dtype: no-half set
2023-08-09 16:18:18,816 | sd | INFO | devices | Setting Torch parameters: dtype=torch.float32 vae=torch.float32 unet=torch.float32
2023-08-09 16:18:23,435 | sd | INFO | sd_hijack | Applying Doggettx cross attention optimization
2023-08-09 16:18:23,441 | sd | INFO | textual_inversion | Embeddings: loaded=0 skipped=0
2023-08-09 16:18:23,447 | sd | INFO | sd_models | Model loaded in 5.9s (load=1.3s create=0.5s apply=4.1s)
2023-08-09 16:18:23,846 | sd | INFO | sd_models | Model load finished: {'ram': {'used': 9.36, 'total': 31.24}} cached=0
2023-08-09 16:18:24,294 | sd | INFO | webui | Startup time: 33.0s (torch=3.9s gradio=0.5s libraries=1.1s scripts=18.4s onchange=0.1s ui-txt2img=0.1s ui-img2img=0.1s ui-settings=0.1s ui-extensions=0.4s ui-defaults=0.1s launch=0.2s app-started=0.7s checkpoint=7.1s)

EDIT: Added log and console output.

vladmandic commented 11 months ago

Also, there is no setup.log, as far as I can remember

because it's called sdnext.log - that's noted in the issue template

otherwise, i agree, this seems like it's using the cpu version of torch. i'd need to see the logs to understand why it's doing that.
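
fwiw, a quick way to check which torch wheel actually ended up in the venv (just a sketch, run it with the venv python):

import torch
# rocm wheels report something like "2.0.1+rocm5.4.2"; a cpu-only wheel reports "2.0.1+cpu"
print("torch:", torch.__version__)
# torch.version.hip is a version string on rocm builds and None otherwise
print("hip:", torch.version.hip)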

ConfusedMerlin commented 11 months ago

Also, there is no setup.log, as far as I can remember

because it's called sdnext.log - that's noted in the issue template

otherwise, i agree, this seems like it's using the cpu version of torch. i'd need to see the logs to understand why it's doing that.

@vladmandic, sdnext.log I did see; I will delete it and the venv, run webui.sh again, and post it (and the shell output) here... in about 8 hours :)

Until then, do you happen to know a reliable way to test whether the GPU is actually ready for use? I mean, in some cases I can load kernel modules for hardware I don't even have, so that should be checked too.

vladmandic commented 11 months ago

Until then, do you happen to know a reliable way to test whether the GPU is actually ready for use?

you should have the rocm-smi utility, which can show gpu utilization, power draw, etc.

one idea - does your cpu have an on-board gpu? it could be that rocm is used, but it's using the one on your cpu and that's slow. if yes, try using the --device-id param
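
to see what torch actually enumerates and which index would go into --device-id, something like this should do (again just a sketch with the venv python):

import torch
print("available:", torch.cuda.is_available())      # rocm devices are exposed through the torch.cuda api
print("devices:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))         # this index should be what --device-id expects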

ConfusedMerlin commented 11 months ago

@vladmandic logs added. Nope, GPU and CPU are separate entities; I recently replaced an old RX 580 (which was fried by KSP 2) with the RX 7600 that is running now. Looking back, I should have left the AMD fan corner for Nvidia, but I didn't know about rocm/cuda/openml and the like back then...

Anyway, rocm-smi I do not have. There seems to be a git repo for it... should I try to build it? https://github.com/RadeonOpenCompute/rocm_smi_lib

But there is... rocminfo? that is kind of interesting; it lists the "HSA agent" entries available on my system, of which two seem to be my CPU and one is the GPU:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen Threadripper 2920X 12-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen Threadripper 2920X 12-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3500                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16312948(0xf8ea74) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16312948(0xf8ea74) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16312948(0xf8ea74) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    AMD Ryzen Threadripper 2920X 12-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen Threadripper 2920X 12-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3500                               
  BDFID:                   0                                  
  Internal Node ID:        1                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16444956(0xfaee1c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16444956(0xfaee1c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16444956(0xfaee1c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 3                  
*******                  
  Name:                    gfx1102                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 7600                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 29824(0x7480)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2900                               
  BDFID:                   17408                              
  Internal Node ID:        2                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1102         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

EDiT: radeontop works; even if it does not know the card's name or chip, it is able to draw GPU and VRAM usage... now, next check: can it do ROCm? Let's see if I can find something to test that.

vladmandic commented 11 months ago

quick look at the log doesn't show any issues with torch - it seems to be installed and initialized correctly and it does detect rocm - i needed to verify that.

anyhow, i've used rocm-smi before, but if radeontop works, it's good enough.

this rocminfo output is strange - why is it listing the threadripper (two instances) before the gpu? that might as well be throwing rocm off, so it's using first-available and that happens to be cpu, not gpu. you can try

ConfusedMerlin commented 11 months ago

I looked up the ROCm 5.6 page: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html - it says my GPU isn't officially supported on Linux (only on Windows). And... that page does look kind of official to me?

Also, the page listed some installation hints I hadn't tried out yet (like "amdgpu-install --usecase=rocm", which installed a lot of new libraries).

now rocm-smi works also:

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK   Fan  Perf  PwrCap  VRAM%  GPU%  
0    33.0c           2.0W    226Mhz  96Mhz  0%   auto  140.0W    6%   1%    
================================================================================
============================= End of ROCm SMI Log ==============================

So... let's try --use-rocm --device-id 2... nope. id 1... nope. id 0... nope. I just discovered that you can see whether it worked in the console output:

Model load finished: {'ram': {'used': 9.29, 'total': 31.24}} cached=0

No consumer graphics card comes with 32 GB of RAM; that is my system RAM, and it was printed there every time. I guess if it were actually using the GPU, it should show... 8 GB.
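A more direct way to check which device torch picked (a minimal sketch using standard torch calls, run from inside the venv) is to ask for the device name and its memory instead of inferring from the RAM numbers:

python - <<'PY'
import torch

if torch.cuda.is_available():
    # on a ROCm build of torch, the "cuda" device is the AMD GPU
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name, "vram:", props.total_memory // 2**20, "MB")
else:
    print("torch only sees the CPU")
PY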

I added the if section to my webui.sh... and it still does not use the GPU. And of course, it does not like --use-directml (well, that is a Windows thing).

I am a bit confused. Next thing to do... trying out your webui on Windows. Yes, same rig, different hard drive. Over there, at least DirectML may work.

Until later

vladmandic commented 11 months ago

btw, what does system info tab show?

ConfusedMerlin commented 11 months ago

Interestingly, Windows behaves mostly the same. If I call --use-rocm, it starts but uses the CPU. At least it is honest enough to put that in the shell log ("using cpu backend"). DirectML dies there too, with mostly the same error as here on Linux.

with... --use-rocm, it looks like this

(screenshot attached: Bildschirmfoto von 2023-08-09 18-13-12)

it just feels like the whole thing does not want to use any gpu at all...

vladmandic commented 11 months ago

It shows rocm detected correctly, but it also shows backend as cpu which means at one point fallback happened.

I'll probably need to spray a bit more debug messages to see where and why the fallback happens.

ConfusedMerlin commented 11 months ago

there is a --debug flag for webui.sh, which I had not used yet (despite it being mentioned on the main page here):

$ ./webui.sh --use-rocm --debug
Create and activate python venv
Launching launch.py...
19:32:19-978228 INFO     Starting SD.Next                                       
19:32:19-981486 INFO     Python 3.10.12 on Linux                                
19:32:19-986127 INFO     Version: 417ef540 Tue Aug 8 12:05:30 2023 -0400        
19:32:20-266253 DEBUG    Setting environment tuning                             
19:32:20-268263 DEBUG    Torch overrides: cuda=False rocm=True ipex=False       
                         diml=False                                             
19:32:20-270057 DEBUG    Torch allowed: cuda=False rocm=True ipex=False         
                         diml=False                                             
19:32:20-271806 INFO     AMD ROCm toolkit detected                              
19:32:20-287477 WARNING  Modified files: ['webui.sh']                           
19:32:20-293218 DEBUG    Repository update time: Tue Aug  8 18:05:30 2023       
19:32:20-294889 DEBUG    Previous setup time: Wed Aug  9 16:17:51 2023          
19:32:20-296419 INFO     Disabled extensions: []                                
19:32:20-297927 INFO     Enabled extensions-builtin:                            
                         ['multidiffusion-upscaler-for-automatic1111',          
                         'stable-diffusion-webui-rembg', 'LDSR', 'Lora',        
                         'stable-diffusion-webui-images-browser',               
                         'sd-webui-controlnet', 'ScuNET',                       
                         'sd-webui-agent-scheduler', 'sd-extension-system-info',
                         'sd-dynamic-thresholding', 'clip-interrogator-ext',    
                         'SwinIR', 'a1111-sd-webui-lycoris']                    
19:32:20-301770 INFO     Enabled extensions: []                                 
19:32:20-303200 DEBUG    Latest extensions time: Wed Aug  9 16:17:24 2023       
19:32:20-304554 DEBUG    Timestamps: version:1691510730 setup:1691590671        
                         extension:1691590644                                   
19:32:20-305403 INFO     No changes detected: Quick launch active               
19:32:20-306112 INFO     Verifying requirements                                 
19:32:20-320393 INFO     Disabled extensions: []                                
19:32:20-321302 INFO     Enabled extensions-builtin:                            
                         ['multidiffusion-upscaler-for-automatic1111',          
                         'stable-diffusion-webui-rembg', 'LDSR', 'Lora',        
                         'stable-diffusion-webui-images-browser',               
                         'sd-webui-controlnet', 'ScuNET',                       
                         'sd-webui-agent-scheduler', 'sd-extension-system-info',
                         'sd-dynamic-thresholding', 'clip-interrogator-ext',    
                         'SwinIR', 'a1111-sd-webui-lycoris']                    
19:32:20-323184 INFO     Enabled extensions: []                                 
19:32:20-325969 INFO     Extension preload: 0.0s                                
                         /opt/ai/automatic/extensions-builtin                   
19:32:20-327403 INFO     Extension preload: 0.0s /opt/ai/automatic/extensions   
19:32:20-339301 DEBUG    Memory used: 0.04 total: 31.24 Collected 0             
19:32:20-340488 DEBUG    Starting module: <module 'webui' from                  
                         '/opt/ai/automatic/webui.py'>                          
19:32:20-341366 INFO     Server arguments: ['--use-rocm', '--debug']            
19:32:20-365802 DEBUG    Loading Torch                                          
19:32:24-103673 DEBUG    Loading Gradio                                         
19:32:24-602334 DEBUG    Loading Modules                                        
No module 'xformers'. Proceeding without it.
19:32:25-408084 DEBUG    Reading: /opt/ai/automatic/config.json len=295         
19:32:25-409865 INFO     Pipeline: Backend.ORIGINAL                             
19:32:25-410877 DEBUG    Loaded styles: /opt/ai/automatic/styles.csv 0          
19:32:25-730555 INFO     Libraries loaded                                       
19:32:25-731842 DEBUG    Entering start sequence                                
19:32:25-739928 DEBUG    Version: {'app': 'sd.next', 'updated': '2023-08-08',   
                         'hash': '417ef540', 'url':                             
                         'https://github.com/vladmandic/automatic/tree/master'} 
19:32:25-742381 INFO     Using data path: /opt/ai/automatic                     
19:32:25-744220 DEBUG    Event loop: <_UnixSelectorEventLoop running=False      
                         closed=False debug=False>                              
19:32:25-745951 DEBUG    Entering initialize                                    
19:32:25-747101 DEBUG    Available samplers: ['UniPC', 'DDIM', 'PLMS', 'Euler   
                         a', 'Euler', 'DPM++ 2S a', 'DPM++ 2S a Karras', 'DPM++ 
                         2M', 'DPM++ 2M Karras', 'DPM++ SDE', 'DPM++ SDE        
                         Karras', 'DPM++ 2M SDE', 'DPM++ 2M SDE Karras', 'DPM   
                         fast', 'DPM adaptive', 'DPM2', 'DPM2 Karras', 'DPM2 a',
                         'DPM2 a Karras', 'LMS', 'LMS Karras', 'Heun']          
19:32:25-750649 INFO     Available VAEs: /opt/ai/automatic/models/VAE 0         
19:32:25-752687 DEBUG    Reading: /opt/ai/automatic/cache.json len=1            
19:32:25-754158 DEBUG    Reading: /opt/ai/automatic/metadata.json len=1         
19:32:25-755477 INFO     Available models:                                      
                         /opt/ai/automatic/models/Stable-diffusion 1            
19:32:25-782980 DEBUG    Loading scripts                                        
19:32:27-490352 INFO     ControlNet v1.1.234                                    
ControlNet v1.1.234
ControlNet preprocessor location: /opt/ai/automatic/extensions-builtin/sd-webui-controlnet/annotator/downloads
19:32:27-684530 INFO     ControlNet v1.1.234                                    
ControlNet v1.1.234
19:32:28-555204 DEBUG    Scripts load: ['a1111-sd-webui-lycoris:0.58s',         
                         'clip-interrogator-ext:0.061s', 'LDSR:0.057s',         
                         'Lora:0.332s', 'sd-dynamic-thresholding:0.056s',       
                         'sd-extension-system-info:0.113s',                     
                         'sd-webui-agent-scheduler:0.372s',                     
                         'sd-webui-controlnet:0.325s',                          
                         'stable-diffusion-webui-images-browser:0.121s',        
                         'stable-diffusion-webui-rembg:0.623s', 'SwinIR:0.061s',
                         'ScuNET:0.062s']                                       
Scripts load: ['a1111-sd-webui-lycoris:0.58s', 'clip-interrogator-ext:0.061s', 'LDSR:0.057s', 'Lora:0.332s', 'sd-dynamic-thresholding:0.056s', 'sd-extension-system-info:0.113s', 'sd-webui-agent-scheduler:0.372s', 'sd-webui-controlnet:0.325s', 'stable-diffusion-webui-images-browser:0.121s', 'stable-diffusion-webui-rembg:0.623s', 'SwinIR:0.061s', 'ScuNET:0.062s']
19:32:28-685202 INFO     Loading UI theme: name=black-orange style=Auto         
19:32:28-688421 DEBUG    Creating UI                                            
19:32:28-692934 DEBUG    Reading: /opt/ai/automatic/ui-config.json len=0        
19:32:28-722115 DEBUG    Extra networks: checkpoints items=1 subdirs=0          
19:32:28-767777 DEBUG    UI interface: tab=txt2img batch=False seed=False       
                         advanced=False second_pass=False                       
19:32:28-877445 DEBUG    UI interface: tab=img2img seed=False resize=False      
                         batch=False denoise=True advanced=False                
19:32:28-957262 DEBUG    Reading: /opt/ai/automatic/ui-config.json len=0        
19:32:29-661055 DEBUG    Script: 0.53s ui_tabs                                  
                         /opt/ai/automatic/extensions-builtin/stable-diffusion-w
                         ebui-images-browser/scripts/image_browser.py           
19:32:29-663900 DEBUG    Extensions list failed to load:                        
                         /opt/ai/automatic/html/extensions.json                 
19:32:29-749482 DEBUG    Extension list refresh: processed=13 installed=13      
                         enabled=13 disabled=0 visible=13 hidden=0              
Running on local URL:  http://127.0.0.1:7860
19:32:30-065888 INFO     Local URL: http://127.0.0.1:7860/                      
19:32:30-067920 DEBUG    Gradio registered functions: 1852                      
19:32:30-069280 INFO     Initializing middleware                                
19:32:30-074981 DEBUG    Creating API                                           
19:32:30-222233 INFO     [AgentScheduler] Task queue is empty                   
19:32:30-223304 INFO     [AgentScheduler] Registering APIs                      
19:32:30-351779 DEBUG    Scripts setup: ['Tiled Diffusion:0.023s',              
                         'ControlNet:0.041s', 'Alternative:0.009s']             
19:32:30-355006 DEBUG    Scripts components: []                                 
19:32:30-355720 DEBUG    Model metadata: /opt/ai/automatic/metadata.json no     
                         changes                                                
19:32:30-362512 DEBUG    Select checkpoint: model                               
                         v1-5-pruned-emaonly.safetensors [6ce0161689]           
19:32:30-365126 DEBUG    Load model weights: existing=False                     
                         target=/opt/ai/automatic/models/Stable-diffusion/v1-5-p
                         runed-emaonly.safetensors info=None                    
19:32:30-684233 DEBUG    gc: collected=10213 device=cpu {'ram': {'used': 1.31,  
                         'total': 31.24}}                                       
Loading weights: /opt/ai/automatic/models/Stable-diffusion/v1-5-pruned-emaonly… 
19:32:30-861365 DEBUG    Load model:                                            
                         name=/opt/ai/automatic/models/Stable-diffusion/v1-5-pru
                         ned-emaonly.safetensors dict=True                      
19:32:30-862395 DEBUG    Verifying Torch settings                               
19:32:30-863051 INFO     Torch override dtype: no-half set                      
19:32:30-863733 INFO     Torch override VAE dtype: no-half set                  
19:32:30-864393 DEBUG    Desired Torch parameters: dtype=FP32 no-half=True      
                         no-half-vae=True upscast=True                          
19:32:30-865284 INFO     Setting Torch parameters: dtype=torch.float32          
                         vae=torch.float32 unet=torch.float32                   
19:32:30-866153 DEBUG    Torch default device: cpu                              
19:32:30-867939 DEBUG    Model dict loaded: {'ram': {'used': 1.34, 'total':     
                         31.24}}                                                
19:32:30-882125 DEBUG    Model config loaded: {'ram': {'used': 1.34, 'total':   
                         31.24}}                                                
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
19:32:31-344983 DEBUG    Model created from config:                             
                         /opt/ai/automatic/configs/v1-inference.yaml            
19:32:31-347451 DEBUG    Model weights loading: {'ram': {'used': 2.31, 'total': 
                         31.24}}                                                
19:32:32-142593 DEBUG    Model weights loaded: {'ram': {'used': 9.28, 'total':  
                         31.24}}                                                
19:32:32-152908 DEBUG    Model weights moved: {'ram': {'used': 9.28, 'total':   
                         31.24}}                                                
19:32:32-161406 INFO     Applying Doggettx cross attention optimization         
19:32:32-167383 INFO     Embeddings: loaded=0 skipped=0                         
19:32:32-174052 INFO     Model loaded in 1.5s (load=0.2s create=0.5s apply=0.8s)
19:32:32-482288 DEBUG    gc: collected=24 device=cpu {'ram': {'used': 9.29,     
                         'total': 31.24}}                                       
19:32:32-484533 INFO     Model load finished: {'ram': {'used': 9.29, 'total':   
                         31.24}} cached=0                                       
19:32:32-946570 DEBUG    gc: collected=0 device=cpu {'ram': {'used': 5.32,      
                         'total': 31.24}}                                       
19:32:32-948283 INFO     Startup time: 12.6s (torch=3.7s gradio=0.5s            
                         libraries=1.1s scripts=2.8s onchange=0.1s              
                         ui-txt2img=0.1s ui-img2img=0.1s ui-settings=0.1s       
                         ui-extensions=0.7s ui-defaults=0.1s launch=0.2s        
                         app-started=0.3s checkpoint=2.6s) 

I cannot see any error in it; only that the 32 GB of RAM shows up earlier, and it openly says "cpu" in some of the gc entries.

vladmandic commented 11 months ago

run a simple test from inside venv:

python -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"

ConfusedMerlin commented 11 months ago

okay, into the cloned repo, into its venv folder.... this went... surprisingly not as expected (neither with python nor with python3.10):

python3.10 -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'

EDiT: I did pip3 install torch after that. It was quite busy installing a metric ton of additional packages. Aren't they part of the webui.sh installation? Okay, dumb question in light of my first result.

Now it says.... this (not much better)

python3.10 -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"
<module 'torch.version' from '/home/rk/.local/lib/python3.10/site-packages/torch/version.py'> False 11.7 None
vladmandic commented 11 months ago

okay, into the cloned repo, into its venv folder....

you don't cd into venv, you activate the venv so it becomes the active context. something like venv/scripts/activate (check the exact names, I'm not in front of an active install right now to check)
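On Linux the activation script lives under venv/bin rather than venv/scripts; a minimal sketch, using the install path from the logs above:

cd /opt/ai/automatic
source venv/bin/activate
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.version.hip)"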

ConfusedMerlin commented 11 months ago

ah... my bad; I rarely use python venv (well, I let pycharm handle that usually)

python -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"
<module 'torch.version' from '/opt/ai/automatic/venv/lib/python3.10/site-packages/torch/version.py'> False None 5.4.22803-474e8620

This is quite confusing... I am kind of sure that your software isn't to blame, but my installation is... well... messed up beyond usability. I am not brave enough to allow a dist-upgrade, because then the graphics card driver will fail to install again.... not that this would make that big of a difference.

Should I go for 22.04 and see what happens?

vladmandic commented 11 months ago

this test clearly shows that torch-rocm is correctly installed and it detects the rocm libs, but it doesn't detect the actual gpu, which is very weird since your rocminfo does detect the gpu.

if it were me, i'd go for ubuntu 22.04. and you might as well install torch with rocm 5.6 instead of the default 5.4, and set the correct HSA_OVERRIDE_GFX_VERSION (from my first post).
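A sketch of what that could look like from inside the activated venv (the rocm5.6 nightly index URL and the 11.0.0 override for gfx1102 are the values that come up later in this thread, not a verified recipe):

pip uninstall -y torch torchvision
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6
export HSA_OVERRIDE_GFX_VERSION=11.0.0   # gfx1102 (RX 7600) masquerading as a supported gfx11 target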

ConfusedMerlin commented 11 months ago

nope... I pip installed the rocm 5.6 torch build inside the venv, added that gfx1102 switch case to webui.sh and tried again, with the same result.

Well then, up to 22.04 I go. I bet the gpu drivers will fail to install yet again.

ConfusedMerlin commented 11 months ago

And... done. And guess what? The same kind of error as before. All seems fine until you are in the webui, where it still only uses my CPU.

ConfusedMerlin commented 11 months ago

hm, this time the amdgpu-install did not fail. I guess that is because I didn't get the 6.x kernel like when I installed 22.04 directly, but still have the 5.19. Another thing I noticed... I get my rocm stuff from here:

https://repo.radeon.com/rocm/apt/5.5.3

while webui.sh installs torch from here:

https://download.pytorch.org/whl/rocm5.4.2

I wondered if I should adjust that, but then I decided to check out the URL from that index-url parameter: Installing package: torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/rocm5.4.2

Turns out, https://download.pytorch.org/whl/rocm5.4.2 points to an "access denied" XML? Is that supposed to happen, or does the installer.py somehow authenticate there?

EDiT: Looked up the torch+rocm section on the same page... no rocm 5.5.3 build there yet. I smell new problems... which would only come into play if my stupid card got recognized at all.

iDeNoh commented 11 months ago

(quoting ConfusedMerlin's previous comment about amdgpu-install, the 5.5.3 ROCm repo, and the rocm5.4.2 torch index URL)

as far as I'm aware you aren't able to access any of the download directories that way in your browser; it's intended for installation via the command line.

regarding the missing +5.5.3 version, that won't matter. if you REALLY want, you can install the torch 2.1.0 nightly + rocm 5.6: add export TORCH_COMMAND="torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6" to your webui-user.sh. I'd also recommend adding export ROCMPATH="/opt/rocm"; that sometimes fixes issues like this. and finally, make sure you add an HSA override to it as well, as vlad mentioned. mine looks like export HSA_OVERRIDE_GFX_VERSION=10.3.0; yours will be whatever the correct version is for your card.

my full webui-user.sh:

export COMMANDLINE_ARGS="--listen --use-rocm --insecure --debug"
export ROCMPATH="/opt/rocm"
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export TORCH_COMMAND="torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6"

iDeNoh commented 11 months ago

Also, something to note: you can use torch built against an earlier rocm version with a newer system rocm, e.g. I have 5.6 installed and use torch+rocm5.4.2 on a few projects.

ConfusedMerlin commented 11 months ago

@iDeNoh I guess I have to create the webui-user.sh myself, as there is no such file in my cloned project? Then, should it call webui.sh at the end?

iDeNoh commented 11 months ago

@ConfusedMerlin yes, create the webui-user.sh manually and add whatever you're going to pass into it and then launch with webui.sh however you want. I set up a .desktop shortcut for mine to add it to the favorites bar
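A hypothetical webui-user.sh for the RX 7600 (gfx1102), adapted from iDeNoh's example above with the gfx11 override discussed later in this thread; the exact flags and paths are assumptions, not a verified config:

#!/usr/bin/env bash
export COMMANDLINE_ARGS="--use-rocm --debug"
export ROCMPATH="/opt/rocm"
export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export TORCH_COMMAND="torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6"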

ConfusedMerlin commented 11 months ago

and... I just bricked my installation. A python package conflict crept into it over the last hours of trying. Being a bit fed up and demotivated, I used a lot of -y during apt-get operations. And this one time I hit the... anti-jackpot. Let's say I had 10G more free space afterwards (astounding how much stuff depends on pylib3.10....)

Anyway, I will be back tomorrow, with a fresh 22.04 Ubuntu. In the meantime... are there users of your automatic fork who managed to get it working with Ubuntu 2x and an RX 7600?

vladmandic commented 11 months ago

i don't know. i'm big on privacy, so there is no callhome to report usage, and i honestly can't recall what everyone has said over time - too many conversations. better ask in discord, it's quite active.

but in your case, this has nothing to do with the fork - torch refuses to detect your gpu, and that one-line python test shows it.
you can nuke the entire sdnext install and, once that one-liner reports true, install sdnext again.
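As a sketch of that order of operations (the clone path matches the logs above; the point is that the bare one-liner has to report True before sdnext goes back on top):

python3 -c "import torch; print(torch.cuda.is_available())"   # must print True first
rm -rf /opt/ai/automatic                                       # nuke the existing install
git clone https://github.com/vladmandic/automatic /opt/ai/automatic
cd /opt/ai/automatic && ./webui.sh --use-rocm --debug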

iDeNoh commented 11 months ago

If I had to guess, I'd say it's probably a busted rocm install. how did you install? I ask because the easiest method is to use the script installer directly off of AMD's website; make sure you follow the instructions precisely and ensure you set up the prerequisites before you install.

ConfusedMerlin commented 11 months ago

Greetings @vladmandic

so, let's get this in an orderly way:

So, why am I getting True for cuda.is_available now? "video" and "render". Just checked it: removed my user from these groups, rebooted, tried again: False. Added the user back to both groups, rebooted, tried again: True. rocminfo does hint at that, but its error message can be misleading, as it starts with "permission to /dev/something denied", and of course sudo rocminfo then looks fine. But if you add your user to video and render (it must be both), then suddenly plain rocminfo works. And torch says True!
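For reference, the group change boils down to something like this (a sketch; video and render are the group names mentioned above):

sudo usermod -aG video,render "$USER"   # add the current user to both groups
# log out and back in (or reboot), then a plain non-sudo rocminfo should list the GPU agent
rocminfo | grep gfx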

Now... for the "Speicherzugriffsfehler" (memory access violation, i.e. a segmentation fault)... I'll make another post, so we can keep these apart better (also, I would guess it's a new issue)

ConfusedMerlin commented 11 months ago

console:

./webui.sh 
Create and activate python venv
Launching launch.py...
14:06:56-781913 INFO     Starting SD.Next                                       
14:06:56-786013 INFO     Python 3.10.12 on Linux                                
14:06:56-802997 INFO     Version: 69eaf4c6 Sat Aug 12 08:32:19 2023 +0000       
14:06:57-162359 DEBUG    Setting environment tuning                             
14:06:57-164374 DEBUG    Torch overrides: cuda=False rocm=True ipex=False       
                         diml=False                                             
14:06:57-166294 DEBUG    Torch allowed: cuda=False rocm=True ipex=False         
                         diml=False                                             
14:06:57-233467 DEBUG    Repository update time: Sat Aug 12 10:32:19 2023       
14:06:57-234919 INFO     Verifying requirements                                 
14:06:57-248009 INFO     Verifying packages                                     
14:06:57-250049 INFO     Verifying repositories                                 
14:06:57-257607 DEBUG    Submodule:                                             
                         /opt/ai/automatic/repositories/stable-diffusion-stabili
                         ty-ai / main                                           
14:06:57-724807 DEBUG    Submodule:                                             
                         /opt/ai/automatic/repositories/taming-transformers /   
                         master                                                 
14:06:58-159123 DEBUG    Submodule: /opt/ai/automatic/repositories/k-diffusion /
                         master                                                 
14:06:58-997480 DEBUG    Submodule: /opt/ai/automatic/repositories/BLIP / main  
14:06:59-466906 INFO     Verifying submodules                                   
14:06:59-872804 DEBUG    Submodule: extensions-builtin/a1111-sd-webui-lycoris / 
                         main                                                   
14:06:59-885050 DEBUG    Submodule: extensions-builtin/clip-interrogator-ext /  
                         main                                                   
14:06:59-896967 DEBUG    Submodule:                                             
                         extensions-builtin/multidiffusion-upscaler-for-automati
                         c1111 / main                                           
14:06:59-906399 DEBUG    Submodule: extensions-builtin/sd-dynamic-thresholding /
                         master                                                 
14:06:59-914269 DEBUG    Submodule: extensions-builtin/sd-extension-system-info 
                         / main                                                 
14:06:59-921273 DEBUG    Submodule: extensions-builtin/sd-webui-agent-scheduler 
                         / main                                                 
14:06:59-928293 DEBUG    Submodule: extensions-builtin/sd-webui-controlnet /    
                         main                                                   
14:06:59-958614 DEBUG    Submodule:                                             
                         extensions-builtin/stable-diffusion-webui-images-browse
                         r / main                                               
14:06:59-966978 DEBUG    Submodule:                                             
                         extensions-builtin/stable-diffusion-webui-rembg /      
                         master                                                 
14:06:59-974058 DEBUG    Submodule: modules/lora / main                         
14:06:59-981843 DEBUG    Submodule: modules/lycoris / main                      
14:06:59-989506 DEBUG    Submodule: wiki / master                               
14:07:00-071207 DEBUG    Installed packages: 220                                
14:07:00-072289 DEBUG    Extensions all: ['sd-webui-agent-scheduler',           
                         'clip-interrogator-ext',                               
                         'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR',      
                         'sd-webui-controlnet', 'sd-dynamic-thresholding',      
                         'multidiffusion-upscaler-for-automatic1111', 'ScuNET', 
                         'sd-extension-system-info',                            
                         'stable-diffusion-webui-images-browser', 'Lora',       
                         'a1111-sd-webui-lycoris']                              
14:07:00-073594 DEBUG    Running extension installer:                           
                         /opt/ai/automatic/extensions-builtin/sd-webui-agent-sch
                         eduler/install.py                                      
14:07:00-265381 DEBUG    Running extension installer:                           
                         /opt/ai/automatic/extensions-builtin/clip-interrogator-
                         ext/install.py                                         
14:07:08-117198 DEBUG    Running extension installer:                                                
                         /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-rembg/install.py
14:07:08-392023 DEBUG    Running extension installer:                                                
                         /opt/ai/automatic/extensions-builtin/sd-webui-controlnet/install.py         
14:07:08-714753 DEBUG    Running extension installer:                                                
                         /opt/ai/automatic/extensions-builtin/sd-extension-system-info/install.py    
14:07:08-894843 DEBUG    Running extension installer:                                                
                         /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-images-browser/i
                         nstall.py                                                                   
14:07:09-180858 DEBUG    Extensions all: []                                                          
14:07:09-181893 INFO     Extensions enabled: ['sd-webui-agent-scheduler', 'clip-interrogator-ext',   
                         'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR', 'sd-webui-controlnet',    
                         'sd-dynamic-thresholding', 'multidiffusion-upscaler-for-automatic1111',     
                         'ScuNET', 'sd-extension-system-info',                                       
                         'stable-diffusion-webui-images-browser', 'Lora', 'a1111-sd-webui-lycoris']  
14:07:09-183000 INFO     Verifying packages                                                          
14:07:09-184858 DEBUG    Setup complete without errors: 1691842029                                   
14:07:09-190212 INFO     Extension preload: 0.0s /opt/ai/automatic/extensions-builtin                
14:07:09-190990 INFO     Extension preload: 0.0s /opt/ai/automatic/extensions                        
14:07:09-202213 DEBUG    Memory used: 0.04 total: 31.23 Collected 0                                  
14:07:09-203258 DEBUG    Starting module: <module 'webui' from '/opt/ai/automatic/webui.py'>         
14:07:09-204049 INFO     Server arguments: ['--listen', '--use-rocm', '--debug', '--medvram']        
14:07:09-211962 DEBUG    Loading Torch                                                               
Speicherzugriffsfehler (Speicherabzug geschrieben)
rk@rkai:/opt/ai/automatic$ 

sdnext.log

2023-08-12 14:06:56,781 | sd | INFO | launch | Starting SD.Next
2023-08-12 14:06:56,786 | sd | INFO | installer | Python 3.10.12 on Linux
2023-08-12 14:06:56,802 | sd | INFO | installer | Version: 69eaf4c6 Sat Aug 12 08:32:19 2023 +0000
2023-08-12 14:06:57,162 | sd | DEBUG | installer | Setting environment tuning
2023-08-12 14:06:57,164 | sd | DEBUG | installer | Torch overrides: cuda=False rocm=True ipex=False diml=False
2023-08-12 14:06:57,166 | sd | DEBUG | installer | Torch allowed: cuda=False rocm=True ipex=False diml=False
2023-08-12 14:06:57,233 | sd | DEBUG | installer | Repository update time: Sat Aug 12 10:32:19 2023
2023-08-12 14:06:57,234 | sd | INFO | installer | Verifying requirements
2023-08-12 14:06:57,248 | sd | INFO | installer | Verifying packages
2023-08-12 14:06:57,250 | sd | INFO | installer | Verifying repositories
2023-08-12 14:06:57,257 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/stable-diffusion-stability-ai / main
2023-08-12 14:06:57,724 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/taming-transformers / master
2023-08-12 14:06:58,159 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/k-diffusion / master
2023-08-12 14:06:58,997 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/BLIP / main
2023-08-12 14:06:59,466 | sd | INFO | installer | Verifying submodules
2023-08-12 14:06:59,872 | sd | DEBUG | installer | Submodule: extensions-builtin/a1111-sd-webui-lycoris / main
2023-08-12 14:06:59,885 | sd | DEBUG | installer | Submodule: extensions-builtin/clip-interrogator-ext / main
2023-08-12 14:06:59,896 | sd | DEBUG | installer | Submodule: extensions-builtin/multidiffusion-upscaler-for-automatic1111 / main
2023-08-12 14:06:59,906 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-dynamic-thresholding / master
2023-08-12 14:06:59,914 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-extension-system-info / main
2023-08-12 14:06:59,921 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-agent-scheduler / main
2023-08-12 14:06:59,928 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-controlnet / main
2023-08-12 14:06:59,958 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-images-browser / main
2023-08-12 14:06:59,966 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-rembg / master
2023-08-12 14:06:59,974 | sd | DEBUG | installer | Submodule: modules/lora / main
2023-08-12 14:06:59,981 | sd | DEBUG | installer | Submodule: modules/lycoris / main
2023-08-12 14:06:59,989 | sd | DEBUG | installer | Submodule: wiki / master
2023-08-12 14:07:00,071 | sd | DEBUG | installer | Installed packages: 220
2023-08-12 14:07:00,072 | sd | DEBUG | installer | Extensions all: ['sd-webui-agent-scheduler', 'clip-interrogator-ext', 'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR', 'sd-webui-controlnet', 'sd-dynamic-thresholding', 'multidiffusion-upscaler-for-automatic1111', 'ScuNET', 'sd-extension-system-info', 'stable-diffusion-webui-images-browser', 'Lora', 'a1111-sd-webui-lycoris']
2023-08-12 14:07:00,073 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-webui-agent-scheduler/install.py
2023-08-12 14:07:00,265 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/clip-interrogator-ext/install.py
2023-08-12 14:07:08,117 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-rembg/install.py
2023-08-12 14:07:08,392 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-webui-controlnet/install.py
2023-08-12 14:07:08,714 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-extension-system-info/install.py
2023-08-12 14:07:08,894 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-images-browser/install.py
2023-08-12 14:07:09,180 | sd | DEBUG | installer | Extensions all: []
2023-08-12 14:07:09,181 | sd | INFO | installer | Extensions enabled: ['sd-webui-agent-scheduler', 'clip-interrogator-ext', 'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR', 'sd-webui-controlnet', 'sd-dynamic-thresholding', 'multidiffusion-upscaler-for-automatic1111', 'ScuNET', 'sd-extension-system-info', 'stable-diffusion-webui-images-browser', 'Lora', 'a1111-sd-webui-lycoris']
2023-08-12 14:07:09,182 | sd | INFO | installer | Verifying packages
2023-08-12 14:07:09,184 | sd | DEBUG | launch | Setup complete without errors: 1691842029
2023-08-12 14:07:09,190 | sd | INFO | installer | Extension preload: 0.0s /opt/ai/automatic/extensions-builtin
2023-08-12 14:07:09,190 | sd | INFO | installer | Extension preload: 0.0s /opt/ai/automatic/extensions
2023-08-12 14:07:09,202 | sd | DEBUG | launch | Memory used: 0.04 total: 31.23 Collected 0
2023-08-12 14:07:09,203 | sd | DEBUG | launch | Starting module: <module 'webui' from '/opt/ai/automatic/webui.py'>
2023-08-12 14:07:09,204 | sd | INFO | launch | Server arguments: ['--listen', '--use-rocm', '--debug', '--medvram']
2023-08-12 14:07:09,211 | sd | DEBUG | webui | Loading Torch

of course, no visible errors inside; the console just ends with "Speicherzugriffsfehler (Speicherabzug geschrieben)", i.e. segmentation fault (core dumped), right after "Loading Torch". If I could offer a guess: it sees the 32 GB of system memory and tries to use it together with the 8 GB of VRAM.

ConfusedMerlin commented 11 months ago

well, found it, kind of. It's in webui.py, this line:

rnd = torch.sum(torch.randn(2, 2)).to(0)

this is reproducible, as this produces the same error:

python -c "import torch; print(torch.sum(torch.randn(2, 2)).to(0))"

let's see if I can figure out what's wrong here

evshiron commented 11 months ago

@ConfusedMerlin

Greetings. Would you mind following the steps here and seeing if it works?

This line:

rnd = torch.sum(torch.randn(2, 2)).to(0)

is from https://github.com/vladmandic/automatic/issues/1929.

But if your RX 7600 core dumped at this line, you might want to export these two environment variables before calling ./webui.sh:

export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
ConfusedMerlin commented 11 months ago

@evshiron, oh my dear god, it finally starts and shows the GPU in the system tab! I get about 5.5 it/s, compared to the 1.7 I get in Windows. That is remarkable! (using your are-we-gfx1100-yet fork)

I read up about that memory error; it seems that this short call, "torch.sum(torch.randn(2, 2)).to(0)", tries to stuff a dozen gigabytes' worth of tensors into the VRAM, which only has 8 to offer. I wonder if this would work by simply commenting out that line... Oh, and the "needs to be in the video and render groups" part applies to your fork too, as it seems.

EDiT: So... to sum up:

Shall I close it, or is there something we can do to increase the "lessons learned" from this issue?

evshiron commented 11 months ago

@ConfusedMerlin

It shouldn't. It creates a 2x2 matrix with random values on your GPU and sums it.

This line was added to ensure torch initializes early, to avoid another core dump caused by tensorflow-rocm for our Navi 3x GPUs.

# this is required if you have multiple gpus and you want to use the first one
export HIP_VISIBLE_DEVICES=0
# this is required for apps to correctly recognize navi 3x gpus
export HSA_OVERRIDE_GFX_VERSION=11.0.0

These two lines should be used in most situations, for example:

export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
python3 -c "import torch; print(torch.sum(torch.randn(2, 2)).to(0))"

It will work as soon as you add them.

Would you mind going to the "System Info" tab, running the benchmark SEVERAL TIMES, and submitting it? It would be great to see the performance of various Navi 3x GPUs.

Nowadays, generating images is well optimized and should not require too much VRAM. If there are larger images that you can't generate, you can use the included Tiled Diffusion and Tiled VAE extensions and tweak the parameters to make it work on your GPU (at the cost of performance).

If you find your GPU malfunctions, check dmesg. If there are plenty of errors coming out from the AMDGPU driver, your GPU will not recover without a reboot.
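A minimal way to do that check while generating (a sketch; --follow is a standard dmesg option):

sudo dmesg --follow | grep -i amdgpu   # a flood of amdgpu errors here usually means the GPU needs a reboot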

ConfusedMerlin commented 11 months ago

@evshiron , on it.... I must ask, how do I know if the submit was successful? I mean, I can hit "Submit Results", but nothing seems to happen. The results page seems to be.... already in the future?

evshiron commented 11 months ago

@ConfusedMerlin

Every time you click, the WebUI will print a log in the console about how many records have been submitted.

The benchmark data is collected with a logstash service; you can submit several times to make sure the records go through. The submitted records are automatically deduplicated.

You will be able to see the records you submit in about an hour, because there is an hourly worker that updates the benchmark data from the collected records.

ConfusedMerlin commented 11 months ago

@evshiron , I see.

There is one thing I just noticed: switching from one checkpoint model to another takes way longer than in the Windows Automatic1111. There it was a matter of a couple of seconds (at least it seemed that way); here it needs 30 seconds or more. Is that normal?

evshiron commented 11 months ago

@ConfusedMerlin

My RX 7900 XTX takes about 4-7 seconds to switch a checkpoint at the moment.

It was faster in the early days, when HSA_OVERRIDE_GFX_VERSION wasn't needed. But I've gotten used to it by now.

30 seconds sounds pretty slow. I guess it's caused by insufficient RAM or VRAM and swapping or offloading happens during data load. Maybe --medvram will help?

ConfusedMerlin commented 11 months ago

@evshiron , --medvram just... causes no checkpoint to be loaded at all, but without any error message. I select it, and nothing happens. Hit "reload" in the settings, nothing.

Well, I guess I can live with the longer switching times for now (it still is weird).

evshiron commented 11 months ago

@ConfusedMerlin

According to https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Optimizations, --medvram:

Makes the Stable Diffusion model consume less VRAM by splitting it into three parts - cond (for transforming text into numerical representation), first_stage (for converting a picture into latent space and back), and unet (for actual denoising of latent space) and making it so that only one is in VRAM at all times, sending others to CPU RAM. Lowers performance, but only by a bit - except if live previews are enabled.

In case you miss it, you should Ctrl+C and ./webui.sh again after enabling --medvram.

But if still nothing happens, you might want to check whether there is something weird in dmesg.

Btw, there are "Token Merge Ratio" settings which can be set to 0.3-0.5 to gain some performance boost without affecting the outputs too much.

ConfusedMerlin commented 11 months ago

@evshiron I Ctrl+C'd out of it and applied --medvram. This time at least something did happen:

Progress 0.1it/s ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:03:22

loading the initial safetensors file (not a checkpoint switch, if that may be of relevance) took this long. then my prompt (test) was spit out almost immediately.

Now I switched to another checkpoint, which seems to take even longer to load than before.

dmesg shows nothing of interest; the last entries are 30 minutes old and about the memory access violation.

EDIT: And it is finished loading:

16:36:16-131017 INFO Model loaded in 176.7s (load=1.7s config=69.7s create=0.5s apply=0.8s
vae=104.0s)

evshiron commented 11 months ago

@ConfusedMerlin

Hmmm. I am out of ideas now.

Model loaded in 176.7s (load=1.7s config=69.7s create=0.5s apply=0.8s vae=104.0s)

There must be something worth investigating, but it is beyond my abilities.

ConfusedMerlin commented 11 months ago

@evshiron

It is like the total opposite of my Windows automatic1111 experience. There, the model switches in about 4 seconds, but it only gets up to 1.5 it/s if I don't do anything fancy. Also, I must run it with --medvram, otherwise it crashes the gpu driver.

Here, the model switch takes an eternity, but once that is done, it spits out pictures at 6 it/s. And while loading the new image, the system actually lags, even though neither cpu nor gpu are really used... I also observed that it takes way more system RAM and less VRAM than on Windows, where it claims 95% of my VRAM and a couple of GB of system RAM.

Still, stuff does happen. For now, that is enough.

Thank you all for your help, hints and suggestions!

ConfusedMerlin commented 11 months ago

for me, it seemed that the latest GPU driver, the correct torch version (rocm!) and - critically underrated - my user being in the correct groups (render and video) were necessary to get this to work. Calling rocminfo as a non-root user should not produce ANY error message, because you need to be in those groups for any process to be able to do something with the gpu.

The model loading stuff is something on its own, which should not be in an issue about something else.

evshiron commented 11 months ago

@ConfusedMerlin

Enabling Tiled VAE might help with the lag at the end of image generation. It might not be used if the image size is small, and you can reduce the Decoder Tile Size (from 64 upwards, to find a sweet spot for your GPU) to have it applied, if needed.

evshiron commented 11 months ago

@ConfusedMerlin

I've come up with some ideas. Could you try these when you have time?

ConfusedMerlin commented 11 months ago

@evshiron, I ran some tests; at one point, the OS just froze. And after that reboot, using no custom parameters brought the loading times below the 10 second mark again. For now, I assume that something really fishy got stuck in the VRAM (I know, that isn't even close to how this works, but a good old crash'n'reboot may have fixed it for now).

If I find a way to reproduce that, I will come back here and open a new issue, okay?

evshiron commented 11 months ago

@ConfusedMerlin

That's good news.

It's OK. Remember to mention me when the new issue comes up.

vladmandic commented 11 months ago

@ConfusedMerlin

My RX 7900 XTX takes about 4-7 seconds to switch a checkpoint at the moment.

It was faster when HSA_OVERRIDE_GFX_VERSION wasn't used in the early day. But now I get used to it.

30 seconds sounds pretty slow. I guess it's caused by insufficient RAM or VRAM and swapping or offloading happens during data load. Maybe --medvram will help?

where are the models actually residing (and on what type of filesystem)? fast load relies on memory mapping the safetensors file, and some filesystems are really bad at that, so you may actually get better results if you switch to the stream load method (in settings)
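A rough illustration of the difference (a sketch using the safetensors Python API, not SD.Next's actual loader code; it needs the safetensors package from the venv, the model path from the logs above, and enough free RAM to hold the file once more):

python - <<'PY'
import time
from safetensors.torch import load_file, load

path = "/opt/ai/automatic/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors"

# "fast" load: memory-maps the file; speed depends on how well the filesystem
# handles mmap of a multi-gigabyte file
t = time.time()
sd = load_file(path, device="cpu")
print("mmap load:", round(time.time() - t, 1), "s,", len(sd), "tensors")

# "stream" load: read the whole file into RAM first, then deserialize;
# often a bit slower, but it sidesteps mmap pathologies on problematic filesystems
t = time.time()
with open(path, "rb") as f:
    sd = load(f.read())
print("stream load:", round(time.time() - t, 1), "s,", len(sd), "tensors")
PY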

vladmandic commented 11 months ago

Greetings. Would you mind following the steps here and see if it works?

This line:

rnd = torch.sum(torch.randn(2, 2)).to(0)

is from #1929.

But if your RX 7600 core dumped at this line, you might want to export these two environment variables before calling ./webui.sh:

export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0

@evshiron today I heard of this fork for the first time, and some of the stuff really makes sense - can we get in direct contact (e.g. are you on discord)?

evshiron commented 11 months ago

@vladmandic

Greetings. That's the fork I was working on when we were handling https://github.com/vladmandic/automatic/issues/1929.

It's still far from complete, but at least it works out of the box for Navi 3x users running ROCm 5.5+.

I can make a simplified PR for only Navi 3x (with some if/else) if you need.

Btw, we might be able to eliminate the hack from https://github.com/vladmandic/automatic/issues/1929, if we don't set os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow-rocm') for Navi 3x.

The list of gfxXXX targets and which HSA_OVERRIDE_GFX_VERSION should be used can be found here (search "GCN GFX10.3", for example), if you want to extend the strategy to Navi 2x (RDNA 2) dGPUs (my iGPU, reported as gfx1036, doesn't work). CDNA GPUs should work without it.
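A sketch of that if/else strategy in shell form, covering only the two families discussed in this thread (the gfx target string comes from rocminfo, as in the output above):

gfx="$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n1)"
case "$gfx" in
  gfx110*) export HSA_OVERRIDE_GFX_VERSION=11.0.0 ;;  # Navi 3x, e.g. gfx1100/gfx1102
  gfx103*) export HSA_OVERRIDE_GFX_VERSION=10.3.0 ;;  # Navi 2x / RDNA 2 (GCN GFX10.3)
esac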

ConfusedMerlin commented 11 months ago

@ConfusedMerlin My RX 7900 XTX takes about 4-7 seconds to switch a checkpoint at the moment. It was faster when HSA_OVERRIDE_GFX_VERSION wasn't used in the early day. But now I get used to it. 30 seconds sounds pretty slow. I guess it's caused by insufficient RAM or VRAM and swapping or offloading happens during data load. Maybe --medvram will help?

where are models actually residing (and what type of a filesystem)? fast load relies on memory mapping of safetensors file and some filesystems are really bad with that, so you actually get better results if you switch to stream load method (in settings)

they are on an ext4 fs that also hosts about... everything else. I know, not the best setup, but after a dozen or so not-quite-working manual partition setups I just stuffed everything onto one big fs.

EDiT: wouldn't that memmap/stream difference be easy to reproduce? Yesterday evening I actually failed to load anything, even though it was trying. At that time, generation also slowed down to 1-2 it/s. This morning nothing of that remains... switching models takes a couple of seconds, and generation breezes through at about 7 it/s. ... I should really make a new issue for that, shouldn't I? But... in the other fork's issue section.

vladmandic commented 11 months ago

@evshiron

I can make a simplified PR for only Navi 3x (with some if/else) if you need.

My goal is always to have as good an out-of-the-box solution as possible, so anything you can contribute is appreciated. I don't have an AMD GPU, so I rely on the community for most of it.

Btw, we might be able to eliminate the hack from https://github.com/vladmandic/automatic/issues/1929, if we don't set os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow-rocm') for Navi 3x.

I was thinking the same - do we actually need tensorflow-rocm at all?

The list of gfxXXX and which HSA_OVERRIDE_GFX_VERSION should be used can be found here (search "GCN GFX10.3" for eaxmple), if you want to extend the strategy for Navi 2x (RDNA 2) dGPUs (my iGPU reported as gfx1036 doesn't work). CDNA GPUs should work without it.

Absolutely - see #1972 - I'm ready to merge as soon as the PR is ready. Perhaps you can work with @Aptronymist to create it?