vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: WARNING Torch BF16 test failed: Fallback to FP16 operations #1439

Closed: 1r00t closed this issue 1 year ago

1r00t commented 1 year ago

Issue Description

I have an RTX 4090. Previously I could use BF16 without problems, and it gave me a performance boost over FP16. Now I get this warning, and the device info section shows the following:

[Screenshot: device info section]

While having these settings:

[Screenshot: settings]

12:32:40-725716 INFO     Starting SD.Next
12:32:40-727803 INFO     Python 3.10.12 on Linux
12:32:40-729746 INFO     Version: b48a0f13 Thu Jun 15 19:09:29 2023 -0400
12:32:40-848678 INFO     nVidia CUDA toolkit detected
12:32:41-580828 INFO     Torch 2.0.1+cu118
12:32:41-696211 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700
12:32:41-721437 INFO     Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24564 Arch (8, 9) Cores 128
12:32:41-727557 WARNING  Modified files: ['modules/lora']
12:32:41-729659 INFO     Enabled extensions-builtin: ['stable-diffusion-webui-images-browser',
                         'sd-webui-model-converter', 'seed_travel', 'sd-webui-agent-scheduler',
                         'sd-extension-aesthetic-scorer', 'clip-interrogator-ext', 'ScuNET',
                         'stable-diffusion-webui-rembg', 'sd-extension-system-info', 'SwinIR', 'a1111-sd-webui-lycoris',
                         'sd-extension-steps-animation', 'LDSR', 'Lora', 'sd-dynamic-thresholding',
                         'sd-webui-controlnet', 'multidiffusion-upscaler-for-automatic1111']
12:32:41-731093 INFO     Enabled extensions: ['sd-webui-aspect-ratio-helper', 'sd-webui-3d-open-pose-editor',
                         'canvas-zoom', 'ultimate-upscale-for-automatic1111']
12:32:41-731955 INFO     No changes detected: Quick launch active
12:32:41-733475 INFO     Extension preload: 0.0s /home/kroko/projects/vlad/extensions-builtin
12:32:41-734083 INFO     Extension preload: 0.0s /home/kroko/projects/vlad/extensions
12:32:41-739324 INFO     Server arguments: []
No module 'xformers'. Proceeding without it.
12:32:45-045761 INFO     Libraries loaded
12:32:45-047931 INFO     Using data path: /home/kroko/projects/vlad
12:32:45-049366 INFO     Available VAEs: /home/kroko/projects/stable-diffusion/models/VAE 1
12:32:45-051797 INFO     Available models: /home/kroko/projects/stable-diffusion/models/Stable-diffusion 18
12:32:45-957305 INFO     ControlNet v1.1.224
ControlNet preprocessor location: /home/kroko/projects/vlad/extensions-builtin/sd-webui-controlnet/annotator/downloads
12:32:46-076978 INFO     ControlNet v1.1.224
12:32:47-062897 INFO     Loading UI theme: name=gradio/default style=Dark
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
12:32:48-104647 INFO     Local URL: http://127.0.0.1:7860/
12:32:48-105446 INFO     Initializing middleware
12:32:48-225722 INFO     [AgentScheduler] Task queue is empty
12:32:48-226470 INFO     [AgentScheduler] Registering APIs
Loading weights: /home/kroko/projects/stable-diffusion/models/Stable-diffusion/revAnimated_v122.safetensors
12:32:49-030387 WARNING  Torch BF16 test failed: Fallback to FP16 operations
12:32:49-046951 INFO     Setting Torch parameters: dtype=torch.float16 vae=torch.float16 unet=torch.float16
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading weights: /home/kroko/projects/stable-diffusion/models/VAE/vae-ft-mse-840000-ema-pruned.safetensors
12:32:50-709406 INFO     Applying scaled dot product cross attention optimization
12:32:50-728970 INFO     Embeddings: loaded=7 skipped=4
12:32:50-733725 INFO     Model loaded in 2.1s (load=0.4s create=0.3s apply=0.7s vae=0.4s move=0.3s)
12:32:50-981344 INFO     Model load finished: {'ram': {'used': 8.91, 'total': 15.58}, 'gpu': {'used': 3.61, 'total':
                         23.99}, 'retries': 0, 'oom': 0} cached=0
12:32:51-022424 INFO     Startup time: 9.3s (torch=1.8s gradio=0.7s libraries=0.9s scripts=1.8s onchange=0.2s ui=0.9s
                         launch=0.1s app-started=0.3s checkpoint=2.6s)

Here is sdnext.log (setup.log did not have any new entries): sdnext.log

Version Platform Description

arch: x86_64 cpu: AMD Ryzen 7 5800X gpu: RTX4090 system: Linux os: Ubuntu 20.04.6 LTS release: 5.15.90.1-microsoft-standard-WSL2 python: 3.10.12 browser: Firefox 114.0.1 (64-Bit)


vladmandic commented 1 year ago

Previously, when you selected bfloat16, it was not really used at all. Now it is, but I have yet to see a full implementation of it: at the bare minimum, Torch 2.1 nightly is needed, as Torch 2.0.1 does not implement all the required ops.

If you think it works, you can bypass the checks with the --experimental command-line flag. Otherwise, this works as designed.
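The probe-then-fallback behaviour described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not SD.Next's actual code. The function names `probe_bf16` and `select_dtype` are hypothetical. Using nearest-neighbour upsampling as the probe op is also an assumption made for illustration. `torch.cuda.is_bf16_supported()` and `torch.nn.functional.interpolate` are real Torch APIs. The sketch shows why a GPU that supports BF16 in hardware (the 4090 reports `Arch (8, 9)` in the log) can still fail the test. The device-level check passes, but a single op with no BF16 kernel in the installed Torch build raises an error, and the code then falls back to FP16.

```python
import importlib.util
import logging
from typing import Callable

log = logging.getLogger("dtype-check")  # hypothetical logger name


def probe_bf16() -> bool:
    """Run a tiny bfloat16 computation and report whether it succeeded.

    Assumption: nearest-neighbour upsampling is the probe op purely for
    illustration; SD.Next's own test may exercise different ops.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # Torch is not installed, so BF16 cannot be used
    try:
        import torch
        import torch.nn.functional as F

        device = "cuda" if torch.cuda.is_available() else "cpu"
        if device == "cuda" and not torch.cuda.is_bf16_supported():
            return False  # the GPU itself reports no BF16 support
        x = torch.randn(1, 4, 8, 8, device=device).to(torch.bfloat16)
        y = F.interpolate(x, size=(16, 16), mode="nearest")
        return y.dtype == torch.bfloat16
    except Exception:  # e.g. an op with no BF16 kernel in this Torch build
        return False


def select_dtype(requested: str, experimental: bool = False,
                 probe: Callable[[], bool] = probe_bf16) -> str:
    """Return the dtype name to use, falling back from bfloat16 to float16."""
    if requested != "bfloat16":
        return requested  # only bfloat16 is probed in this sketch
    if experimental:
        # Mirrors the idea of an --experimental flag: trust the user, skip the probe.
        return "bfloat16"
    if probe():
        return "bfloat16"
    log.warning("Torch BF16 test failed: Fallback to FP16 operations")
    return "float16"
```

Suppose `experimental=False` and the installed Torch build lacks a BF16 kernel for the probed op. In that case `select_dtype("bfloat16")` logs the same warning seen in the log above and returns `"float16"`. Passing `experimental=True` returns `"bfloat16"` without running the probe. The `probe` parameter lets the check be swapped out or stubbed.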

1r00t commented 1 year ago

Thanks for clearing that up for me. Indeed, it was not using BF16.