virchau13 / automatic1111-webui-nix

AUTOMATIC1111/stable-diffusion-webui for CUDA and ROCm on NixOS
MIT License

Update README.md #5

Open justingilpin opened 1 year ago

justingilpin commented 1 year ago

These flags could be put into the code, but I thought I'd make them visible so that you and others can decide whether they're needed for your setup.

virchau13 commented 1 year ago

I don't have a PRIME laptop, so unfortunately I can't test this, but thanks for the PR!

Just one question: what's nvidia-offload? I can't find it referenced anywhere online, and no package in Nixpkgs seems to provide it.

justingilpin commented 1 year ago

This has been tested and works on my machine. Figured it might help others, as it took me down a rabbit hole.

I forgot to mention that nvidia-offload is the default command many people use if they have NVIDIA PRIME. It comes from github.com/NixOS/nixos-hardware (very popular for laptop users). It may help to reference NVIDIA PRIME and/or that repository in the README.
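
For anyone reading along: nvidia-offload is, roughly, a small wrapper script that sets the PRIME render-offload environment variables before running its arguments. A sketch of the usual NixOS wiki / nixos-hardware definition (a configuration.nix fragment, not something this repo ships):

{ pkgs, ... }: {
  environment.systemPackages = [
    (pkgs.writeShellScriptBin "nvidia-offload" ''
      # Route rendering for the wrapped command to the discrete NVIDIA GPU
      export __NV_PRIME_RENDER_OFFLOAD=1
      export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
      export __GLX_VENDOR_LIBRARY_NAME=nvidia
      export __VK_LAYER_NV_optimus=NVIDIA_only
      exec "$@"
    '')
  ];
}

With that in place, prefixing a command with nvidia-offload runs it on the discrete GPU instead of the integrated one.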

virchau13 commented 1 year ago

I'm trying to reformat these commands, and I'm not so sure that --precision full --no-half is required. Does it work on your laptop if you use NO_TCMALLOC=True nvidia-offload bash ./webui.sh --skip-torch-cuda-test?

justingilpin commented 1 year ago

I'm not able to test it right now. Which file were you planning to put the commands into? nvidia-offload is probably best left out of your code and kept only as an example with a reference to NVIDIA PRIME, since it could throw errors quickly. The other commands are safer in this use case, as I was told this software doesn't use the CPU? I could be wrong. I was getting errors about 'no half' and came to the forums to see what others were doing. I couldn't find a Nix package that added TCMalloc, so maybe that's a personal problem too? I'm new to NixOS but have learned a lot about it over the last month getting different software to run on it.

justingilpin commented 1 year ago

I believe NixOS needs a package from gperftools (which includes the missing TCMalloc). I did read that the --no-half flag actually makes the process slower, so it's much better to get this flake loaded with the missing TCMalloc.
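
A minimal sketch of what that could look like, assuming the fix is simply exposing gperftools' libtcmalloc in the dev shell (this is an illustration, not the repo's actual shell.nix; note also that webui.sh's TCMalloc detection goes through ldconfig, which may not see Nix store paths):

{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
  # gperftools provides libtcmalloc.so
  buildInputs = [ pkgs.gperftools ];
  shellHook = ''
    export LD_LIBRARY_PATH=${pkgs.gperftools}/lib:$LD_LIBRARY_PATH
  '';
}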

As for my NVIDIA PRIME, I don't believe I'm turning it on with the proper command; I'm 90% sure the slow speed of the image generation is due to it using the integrated graphics.

I'm trying to help out, but it appears this is just becoming a bug report, as I don't know where in NixOS I should be adding the packages or invoking nvidia-offload. I'm new to Linux and not the best of programmers.

virchau13 commented 1 year ago

Yeah, PRIME is a pain at the best of times.

I'm taking a look at the webui, and it seems to support selecting one of multiple graphics cards using --device-id. Can you try running the following and check whether it works?

nix-shell
NO_TCMALLOC=True ./webui.sh --medvram --no-half --device-id 0 # The 'device-id' arg is the important bit here

If that doesn't work, can you post the output of the following Python code (inside the nix-shell)? This should list all available CUDA devices that PyTorch sees.

import torch
for i in range(torch.cuda.device_count()):
    print(i, ':', torch.cuda.get_device_properties(i).name)

justingilpin commented 1 year ago

I'm still having to add --skip-torch-cuda-test

~ nvidia-offload nix-shell
~ NO_TCMALLOC=True ./webui.sh --skip-torch-cuda-test --no-half --medvram --device-id 0

No errors about TCMalloc.

Performance is about the same, and I think it's because the NVIDIA GPU isn't being offloaded to properly. I'm tempted to just turn NVIDIA always on. Unfortunately it's not common to have a switch to turn it off and on; it has to be invoked per program.

Debugging torch gets: return torch._C._cuda_getDeviceCount() > 0

I'm going to rebuild NixOS with NVIDIA always on and message you back.
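
(For reference, "NVIDIA always on" corresponds to PRIME sync mode in NixOS. A sketch of that configuration.nix fragment, following the NixOS wiki; the bus IDs are placeholders, not values from this thread:)

{
  hardware.nvidia.prime = {
    sync.enable = true;  # render everything on the NVIDIA GPU
    # Find your real bus IDs with `lspci | grep -E 'VGA|3D'`
    # and convert the hexadecimal bus numbers to decimal.
    intelBusId = "PCI:0:2:0";
    nvidiaBusId = "PCI:1:0:0";
  };
}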


Python 3.10.11 (main, Apr 4 2023, 22:10:32) [GCC 11.3.0]
Version: v1.2.1
Commit hash: 89f9faa63388756314e8a1d96cf86bf5e0663045
Installing requirements
Launching Web UI with arguments: --skip-torch-cuda-test --medvram --device-id 0
/home/justin/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
/home/justin/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:651: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination', memory monitor disabled
Loading weights [6ce0161689] from /home/justin/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Startup time: 4.1s (import torch: 1.1s, import gradio: 1.0s, import ldm: 0.4s, other imports: 0.5s, load scripts: 0.4s, create ui: 0.5s, gradio launch: 0.2s).
Creating model from config: /home/justin/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0):
Model loaded in 2.3s (load weights from disk: 0.7s, create model: 0.4s, apply weights to model: 0.6s, apply half(): 0.5s).


Debugging torch

Python 3.10.11 (main, Apr 4 2023, 22:10:32) [GCC 11.3.0]
Version: v1.2.1
Commit hash: 89f9faa63388756314e8a1d96cf86bf5e0663045
/home/justin/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
/home/justin/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:651: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count
Traceback (most recent call last):
  File "/home/justin/stable-diffusion-webui/launch.py", line 377, in <module>
    prepare_environment()
  File "/home/justin/stable-diffusion-webui/launch.py", line 282, in prepare_environment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "/home/justin/stable-diffusion-webui/launch.py", line 135, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "/home/justin/stable-diffusion-webui/launch.py", line 111, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/home/justin/stable-diffusion-webui/venv/bin/python3" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 1
stdout:
stderr: /home/justin/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

justingilpin commented 1 year ago

Before turning NVIDIA always on, I found this about CUDA on the NixOS NVIDIA wiki: https://nixos.wiki/wiki/Nvidia. I'm not familiar enough with NixOS to know where to put this single value.

Please note that, if you are setting up PRIME offloading, you must set the single value of "nvidia" even though it would be more conceptually correct to also include the driver for your other GPU. Doing otherwise will cause a broken xorg.conf to be generated. This is because NixOS doesn't actually handle multiple GPUs / GPU drivers properly, as per https://github.com/NixOS/nixpkgs/issues/108018.
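
For what it's worth, in the wiki's offload example that single value sits in configuration.nix like this (a sketch, with the same placeholder bus IDs as above):

{
  # The single "nvidia" value the wiki warns about:
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia.prime = {
    offload.enable = true;  # offload mode rather than the sync mode sketched earlier
    intelBusId = "PCI:0:2:0";
    nvidiaBusId = "PCI:1:0:0";
  };
}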

justingilpin commented 1 year ago

I think I have a CUDA problem more than a graphics problem. I've enabled NVIDIA as always on and not much has changed. Between that and --no-half, I think it's making the software run quite slow. I'll continue testing and see what I can come up with.
