virchau13 / automatic1111-webui-nix

AUTOMATIC1111/stable-diffusion-webui for CUDA and ROCm on NixOS
MIT License

Getting "Torch is not able to use GPU" error #6

Open · juh9870 opened this issue 12 months ago

juh9870 commented 12 months ago

I pulled the latest https://github.com/AUTOMATIC1111/stable-diffusion-webui, added all the nix files from this repo to the SD folder, ran nix-shell, and waited for it to finish. Then I ran ./webui.sh; it installed the SD UI's dependencies, and then I got this error:

################################################################
Launching launch.py...
################################################################
ldconfig: Can't open cache file /nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8/etc/ld.so.cache
: No such file or directory
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.12 (main, Jun  6 2023, 22:43:10) [GCC 12.3.0]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Traceback (most recent call last):
  File "/home/juh9870/games/StableDiffusion/launch.py", line 48, in <module>
    main()
  File "/home/juh9870/games/StableDiffusion/launch.py", line 39, in main
    prepare_environment()
  File "/home/juh9870/games/StableDiffusion/modules/launch_utils.py", line 356, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

I have a 3080 GPU in my machine, and I managed to run the Auto1111 UI before switching to NixOS.

RikudouSage commented 11 months ago

Seems like you don't have the hardware drivers for your GPU. You need to install those.

juh9870 commented 11 months ago

I have the drivers installed via configuration.nix, I'm using the 23.05 channel, and other apps (like Blender) work fine with my GPU.

RikudouSage commented 11 months ago

Are you sure you're using the driver by Nvidia?

juh9870 commented 11 months ago

I'm on NixOS and I followed the guide on using Nvidia drivers: https://nixos.wiki/wiki/Nvidia. I'm not using a laptop and I don't have other GPUs installed.

JeremyKennedy commented 11 months ago

I resolved this by ensuring my NixOS install was using the latest NVIDIA driver. I had to run sudo nixos-rebuild switch --upgrade. You can check the version with nvidia-smi; it should match when you run it inside and outside the nix develop shell.

Itrekr commented 10 months ago

Having the same issue with the same GPU. Inside the develop shell there is a library version mismatch, and outside of it everything appears to be fine. I'm extremely new to NixOS, so I'm not quite sure how to solve such a thing. Did you manage to fix it @juh9870 ?

juh9870 commented 10 months ago

I'm using this for now: https://github.com/AbdBarho/stable-diffusion-webui-docker. I had to enable GPU use in Docker, but otherwise it's pretty straightforward.

LiquidZulu commented 10 months ago

I am also getting this problem; updating did not work. I am installing the Nvidia drivers in my system flake like so:

{ config, lib, pkgs, ... }:

{
  # See https://nixos.wiki/wiki/Nvidia
  services.xserver.videoDrivers = [
    "nvidia" # https://github.com/NixOS/nixpkgs/issues/80936#issuecomment-1003784682
  ];

  hardware = {
    opengl = {
      enable = true;
      driSupport = true;
      driSupport32Bit = true;
    };

    nvidia = {

      # Modesetting is needed for most wayland compositors
      modesetting.enable = true;

      # Use the open source version of the kernel module
      open = true;

      # Enable the nvidia settings menu
      nvidiaSettings = true;

      # Optionally, you may need to select the appropriate driver version for your specific GPU.
      package = config.boot.kernelPackages.nvidiaPackages.stable;
    };
  };
}

You can check the version with nvidia-smi.

@JeremyKennedy is there a specific package which provides this? I do not have it on my system and cannot find it on https://search.nixos.org/packages?channel=23.05&from=0&size=50&sort=relevance&type=packages&query=nvidia-smi

jpentland commented 7 months ago

I have the same error, and this is the output I get from 'nvidia-smi':

Without the flake enabled:

Thu Jan 11 16:06:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:09:00.0  On |                  N/A |
| 53%   42C    P3              37W / 170W |   2568MiB / 12288MiB |     35%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+

With the flake enabled:

Failed to initialize NVML: Driver/library version mismatch

I also tried copying the sha256/revision from the flake.lock of my system config flake into the flake.lock used by this project, to try to make the versions match, but I still seem to get the same error.

edit: And yes, I have rebooted since the last time I ran "nixos-rebuild switch".

wyndon commented 6 months ago

This error happens because the driver version in your system config doesn't match the one pulled in by the devShell. NVIDIA is very strict about versions; they have to match exactly.

It either means you're using a different variant (for example the beta drivers; by default the devShell from this flake pulls in the regular version), or that the flake.lock from the repo is outdated, although it shouldn't be, since per the README you're not supposed to copy the flake.lock anyway, so nix generates a new one with the latest versions available. It could also come from using a release channel (e.g. 23.11) instead of unstable, which can likewise cause a version mismatch.

To fix it, either use the drivers from unstable, or edit the flake to use a release channel. And if you're using a variant (e.g. the beta one), you need to edit the devShell to use:

linuxPackages.nvidia_x11_beta
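
For illustration, here is a minimal devShell sketch that pins the driver libraries to the beta variant. This is not this repo's actual flake.nix; apart from linuxPackages.nvidia_x11_beta, the layout and attribute names are assumptions:

{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      pkgs = import nixpkgs {
        system = "x86_64-linux";
        config.allowUnfree = true; # the NVIDIA driver is unfree
      };
      # Must be the same variant (and therefore version) as the kernel module
      # loaded by your system configuration.
      nvidiaDriver = pkgs.linuxPackages.nvidia_x11_beta;
    in {
      devShells.x86_64-linux.default = pkgs.mkShell {
        # Expose the driver's user-space libraries (libcuda.so, libnvidia-ml.so, ...)
        # so torch inside the shell matches the running kernel module.
        LD_LIBRARY_PATH = "${nvidiaDriver}/lib";
      };
    };
}

The only detail that matters for this issue is that the driver package the shell exposes is the same variant and version as the one your NixOS configuration loads.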

jpentland commented 6 months ago

I have this in my system configuration:

  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };

I'm not sure which underlying package that corresponds to.

edit: I managed to solve the issue by doing what I already mentioned above: copying my system's nixpkgs section of flake.lock into the project's flake.lock. Not sure why it wasn't working before.
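
For reference, an alternative to hand-editing flake.lock (a related approach, not what was done above) is to pin the project flake's nixpkgs input to the exact revision your system uses; <rev> below is a placeholder for the commit hash from your system's flake.lock:

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/<rev>";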

UlyssesZh commented 6 months ago

I solved this by replacing nixos-unstable with nixos-23.11 in flake.nix.
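
Concretely, that is a one-line change to the nixpkgs input in flake.nix (assuming the input is named nixpkgs, as is usual):

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11"; # previously nixos-unstable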

jasper-at-windswept commented 4 months ago

Can anyone help me with this? I'm on unstable 24.05 and use a custom Nvidia package, but I'm not sure how to port it to the flake.nix:

hardware.nvidia.package = let
  rcu_patch = pkgs.fetchpatch {
    url = "https://github.com/gentoo/gentoo/raw/c64caf53/x11-drivers/nvidia-drivers/files/nvidia-drivers-470.223.02-gpl-pfn_valid.patch";
    hash = "sha256-eZiQQp2S/asE7MfGvfe6dA/kdCvek9SYa/FFGp24dVg=";
  };
  in config.boot.kernelPackages.nvidiaPackages.mkDriver {
    version = "535.154.05";
    sha256_64bit = "sha256-fpUGXKprgt6SYRDxSCemGXLrEsIA6GOinp+0eGbqqJg=";
    sha256_aarch64 = "sha256-G0/GiObf/BZMkzzET8HQjdIcvCSqB1uhsinro2HLK9k=";
    openSha256 = "sha256-wvRdHguGLxS0mR06P5Qi++pDJBCF8pJ8hr4T8O6TJIo=";
    settingsSha256 = "sha256-9wqoDEWY4I7weWW05F4igj1Gj9wjHsREFMztfEmqm10=";
    persistencedSha256 = "sha256-d0Q3Lk80JqkS1B54Mahu2yY/WocOqFFbZVBh+ToGhaE=";
    patches = [ rcu_patch ];
  };

Edit: I ended up just using the Docker version with virtualisation.docker.enableNvidia.

BenMac31 commented 4 months ago

Since I already had cudatoolkit on the host system, simply removing the cudatoolkit package from the impl.nix CUDA variant fixed the issue for me.
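
Roughly, the change looks like the sketch below. The surrounding structure is assumed for illustration (the actual impl.nix in this repo is laid out differently); only the removal of cudatoolkit is the point:

{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  buildInputs = with pkgs; [
    # cudatoolkit  # removed: the host system already provides the CUDA toolkit
    # ...the rest of the CUDA variant's package list stays unchanged...
  ];
}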

Rexcrazy804 commented 3 months ago

Since I already had cudatoolkit on the host system, simply removing the cudatoolkit package from the impl.nix CUDA variant fixed the issue for me.

This worked for me as well. Additionally, you can run nvidia-smi inside the shell and it will report a CUDA version error. From the looks of it, the current CUDA version is 12.4 but the latest manifest in nixpkgs is 12.3, so it's an upstream issue. For the time being this workaround works like a charm :smile:

natervader commented 3 months ago

I solved this by replacing nixos-unstable with nixos-23.11 in flake.nix.

This did the trick for me. Thanks!