hasktorch / hasktorch

Tensors and neural networks in Haskell
http://hasktorch.org

Missing configurations on NixOS #712

Open kenhkan opened 2 weeks ago

kenhkan commented 2 weeks ago

Following the NixOS setup instructions and adding the missing NIX_CFLAGS_COMPILE environment variable, I got this gcc error. What else would be missing from the README that I would need to do before running cabal build?

[kenhkan@kenitrain:~/hasktorch]$ nix develop

[kenhkan@kenitrain:~/hasktorch]$ export NIX_CFLAGS_COMPILE=/nix/store/znipp49fldq5waw9n45fmynh4wkpnrbn-libtorch-2.3.0-dev/include/torch/csrc/api/include

[kenhkan@kenitrain:~/hasktorch]$ cabal build hasktorch
Build profile: -w ghc-9.6.5 -O1
In order, the following will be built (use -v for more details):
 - libtorch-ffi-2.0.0.0 (lib) (first run)
 - hasktorch-0.2.0.0 (lib:hasktorch, test:doctests, test:spec) (first run)
Preprocessing library for libtorch-ffi-2.0.0.0..
Building library for libtorch-ffi-2.0.0.0..
gcc: fatal error: cannot specify ‘-o’ with ‘-c’, ‘-S’ or ‘-E’ with multiple files
compilation terminated.
gcc: fatal error: cannot specify ‘-o’ with ‘-c’, ‘-S’ or ‘-E’ with multiple files
compilation terminated.

src/Torch/Internal/GC.hs:1:1: error:
    `cc' failed in phase `C pre-processor'. (Exit code: 1)
  |
1 | {-# LANGUAGE CPP #-}
  | ^
Error: cabal: Failed to build libtorch-ffi-2.0.0.0 (which is required by
hasktorch-0.2.0.0).

[kenhkan@kenitrain:~/hasktorch]$
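
A possible reading of the gcc error above, offered as a guess and not a confirmed diagnosis: NIX_CFLAGS_COMPILE holds compiler flags that the Nix compiler wrapper appends to each gcc invocation. If it is set to a bare directory path, gcc may treat that path as a second input file, which would match the "multiple files" message. The sketch below passes the directory as an -isystem flag and keeps any value the devshell already set. The store path is copied from the session above and will differ per machine.

```shell
# Path copied from the session above; it differs on every machine.
LIBTORCH_API_INCLUDE=/nix/store/znipp49fldq5waw9n45fmynh4wkpnrbn-libtorch-2.3.0-dev/include/torch/csrc/api/include

# Assumption: the variable is read as a list of compiler flags, so the
# directory is passed via -isystem and any existing flags are preserved.
export NIX_CFLAGS_COMPILE="-isystem $LIBTORCH_API_INCLUDE ${NIX_CFLAGS_COMPILE:-}"
```

Whether this alone is enough to build libtorch-ffi is not established in this thread.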

Note: This continues a thread in Slack. It seems some configuration is missing for anyone setting this up fresh. The follow-up action once this thread is resolved is to update the documentation and/or the Nix scripts.

Additional context information:

[kenhkan@kenitrain:~/hasktorch]$ git show
commit d06996e48089d72852eb5af0fdf6008a77d507b1 (HEAD -> master, origin/master, origin/HEAD)
Merge: a28facd7 53d59791
Author: Junji Hashimoto <junji.hashimoto@gmail.com>
Date:   Mon Jun 17 16:29:22 2024 +0900

    Merge pull request #711 from collinarnett/nix-shell

    Added nix devshell for cpu and cuda

[kenhkan@kenitrain:~/hasktorch]$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

[kenhkan@kenitrain:~/hasktorch]$ gcc --version
gcc (GCC) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[kenhkan@kenitrain:~/hasktorch]$ nixos-version
24.05.1358.47b604b07d1e (Uakari)

[kenhkan@kenitrain:~/hasktorch]$ uname -r
5.15.160

collinarnett commented 2 weeks ago

The documentation is currently out of date on the nix side of things. I'm working on updating it now.

kenhkan commented 2 weeks ago

Ah got it. Thank you Collin, for the note and for the contributions! Do you happen to know which commit/tag I could use to get the last-good Nix version?

collinarnett commented 2 weeks ago

I actually do not know. I started working on the Nix code because it didn't work. If you just want a devshell with hasktorch in it you can use https://github.com/hasktorch/hasktorch-skeleton/pull/9. Getting the devshells working here is more for those looking to hack on hasktorch itself rather than use it in a project.

kenhkan commented 2 weeks ago

Great! This looks promising. I'll take a stab at it. Thank you Collin!

collinarnett commented 2 weeks ago

Yeah no problem. Feel free to comment here if you need any help. :smile:

kenhkan commented 2 weeks ago

@collinarnett I have two questions already. :)

Is the skeleton supposed to prepare libtorch as well? Looking through flake.nix it looks like it calls cabal2nix on each dependency so I assume libtorch needs to be prepared elsewhere and the paths be provided to cabal build, right?

Second question is not necessarily a hasktorch-skeleton question. I assumed the above is correct and called cabal with:

cabal build hasktorch --extra-include-dirs=$LIBTORCH_PATH/include --extra-lib-dirs=$LIBTORCH_PATH/lib --extra-lib-dirs=$LIBTOKENIZERS_PATH/lib --extra-include-dirs=$LIBTORCH_PATH/include/torch/csrc/api/include

where $LIBTORCH_PATH is just the directory I got after downloading and extracting https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.3.1%2Bcu118.zip. It spits out this weird error message:

... <redacted> ...
Configuring hasktorch-0.2.0.0...
Preprocessing library for hasktorch-0.2.0.0..
Building library for hasktorch-0.2.0.0..
<command line>: libtorch_cuda.so: ELF load command address/offset not page-aligned

I compared that file with the one I have on my Ubuntu machine and they're the same. So I assume there is something NixOS-specific that I need to do. Have you encountered this issue before?

collinarnett commented 2 weeks ago

@kenhkan

> Is the skeleton supposed to prepare libtorch as well? Looking through flake.nix it looks like it calls cabal2nix on each dependency so I assume libtorch needs to be prepared elsewhere and the paths be provided to cabal build, right?

Libtorch is baked into libtorch-ffi with the following line: https://github.com/hasktorch/hasktorch/blob/d06996e48089d72852eb5af0fdf6008a77d507b1/nix/overlay.nix#L52

So when you build with cabal in hasktorch skeleton it should be as simple as:

hasktorch-skeleton on  cabal2nix [!] via λ nightly-2020-12-14 on ☁️  (us-east-1) on ☁️   
$ nix develop .             
warning: Git tree '/home/collin/projects/hasktorch-skeleton' is dirty

[collin@zombie:~/projects/hasktorch-skeleton]$ cabal build .
Warning: The package list for 'hackage.haskell.org' is 812 days old.
Run 'cabal update' to get the latest list of available packages.
Resolving dependencies...
Build profile: -w ghc-9.6.5 -O1
In order, the following will be built (use -v for more details):
 - hasktorch-skeleton-0.0.0.0 (lib) (first run)
 - hasktorch-skeleton-0.0.0.0 (exe:example) (first run)
Configuring library for hasktorch-skeleton-0.0.0.0..
Preprocessing library for hasktorch-skeleton-0.0.0.0..
Building library for hasktorch-skeleton-0.0.0.0..
[1 of 1] Compiling HasktorchSkeleton ( src/HasktorchSkeleton.hs, /home/collin/projects/hasktorch-skeleton/dist-newstyle/build/x86_64-linux/ghc-9.6.5/hasktorch-skeleton-0.0.0.0/build/HasktorchSkeleton.o, /home/collin/projects/hasktorch-skeleton/dist-newstyle/build/x86_64-linux/ghc-9.6.5/hasktorch-skeleton-0.0.0.0/build/HasktorchSkeleton.dyn_o )
Configuring executable 'example' for hasktorch-skeleton-0.0.0.0..
Preprocessing executable 'example' for hasktorch-skeleton-0.0.0.0..
Building executable 'example' for hasktorch-skeleton-0.0.0.0..
[1 of 1] Compiling Main             ( exe/Main.hs, /home/collin/projects/hasktorch-skeleton/dist-newstyle/build/x86_64-linux/ghc-9.6.5/hasktorch-skeleton-0.0.0.0/x/example/build/example/example-tmp/Main.o )

exe/Main.hs:3:1: warning: [-Wunused-imports]
    The import of ‘HasktorchSkeleton’ is redundant
      except perhaps to import instances from ‘HasktorchSkeleton’
    To import instances alone, use: import HasktorchSkeleton()
  |
3 | import HasktorchSkeleton
  | ^^^^^^^^^^^^^^^^^^^^^^^^
[2 of 2] Linking /home/collin/projects/hasktorch-skeleton/dist-newstyle/build/x86_64-linux/ghc-9.6.5/hasktorch-skeleton-0.0.0.0/x/example/build/example/example

For CUDA on NixOS, you just need to make sure you have the NVIDIA drivers enabled, and it should just work.

kenhkan commented 2 weeks ago

Thank you for the help Collin! I can get to the same point that you did. I still get the ELF issue with libtorch_cuda.so. Though I think CUDA support, especially using hasktorch for a project, is out of scope for what you're doing. I'm going to close this issue for now and come back to make CUDA work at some later time. For the time being, I can verify that the branch works as described!

collinarnett commented 2 weeks ago

Hey I know you closed this but making sure that CUDA is working is very important to the hasktorch project since it wouldn't be very useful otherwise :smile:. Can you give this branch a try? https://github.com/collinarnett/hasktorch/tree/cabal-nix-fixes

Executing

Just run the following:

$ nix develop .#gpu
$ cabal clean
$ cabal build hasktorch

Explanation

I added a few things to the devShell so that you shouldn't need to specify --extra-include-dirs or --extra-lib-dirs:

            buildInputs = with pkgs; [
              libtorch-bin
            ];
            nativeBuildInputs = with pkgs; [
              cabal-install
            ];
            shellHook = ''
              export CPLUS_INCLUDE_PATH=${lib.getDev pkgs.libtorch-bin}/include/torch/csrc/api/include
            '';

buildInputs takes care of the base include and lib directories; the extra environment variable takes care of the nested include directory.
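
A quick way to check that the shellHook took effect is to look for the libtorch C++ API header on CPLUS_INCLUDE_PATH from inside the devshell. This is a minimal sketch: has_torch_header is a hypothetical helper name, and it assumes torch/torch.h sits directly under the nested include directory, as it does in the upstream libtorch layout.

```shell
# Return 0 if any directory in a colon-separated include path
# contains torch/torch.h, and 1 otherwise.
# has_torch_header is a hypothetical helper, not part of hasktorch.
has_torch_header() {
  old_ifs=$IFS
  IFS=:
  for dir in $1; do
    if [ -n "$dir" ] && [ -f "$dir/torch/torch.h" ]; then
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Usage inside the devshell:
#   has_torch_header "$CPLUS_INCLUDE_PATH" && echo "torch/torch.h found"
```

If the header is not found, the shellHook most likely did not run, for example because the commands were run outside nix develop.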

Also, to make sure that your drivers are working on NixOS, include the following in your configuration, as outlined in the manual: https://nixos.org/manual/nixos/unstable/#sec-x11-graphics-cards-nvidia

{
  services.xserver.videoDrivers = [ "nvidia" ];
}

Let me know if this works, because I care a lot about whether people have a good experience with Nix and hasktorch :smiley:

kenhkan commented 1 week ago

Hey Collin, thank you so much for your passion in getting hasktorch and Nix to work! I actually just reformatted my computer back to Ubuntu after failing to make it work on NixOS. I'll reinstall it on a separate partition next week and try it out! Meanwhile, I'll leave this ticket open for the time being.

kenhkan commented 1 week ago

@collinarnett I got around to testing it out. I got this error on my machine:

error: flake 'git+file:///home/kenhkan/hasktorch' does not provide attribute 'devShells.x86_64-linux.gpu', 'packages.x86_64-linux.gpu', 'legacyPackages.x86_64-linux.gpu' or 'gpu'
       Did you mean cpu?

Full output with git commit:

$ git status
On branch cabal-nix-fixes
Your branch is up to date with 'origin/cabal-nix-fixes'.

nothing to commit, working tree clean

$ nix develop .#gpu
do you want to allow configuration setting 'extra-substituters' to be set to 'https://hasktorch.cachix.org' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? y
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'hasktorch.cachix.org-1:wLjNS6HuFVpmzbmv01lxwjdCOtWRD8pQVR3Zr/wVoQc=' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? y
warning: ignoring untrusted substituter 'https://hasktorch.cachix.org', you are not a trusted user.
Run `man nix.conf` for more information on the `substituters` configuration option.
error: flake 'git+file:///home/kenhkan/hasktorch' does not provide attribute 'devShells.x86_64-linux.gpu', 'packages.x86_64-linux.gpu', 'legacyPackages.x86_64-linux.gpu' or 'gpu'
       Did you mean cpu?

$ git rev-parse --short HEAD
d44347ad

Your explanation makes sense. I have the videoDrivers part set in my configuration.nix.

I have my NixOS set up and ready! Please let me know where else I can look. I can get back to you quicker now that I have it ready to go.

collinarnett commented 1 week ago

Oh, sorry. It should be cuda, not gpu.
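
To see which devShell names a flake actually exposes before picking one, its outputs can be listed with nix flake show. The wrapper below is a minimal sketch; list_flake_outputs is a hypothetical helper name, and it assumes the nix-command and flakes features are enabled, as they must already be for nix develop to work.

```shell
# Print the outputs of a flake (defaults to the current directory).
# list_flake_outputs is a hypothetical helper, not part of hasktorch.
list_flake_outputs() {
  flake_ref="${1:-.}"
  if ! command -v nix >/dev/null 2>&1; then
    echo "nix not found on PATH" >&2
    return 127
  fi
  # The devShells entries in the output are the names accepted by
  # `nix develop .#<name>`.
  nix flake show "$flake_ref"
}

# Usage from the repository root:
#   list_flake_outputs .
```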

kenhkan commented 6 days ago

Hi @collinarnett! It took me some time to respond because the command somehow gets stuck when fetching libtorch, and I just left it running for a bit.

$ nix develop .#cuda
warning: ignoring untrusted substituter 'https://hasktorch.cachix.org', you are not a trusted user.
Run `man nix.conf` for more information on the `substituters` configuration option.
[1/242/245 built, 603 copied (3742.6 MiB), 371.1 MiB DL] building libtorch-cxx11-abi-shared-with-deps-2.3.0-cu121.zip:                                  Dload  Upload   Total

I've tried it three times with garbage collection in between. Each time I waited for at least 3 hours, but it seems like it's stuck at this step.

sudo nix-store --verify --check-contents
sudo nix-store --verify --repair
sudo nix-collect-garbage -d

I'm not sure why it's always stuck at 242/245. Have you encountered this before?

Context info:

$ git status
On branch cabal-nix-fixes
Your branch is up to date with 'origin/cabal-nix-fixes'.

nothing to commit, working tree clean

$ git rev-parse HEAD
d44347ad8abc9e4db14fb182461f01bcba075737

$ nixos-version
24.05.2062.fc07dc3bdf29 (Uakari)