guix-science / guix-science-nonfree

Non-free scientific packages for GNU Guix.

cuda-toolkit fails to download with strange error #17

Closed digash closed 4 months ago

digash commented 4 months ago
$ guix build cuda-toolkit@11 -v10
guix build: warning: ambiguous package specification `cuda-toolkit@11'
guix build: warning: choosing cuda-toolkit@11.8.0 from guix-science-nonfree/packages/cuda.scm:226:2
substitute: updating substitutes from 'https://substitutes.nonguix.org'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivations will be built:
  /gnu/store/8b5g9iq27a0vpzhzjgc1p11gklpr9a92-cuda-toolkit-11.8.0.drv
  /gnu/store/hrz73h3pm608vfqxmkizjm1v0hiv3cr2-cuda_11.8.0_520.61.05_linux.run.drv
building /gnu/store/hrz73h3pm608vfqxmkizjm1v0hiv3cr2-cuda_11.8.0_520.61.05_linux.run.drv...

Starting download of /gnu/store/sg1kpnwh07acn8n1zy86rl48fnggk5ap-cuda_11.8.0_520.61.05_linux.run
From https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run...
downloading from https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run ...
 cuda_11.8.0_520.61.05_linux.run  4.04GiB                                                     3.4MiB/s 20:23 ▕██████████████████▏ 100.0%
guix build: error: short write in copy_file_range `15' to `16': No such file or directory

Has anybody else encountered this error? I get it consistently with both CUDA 11 and CUDA 12. Maybe it happens because the file is bigger than 4 GB. When I run guix download on the URL directly, it works, but it gives me a different hash and a different store file name.

rekado commented 4 months ago

This has happened to me, too: https://logs.guix.gnu.org/guix-hpc/2024-04-26.log#141936

I don't know what's wrong here. This is a message coming from the daemon in copyFile: https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libutil/util.cc#n382

rekado commented 4 months ago

The man page for copy_file_range says that it could return EFBIG when the range exceeds the maximum range. The code above does not check any limits and will attempt to copy the whole file.

rekado commented 4 months ago

The Nix folks have fixed the same problem by using just a plain file copy: https://github.com/NixOS/nix/commit/c3878f510ec12ca6bf24505989e7463249dab61a

I believe our code ought to check the value of st.st_size and fall back to a boring copy if it exceeds some "reasonable" value.
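
To illustrate the idea, here is a minimal sketch in Python rather than the daemon's C++. It is not the actual Guix or Nix fix. The function name copy_fd and the 1 GiB CHUNK limit are hypothetical, chosen only for this example. It asks copy_file_range for bounded pieces instead of the whole file, and switches to a plain read/write loop if the fast path is unavailable or fails.

```python
import os

# Hypothetical "reasonable" per-call limit (1 GiB); this value is an
# assumption for the sketch, not something taken from Guix or Nix.
CHUNK = 1 << 30


def copy_fd(src_fd, dst_fd, chunk=CHUNK):
    """Copy from src_fd's current offset to EOF into dst_fd.

    Returns the total number of bytes copied.
    """
    total = 0
    # os.copy_file_range only exists on Linux with Python 3.8 or later.
    fast = hasattr(os, "copy_file_range")
    while True:
        if fast:
            try:
                # Request at most `chunk` bytes per call, never the whole file.
                n = os.copy_file_range(src_fd, dst_fd, chunk)
            except OSError:
                # Fast path failed (e.g. ENOSYS, EXDEV, EFBIG). Both file
                # offsets still point just past the bytes already copied,
                # so the plain loop can continue from there. If the problem
                # is real, read/write will raise it again.
                fast = False
                continue
        else:
            # "Boring" copy: read a bounded buffer and write all of it.
            buf = os.read(src_fd, min(chunk, 1 << 20))
            n = len(buf)
            view = memoryview(buf)
            while view:
                written = os.write(dst_fd, view)
                view = view[written:]
        if n == 0:
            # Zero bytes copied or read means end of file.
            return total
        total += n
```

Calling copy_fd(src.fileno(), dst.fileno()) on two open files copies the remainder of src into dst. A short count from copy_file_range is treated as progress rather than as an error, so the loop only stops at end of file.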

rekado commented 4 months ago

We are discussing this here: https://issues.guix.gnu.org/70877

I'm closing this issue because it's not a bug in guix-science-nonfree.

Thanks for reporting it!

digash commented 4 months ago

Thank you for following through and fixing it in Guix. It works now!