current workaround:
doCheck = false;
EDIT: This did not solve the issue. It simply skipped the test, but the binary that gets built now fails with:
./cmd: symbol lookup error: ./cmd: undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV
I spent some time trying to figure out what's wrong, since I was using a fairly straightforward buildGoModule.
After a while, I decided to stop chasing this and thought vendoring the dependencies would simplify the issue a bit, so I created a fork (https://github.com/geekodour/nomad-device-nvidia) with the dependencies vendored (go mod tidy, go mod vendor).
I could verify the existence of vendor/github.com/NVIDIA/go-nvml/pkg/nvml/ in my fork, which is the package imported here: https://github.com/hashicorp/nomad-device-nvidia/blob/66fe3a14e471f4844dffa13ada3c6fdadcd98ab7/nvml/driver_linux.go#L11
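To double-check the vendoring beyond eyeballing the directory, a small shell sketch like the one below may help. It tests two conditions that, as far as I understand, Go's vendor mode depends on: the package directory under vendor/ contains .go files, and the import path is listed on its own line in vendor/modules.txt. The helper name check_vendored is made up for illustration; it is not part of Go or Nix.

```shell
# check_vendored PKG [ROOT]
# Hypothetical helper (POSIX sh / bash): reports whether import path PKG looks
# properly vendored under ROOT (default: current directory).
check_vendored() {
  pkg="$1"
  root="${2:-.}"

  # 1. The vendored package directory must contain at least one .go file.
  #    If the glob matches nothing, "$1" stays a literal pattern and -e fails.
  set -- "$root/vendor/$pkg"/*.go
  if [ ! -e "$1" ]; then
    echo "no .go files in vendor/$pkg"
    return 1
  fi

  # 2. vendor/modules.txt lists every vendored package as a bare import path
  #    on its own line, below the "# module version" header of its module.
  if ! grep -qxF -- "$pkg" "$root/vendor/modules.txt" 2>/dev/null; then
    echo "not listed in vendor/modules.txt: $pkg"
    return 1
  fi

  echo "ok: $pkg"
}
```

Running check_vendored github.com/NVIDIA/go-nvml/pkg/nvml from the root of a checkout, and again with the unpacked source path that the Nix build log prints as the second argument, would show whether the source Nix fetched actually matches the checkout.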
Now I run nix build and I get:
λ nix build --impure --no-link --print-out-paths .#nomad-device-nvidia
path '/home/geekodour/x/newnixsetup/pkgs' does not contain a 'flake.nix', searching up
warning: Git tree '/home/geekodour/x' is dirty
error: builder for '/nix/store/yi0xqzsydd9xs4a1j0l7s4vi85wdv77k-nomad-device-nvidia-8598e31a0a38a9ed5e14451cf86ab8a8211ab98b.drv' failed with exit code 1;
last 9 log lines:
> Running phase: unpackPhase
> unpacking source archive /nix/store/32hmlbaxdcspv9qq1rk4v1i60h4lws9y-source
> source root is source
> Running phase: patchPhase
> Running phase: updateAutotoolsGnuConfigScriptsPhase
> Running phase: configurePhase
> Running phase: buildPhase
> Building subPackage ./cmd
> nvml/driver_linux.go:11:2: cannot find module providing package github.com/NVIDIA/go-nvml/pkg/nvml: import lookup disabled by -mod=vendor
For full logs, run 'nix log /nix/store/yi0xqzsydd9xs4a1j0l7s4vi85wdv77k-nomad-device-nvidia-8598e31a0a38a9ed5e14451cf86ab8a8211ab98b.drv'.
This does not make any sense to me. I dug through more issues and found one related issue where the problems were caused by the use of uppercase letters: https://github.com/NixOS/nixpkgs/issues/273998#issuecomment-1936601932
At this point I am clueless, so I am re-opening the issue, even though it's not directly related to nomad-device-nvidia (the Makefile commands, run directly, work absolutely fine); it's more of a Nix issue at this point, or me messing something up.
Full error (when using vendored mod):
warning: The interpretation of store paths arguments ending in `.drv` recently changed. If this command is now failing try again with '/nix/store/qlm42n6c6wl514fg0bdfdl1f022axlrg-nomad-device-nvidia-8598e31a0a38a9ed5e14451cf86ab8a8211ab98b.drv^*'
Sourcing auto-add-driver-runpath-hook
Using autoAddDriverRunpath
Sourcing fix-elf-files.sh
@nix { "action": "setPhase", "phase": "unpackPhase" }
Running phase: unpackPhase
unpacking source archive /nix/store/32hmlbaxdcspv9qq1rk4v1i60h4lws9y-source
source root is source
@nix { "action": "setPhase", "phase": "patchPhase" }
Running phase: patchPhase
@nix { "action": "setPhase", "phase": "updateAutotoolsGnuConfigScriptsPhase" }
Running phase: updateAutotoolsGnuConfigScriptsPhase
@nix { "action": "setPhase", "phase": "configurePhase" }
Running phase: configurePhase
@nix { "action": "setPhase", "phase": "buildPhase" }
Running phase: buildPhase
Building subPackage ./cmd
nvml/driver_linux.go:11:2: cannot find module providing package github.com/NVIDIA/go-nvml/pkg/nvml: import lookup disabled by -mod=vendor
(Go version in go.mod is at least 1.14 and vendor directory exists.)
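For what it's worth, the parenthetical in the last log line describes Go's default: when a vendor directory exists and the go directive in go.mod is 1.14 or newer, the go command resolves imports only from vendor/. The sketch below mirrors that rule in plain shell so it can be run against any source tree. The helper name default_mod_mode is hypothetical, and the check deliberately ignores GOFLAGS, an explicit -mod flag, and workspaces.

```shell
# default_mod_mode [ROOT]
# Hypothetical helper (POSIX sh / bash): prints "vendor" when ROOT has a
# vendor directory and go.mod declares go 1.14 or newer, "modules" when it
# does not, and "unknown" when no go directive can be read from ROOT/go.mod.
default_mod_mode() {
  root="${1:-.}"

  # Take the version from the first "go X.Y[.Z]" line in go.mod.
  ver=$(awk '$1 == "go" { print $2; exit }' "$root/go.mod" 2>/dev/null)
  if [ -z "$ver" ]; then
    echo "unknown"
    return 2
  fi

  major=${ver%%.*}                                      # "1.22.3" -> "1"
  rest=${ver#*.}                                        # "1.22.3" -> "22.3"
  minor=$(printf '%s' "$rest" | sed 's/[^0-9].*$//')    # "22.3"   -> "22"

  if [ -d "$root/vendor" ] && { [ "$major" -gt 1 ] || [ "${minor:-0}" -ge 14 ]; }; then
    echo "vendor"
  else
    echo "modules"
  fi
}
```

For a fork with a vendor directory and a recent go directive this prints vendor, which matches the "import lookup disabled by -mod=vendor" message: in that mode the go command does not fall back to the module cache or the network, so a package missing from vendor/ fails the build.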
Reproducible example:
# see https://github.com/NixOS/nixpkgs/pull/304108
# see https://github.com/hashicorp/nomad-device-nvidia
# see https://github.com/geekodour/nomad-device-nvidia
{ lib, pkgs, buildGoModule, fetchFromGitHub }:

buildGoModule rec {
  pname = "nomad-device-nvidia";
  version = "8598e31a0a38a9ed5e14451cf86ab8a8211ab98b"; # Jul 27, 2024

  #nativeBuildInputs = [ pkgs.autoAddDriverRunpath ];
  CGO_ENABLED = 1;
  # GOOS = "linux";
  # GOARCH = "amd64";
  # doCheck = true;
  # doInstallCheck = false;
  # runVend = true;
  proxyVendor = true;
  # deleteVendor = true;

  src = fetchFromGitHub {
    owner = "geekodour";
    repo = pname;
    rev = "${version}";
    sha256 = "sha256-urASq/T4XcDVUp03bCKqvojCjLrGb+l47JbZWsHbSGg=";
    # sha256 = lib.fakeHash;
  };

  vendorHash = null;
  subPackages = [ "cmd" ];
  # subPackages = [ "." ];

  meta = with lib; {
    homepage = "https://github.com/hashicorp/nomad-device-nvidia";
    description = "Nomad device plugin for Nvidia GPUs";
    mainProgram = "nomad-device-nvidia";
    platforms = platforms.linux;
    license = licenses.mpl20;
    maintainers = with maintainers; [ geekodour ];
  };
}
I adopted a very rough workaround for now: I keep the compiled binary in a directory in my home dir and use it from an overlay package:
{ lib, pkgs, stdenv, fetchFromGitHub }:

stdenv.mkDerivation rec {
  name = "nomad-device-nvidia";
  src = /home/geekodour/infra/nomad-plugins/nomad-device-nvidia;
  nativeBuildInputs = [ pkgs.autoAddDriverRunpath ];

  buildPhase = "";
  dontUnpack = true;
  doCheck = false;

  installPhase = ''
    #cp -r $src $out
    mkdir -p $out/bin
    cp $src/nomad-device-nvidia $out/bin
  '';

  meta = with lib; {
    homepage = "https://github.com/hashicorp/nomad-device-nvidia";
    description = "Nomad device plugin for Nvidia GPUs";
    mainProgram = "nomad-device-nvidia";
    platforms = platforms.linux;
    license = licenses.mpl20;
    maintainers = with maintainers; [ geekodour ];
  };
}
Hi @geekodour glad you got it working. While we can appreciate NixOS we don't have the expertise to help support it; the driver builds and packages well on the [mainstream] distros we support customers with.
I was trying to build it for my local nix packages,
But then I get the error:
I am trying to get this to work at the moment and will post updates. Let me know if you have any suggestions for what might fix this.
I think https://github.com/hashicorp/nomad-device-nvidia/issues/34 might be related.