hashicorp / nomad-device-nvidia

Nomad device driver for Nvidia GPU
Mozilla Public License 2.0

nvidia driver may panic on Windows WSL #2

Open notnoop opened 2 years ago

notnoop commented 2 years ago

The Nvidia driver may panic when run on Windows Subsystem for Linux (WSL). The issue seems to be that the NVML C library fails to initialize, and the driver then panics while trying to look up the error string for that failure.

The following stack trace was generated when the Nvidia driver was built into the Nomad binary:

$ nomad agent -dev
==> No configuration files loaded
==> Starting Nomad agent...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x11 pc=0x4066ce]

goroutine 1 [running]:
github.com/NVIDIA/gpu-monitoring-tools/bindings/go/nvml._Cfunc_GoString(...)
        _cgo_gotypes.go:170
github.com/NVIDIA/gpu-monitoring-tools/bindings/go/nvml.errorString(0xc000000011, 0xc00099f200, 0x41253b)
        github.com/NVIDIA/gpu-monitoring-tools@v0.0.0-20180829222009-86f2a9fac6c5/bindings/go/nvml/bindings.go:56 +0x54
github.com/NVIDIA/gpu-monitoring-tools/bindings/go/nvml.init_(0x150, 0x2905ce0)
        github.com/NVIDIA/gpu-monitoring-tools@v0.0.0-20180829222009-86f2a9fac6c5/bindings/go/nvml/bindings.go:65 +0x73
github.com/NVIDIA/gpu-monitoring-tools/bindings/go/nvml.Init(...)
        github.com/NVIDIA/gpu-monitoring-tools@v0.0.0-20180829222009-86f2a9fac6c5/bindings/go/nvml/nvml.go:251
github.com/hashicorp/nomad/devices/gpu/nvidia/nvml.(*nvmlDriver).Initialize(...)
        github.com/hashicorp/nomad/devices/gpu/nvidia/nvml/driver_linux.go:9
github.com/hashicorp/nomad/devices/gpu/nvidia/nvml.NewNvmlClient(0x7f49b303ba60, 0x150, 0x150)
        github.com/hashicorp/nomad/devices/gpu/nvidia/nvml/client.go:68 +0x28
github.com/hashicorp/nomad/devices/gpu/nvidia.NewNvidiaDevice(0x30b8a98, 0xc0007a4cc0, 0x30ec198, 0xc0004b4b40, 0xc00057c000)
        github.com/hashicorp/nomad/devices/gpu/nvidia/device.go:109 +0x34
github.com/hashicorp/nomad/devices/gpu/nvidia.glob..func1(0x30b8a98, 0xc0007a4cc0, 0x30ec198, 0xc0004b4b40, 0x0, 0x0)
        github.com/hashicorp/nomad/devices/gpu/nvidia/device.go:47 +0x49
github.com/hashicorp/nomad/helper/pluginutils/loader.(*PluginLoader).initInternal(0xc0004b4ba0, 0xc0004b4960, 0xc0004b4bd0, 0x0, 0x0, 0x0)
        github.com/hashicorp/nomad/helper/pluginutils/loader/init.go:96 +0x1dd
github.com/hashicorp/nomad/helper/pluginutils/loader.(*PluginLoader).init(0xc0004b4ba0, 0xc00099f6c0, 0x2, 0x2)
        github.com/hashicorp/nomad/helper/pluginutils/loader/init.go:59 +0x87
github.com/hashicorp/nomad/helper/pluginutils/loader.NewPluginLoader(0xc00099f6c0, 0x30f0b18, 0xc000950ea0, 0x30ec198)
        github.com/hashicorp/nomad/helper/pluginutils/loader/loader.go:135 +0x45d
github.com/hashicorp/nomad/command/agent.(*Agent).setupPlugins(0xc0004761e0, 0xc000319080, 0x0)
        github.com/hashicorp/nomad/command/agent/plugins.go:27 +0x15f
github.com/hashicorp/nomad/command/agent.NewAgent(0xc0003f8c00, 0x30f0b18, 0xc000950ea0, 0x3074f80, 0xc00000f9f8, 0xc0007b9f40, 0x0, 0x0, 0x256c620)
        github.com/hashicorp/nomad/command/agent/agent.go:134 +0x1fb
github.com/hashicorp/nomad/command/agent.(*Command).setupAgent(0xc0004cc380, 0xc0003f8c00, 0x30f0b18, 0xc000950ea0, 0x3074f80, 0xc00000f9f8, 0xc0007b9f40, 0x0, 0x2)
        github.com/hashicorp/nomad/command/agent/command.go:480 +0xb0
github.com/hashicorp/nomad/command/agent.(*Command).Run(0xc0004cc380, 0xc00004e1a0, 0x1, 0x1, 0x0)
        github.com/hashicorp/nomad/command/agent/command.go:672 +0x4cc
github.com/mitchellh/cli.(*CLI).Run(0xc0009123c0, 0xc0009123c0, 0xc000544cf0, 0x37)
        github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x41a
main.RunCustom(0xc00004e190, 0x2, 0x2, 0xc0000c0058)
        github.com/hashicorp/nomad/main.go:142 +0x4a7
main.Run(...)
        github.com/hashicorp/nomad/main.go:87
main.main()
        github.com/hashicorp/nomad/main.go:83 +0x65
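
The trace shows the fault inside `_Cfunc_GoString`, called from `errorString` during `init_`. The failure path can be sketched in plain Go, without cgo. This is a minimal illustration under stated assumptions, not the actual code or fix in the bindings: `safeErrorString` and `lookupFunc` are hypothetical names, and `lookupFunc` merely stands in for a C-side message lookup. The idea is to fall back to a generic error carrying the numeric code whenever the lookup cannot produce a usable message, rather than dereferencing whatever it returned.

```go
package main

import "fmt"

// lookupFunc is a hypothetical stand-in for a C-side lookup that maps a
// return code to a message. It reports ok=false when no usable message
// exists, for example because the library never loaded.
type lookupFunc func(code int) (msg string, ok bool)

// safeErrorString converts a non-zero return code into a Go error without
// assuming that the message lookup succeeds.
func safeErrorString(code int, lookup lookupFunc) error {
	if code == 0 {
		// Zero is treated as success in this sketch.
		return nil
	}
	if lookup != nil {
		if msg, ok := lookup(code); ok {
			return fmt.Errorf("nvml: %s", msg)
		}
	}
	// Fallback: report the raw code instead of touching an invalid string.
	return fmt.Errorf("nvml: initialization failed with code %d (no error string available)", code)
}

func main() {
	// Simulates a library whose message lookup is unusable.
	broken := func(int) (string, bool) { return "", false }

	fmt.Println(safeErrorString(17, broken))
	fmt.Println(safeErrorString(17, nil))
	fmt.Println(safeErrorString(0, nil))
}
```

With a guard like this, a failed initialization would surface as an ordinary error that the caller can log, letting the agent start without the GPU device plugin instead of crashing.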

simonbowen commented 2 years ago

I am also getting this issue. Does anyone have any idea what's going on here?

robmeijer commented 2 years ago

I am trying to run Nomad in WSL2 on Windows 10, and getting the same issue.

EDIT: I installed the latest v1.2.0 beta1 and now it works! https://releases.hashicorp.com/nomad/1.2.0-beta1/

simonbowen commented 2 years ago

@robmeijer Thanks for the edit. I'll try installing the beta.