Open zeroecco opened 1 month ago
that article was written a while ago
Are you trying to run this locally or in the cloud? If local, there is additional work required to use it locally: https://github.com/nanovms/ops/pull/1528. The older article you linked was specific to GCP (we have an outstanding task to document the on-prem setup: https://github.com/nanovms/ops-documentation/issues/430).
* First thing I came across: I cannot build the klib on main. I worked around this by checking out nanos 0.1.50.
There has indeed been a recent change (in https://github.com/nanovms/nanos/pull/2011) in the nanos interrupt API, and the nvidia klib hasn't been updated yet to adapt to that change. If you want, you can check out the kernel version prior to that PR and build the klib against that. Also, please note that in order to build the klib you have to build nanos itself first.
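A rough sketch of the build order described above (repository layout and make invocations are assumptions, not taken from the docs):

```shell
# Sketch only: directory names and make targets are assumptions.
git clone https://github.com/nanovms/nanos
(cd nanos && make)        # build the kernel first; the klib builds against it
git clone https://github.com/nanovms/gpu-nvidia
(cd gpu-nvidia && make)   # then build the GPU klib
```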
* Second thing: The gpu repo was updated to support nvidia driver version 535, but there are two bin files (gsp_ga10x.bin, and gsp_tu10x.bin), I copied both but not sure if that was the right choice.
Copying both is fine, the driver will pick the right one depending on which GPU type it detects
* Fourth thing and where I am currently stuck: ops bombs out immediately saying invalid GPU type. Not sure where to look from here on what I am doing wrong. Any debugging steps I should take from here?
The only "GPUType" you can set in the config when running locally is "pci-passthrough" (but you can just omit the "GPUType" option altogether, since pci-passthrough is the default setting). This will detect the GPU(s) connected to the PCI bus of your machine, and should work with any supported Nvidia GPU type.
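For a local run, a minimal config along these lines should work ("GPUType" is omitted since "pci-passthrough" is the default; the program name and klib directory below are placeholders, not taken from the thread):

```json
{
  "Program": "my-cuda-app",
  "KlibDir": "/path/to/gpu-nvidia/kernel-open/_out/Nanos_x86_64",
  "Klibs": ["gpu_nvidia"],
  "RunConfig": {
    "GPUs": 1
  }
}
```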
last time I checked the build, the only change I made was:
diff --git a/kernel-open/nvidia/nv-msi.c b/kernel-open/nvidia/nv-msi.c
index 020ef53..a0c2be9 100644
--- a/kernel-open/nvidia/nv-msi.c
+++ b/kernel-open/nvidia/nv-msi.c
@@ -55,7 +55,8 @@ void NV_API_CALL nv_init_msi(nv_state_t *nv)
}
else
{
- msi_format(&address, &data, nv->interrupt_line);
+ u32 target_cpu = irq_get_target_cpu(irange(0, 0));
+ msi_format(&address, &data, nv->interrupt_line, target_cpu);
pci_cfgwrite(dev, cp + 4, 4, address); /* address low */
pci_cfgwrite(dev, cp + 8, 4, 0); /* address high */
pci_cfgwrite(dev, cp + 12, 4, data); /* data */
can't confirm that it is correct, just that it builds fine with the latest nanos.
Yes, that is a correct change. Thanks
thanks for all this feedback! I will try it and let you know ASAP
https://github.com/nanovms/gpu-nvidia/pull/5 has been merged in our gpu-nvidia repository, so the klib now builds successfully against the master branch of nanos.
closer:
root@north:~/r0uk# ops run -c ops.config main
running local instance
booting /root/.ops/images/main ...
en1: assigned 10.0.2.15
NVRM _sysCreateOs: RM Access Sys Cap creation failed: 0x56
NVRM cpuidInfoAMD: Unrecognized AMD processor in cpuidInfoAMD
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 535.113.01 Release Build (root@north) Mon May 13 02:04:21 AM UTC 2024
Loaded the UVM driver, major device number 0.
2024/05/15 17:50:16 Listening...on 8080
en1: assigned FE80::30A6:AEFF:FE3E:B03D
^Cqemu-system-x86_64: terminating on signal 2
signal: killed
root@north:~/r0uk#
As written in the tutorial, the line "Loaded the UVM driver, major device number 0" indicates that the GPU klib was loaded successfully, and the GPU attached to your instance is available for your application to use. Are you facing any issues?
not anymore on the nightly, thanks for your guidance
I am getting the following error (GeForce RTX 3080):
en1: assigned 10.0.2.15
NVRM _sysCreateOs: RM Access Sys Cap creation failed: 0x56
NVRM: failed to register character device.
klib automatic load failed (4)
The above error means the klib failed to create the /dev/nvidiactl file which is used by the userspace nvidia drivers to interface with the GPU. @0x5459 is there anything already at that path in the image you are using? How are you starting the Nanos instance? If you are using Ops, can you share your command line and your json configuration file?
I suspect that the inconsistency between my CUDA version and driver version is causing the issue. My program is compiled with CUDA 11. Now, I am trying to install CUDA 12. I will reply here with any updates.
My config:
{
"RebootOnExit": true,
"ManifestPassthrough": {
"readonly_rootfs": "true"
},
"Env": {
"RUST_BACKTRACE": "1",
"RUST_LOG": "debug"
},
"Program": "c2-test",
"KlibDir": "/root/code/gpu-nvidia/kernel-open/_out/Nanos_x86_64",
"Klibs": ["gpu_nvidia"],
"Dirs": ["nvidia"],
"Mounts": {
"/root/dataset": "/dataset"
},
"RunConfig": {
"CPUs": 32,
"Memory": "64g",
"GPUs": 1
}
}
I have tried to compile my program with CUDA 12.2 but still get the same error. Could you give me some help? @francescolavra
The problem is that your root filesystem is being configured as read-only (via the "readonly_rootfs": "true" option in your config). This prevents the klib from creating the /dev/nvidiactl file, which causes the "failed to register character device" error.
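Concretely, the same config as posted above, minus the read-only option (since "readonly_rootfs" was the only "ManifestPassthrough" entry, the whole block can be dropped):

```json
{
  "RebootOnExit": true,
  "Env": {
    "RUST_BACKTRACE": "1",
    "RUST_LOG": "debug"
  },
  "Program": "c2-test",
  "KlibDir": "/root/code/gpu-nvidia/kernel-open/_out/Nanos_x86_64",
  "Klibs": ["gpu_nvidia"],
  "Dirs": ["nvidia"],
  "Mounts": {
    "/root/dataset": "/dataset"
  },
  "RunConfig": {
    "CPUs": 32,
    "Memory": "64g",
    "GPUs": 1
  }
}
```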
@francescolavra Hi, I have a new issue.
I built the deviceQuery program from cuda-samples using the following config:
{
"Program": "deviceQuery",
"KlibDir": "/root/code/gpu-nvidia/kernel-open/_out/Nanos_x86_64",
"Klibs": ["gpu_nvidia"],
"Dirs": ["nvidia"],
"RunConfig": {
"GPUs": 1
}
}
But I got the error below:
$ ops instance logs test
en1: assigned 10.0.2.15
NVRM _sysCreateOs: RM Access Sys Cap creation failed: 0x56
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 535.113.01 Release Build (root@ipfs) Tue Jun 4 01:42:09 PM CST 2024
Loaded the UVM driver, major device number 0.
deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 304
-> OS call failed or operation not supported on this OS
Result = FAIL
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Could you please provide guidance on how to resolve this issue? Thank you very much for your time and help.
The above error from cudaGetDeviceCount() may be due to missing or mismatching CUDA libraries in your image. You can get some clues as to what it's failing on by enabling tracing in the kernel, i.e. adding the --trace option to your ops run command line. The trace output will likely show the cause of the failure.
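For example (assuming the config file is named config.json, as in the other runs in this thread):

```shell
# --trace enables kernel tracing; the output goes to the instance console/logs
ops run deviceQuery -c config.json --trace
```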
Also, to verify that you have all CUDA libraries set up correctly in your host, you could run the deviceQuery program directly in the host (assuming you are on Linux and are using a GPU attached to your host) and see if it can query your GPU correctly.
Sorry. I have tried to enable trace, but I am still unable to determine the cause of the issue. :disappointed: trace log: https://github.com/0x5459/gpu_integration_with_nanos/blob/main/nanos_trace.log
I have created a repository to store all of my test files. Could you please provide guidance on how to resolve this issue when you are free? @francescolavra
Also, to verify that you have all CUDA libraries set up correctly in your host, you could run the deviceQuery program directly in the host (assuming you are on Linux and are using a GPU attached to your host) and see if it can query your GPU correctly.
I ran the deviceQuery program on the host. it works.
Thanks for providing details on your test environment. I see that you are using the Nvidia driver version 550.54.15; to avoid compatibility issues, you should use the same driver version as the version from which the Nanos klib is derived, which is 535.113.01 and can be downloaded at https://www.nvidia.com/download/driverResults.aspx/211711/en-us/. More specifically, the /lib/x86_64-linux-gnu/libcuda.so.1 file you put in the Nanos image should be the same as the libcuda.so.535.113.01 file you can find in the Nvidia Linux driver package.
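One way to get the matching library (a sketch; the .run installer's -x option extracts the package contents without installing, and the staging path for the image is an assumption based on the target path mentioned above):

```shell
# Extract the driver package without installing it on the host
sh NVIDIA-Linux-x86_64-535.113.01.run -x
# Stage the matching libcuda where the image expects /lib/x86_64-linux-gnu/libcuda.so.1
cp NVIDIA-Linux-x86_64-535.113.01/libcuda.so.535.113.01 lib/x86_64-linux-gnu/libcuda.so.1
```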
Hello, I also failed to run deviceQuery compiled under the 535.113.01 driver and CUDA 12.0.
The log is as follows:
ops run deviceQuery -c config.json -n
running local instance
booting /root/.ops/images/deviceQuery ...
en1: assigned 10.0.2.15
NVRM _sysCreateOs: RM Access Sys Cap creation failed: 0x56
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 535.113.01 Release Build (circleci@02d850dae0db) Fri Jun 21 02:11:26 AM UTC 2024
Loaded the UVM driver, major device number 0.
deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
For detailed trace logs, see https://github.com/leeyiding/nanos_cuda_deviceQuery/blob/main/trace.log
@leeyiding I suggest you first try running the pre-built binaries from the CUDA demo suite (in the CUDA v12.2 toolkit you can find them in the cuda_demo_suite/extras/demo_suite/ folder), among which there is the deviceQuery program. The current version of the Nanos GPU klib has been tested successfully with the pre-built CUDA v12.2 deviceQuery binary (ensure you have the latest source of the klib, as there has been a recent fix in https://github.com/nanovms/gpu-nvidia/pull/7). Example output from deviceQuery when run on a GCP instance equipped with a Tesla T4 GPU:
en1: assigned 10.240.0.106
NVRM _sysCreateOs: RM Access Sys Cap creation failed: 0x56
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 535.113.01 Release Build (francesco@debian) Fri 21 Jun 2024 08:04:52 PM CEST
Loaded the UVM driver, major device number 0.
device-query Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
en1: assigned FE80::4001:AFF:FEF0:6A
Detected 1 CUDA Capable device(s)
Device 0: "Tesla T4"
CUDA Driver Version / Runtime Version 12.2 / 12.2
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 14931 MBytes (15655829504 bytes)
(40) Multiprocessors, ( 64) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1590 MHz (1.59 GHz)
Memory Clock rate: 5001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 4
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.2, NumDevs = 1, Device0 = Tesla T4
Result = PASS
Thank you very much. Through the pre-built CUDA demo suite and the latest klibs, I have successfully run the nightly version.
Hello!
I am attempting to get a working unikernel on my workstation (going through this blog: https://nanovms.com/dev/tutorials/gpu-accelerated-computing-nanos-unikernels) but I am running into a number of hurdles that I thought I should document and ask for assistance with:
First thing I came across: I cannot build the klib on main. I worked around this by checking out nanos 0.1.50.
Second thing: The gpu repo was updated to support nvidia driver version 535, but there are two bin files (gsp_ga10x.bin, and gsp_tu10x.bin), I copied both but not sure if that was the right choice.
Third thing: Following the guide, the ops config is wrong for the current code.
"Klibs": ["gpu_nvidia"],
needs to be outside of the run config based on the ops docs (ops also complained about the config being wrong).
Fourth thing, and where I am currently stuck: ops bombs out immediately saying invalid GPU type. Not sure where to look from here on what I am doing wrong. Any debugging steps I should take from here?
Here is the current output: