kevmo314 / scuda

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
Apache License 2.0
536 stars 15 forks

Where is client.c? #32

Open 88plug opened 2 weeks ago

88plug commented 2 weeks ago

I'm trying to build from source with the Dockerfile and am stuck on

/opt/cuda/bin/nvcc -shared -o libscuda.so client.c

Where is client.c?

88plug commented 2 weeks ago

Running ./local.sh build created it; however, this wasn't clear in the instructions.
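For anyone else hitting this, the sequence that resolved it for me looks roughly like the following (the ./local.sh build subcommand is from this thread; the exact output location of libscuda.so is an assumption based on the LD_PRELOAD invocation later in the thread):

```shell
# Build the client library; this step generates client.c and compiles it
# into libscuda.so, so there is no need to invoke nvcc by hand.
./local.sh build

# Confirm the shared object was produced before pointing LD_PRELOAD at it.
ls -l libscuda.so
```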

I was able to build successfully on Ubuntu 24.

88plug commented 2 weeks ago

Still having issues, since the docs don't list version requirements or other specifics. I have CUDA/nvcc installed but get:

Running test(s)...
✗ CUDA is not available. Expected True but got [cuGetProcAddress: Mapped symbol 'cuGraphExecGetFlags' to function: 0x7fff1fafb778].
✗ Tensor failed. Got [cuGetProcAddress: Mapped symbol 'cuGraphExecGetFlags' to function: 0x7fff13ad8dd8].
✗ Tensor failed. Got [cuGetProcAddress: Mapped symbol 'cuGraphExecGetFlags' to function: 0x7ffc823b0ff8].

OS: Ubuntu 24.04.1 LTS x86_64
Kernel: 6.8.0-45-generic
Terminal: /dev/pts/0
CPU: AMD EPYC-Rome (256) @ 2.249GHz
GPU: NVIDIA RTX 4000 Ada Generation
Memory: 6647MiB / 108602MiB
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

Love this project and trying to get a Dockerfile.client and Dockerfile.server working!

kevmo314 commented 2 weeks ago

Thanks! Apologies for the light documentation and I appreciate you still trying :)

Which GPU model are you using? We have been testing with a 4090; I wonder if there are differences...

88plug commented 1 week ago
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 4000 Ada Gene...    Off |   00000000:01:00.0 Off |                  Off |
| 30%   39C    P8             14W /  130W |       2MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX 4000 Ada Gene...    Off |   00000000:02:00.0 Off |                  Off |
| 30%   37C    P8             12W /  130W |       2MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Have you tested or thought through multiple GPUs?

kevmo314 commented 1 week ago

The code today does work with multiple GPUs on the same host. There is a plan to support multiple GPUs across separate hosts, but work on that hasn't started yet. What happens if you run something like:

./local.sh build && SCUDA_SERVER=127.0.0.1 LD_PRELOAD=$(pwd)/libscuda.so nvidia-smi

replacing 127.0.0.1 with the IP of your ./local.sh server instance?
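Putting the pieces in this thread together, a minimal end-to-end check might look like the sketch below (the server and build subcommands and the SCUDA_SERVER/LD_PRELOAD variables come from the comments above; the hostname is a placeholder):

```shell
# On the GPU host: start the SCUDA server so remote clients can attach.
./local.sh server

# On the CPU-only client: build libscuda.so, then preload it so CUDA
# driver calls are intercepted and forwarded over IP to the GPU host.
./local.sh build
SCUDA_SERVER=<gpu-host-ip> LD_PRELOAD=$(pwd)/libscuda.so nvidia-smi
```

If nvidia-smi on the client then lists the remote GPUs, the bridge is working; if it still only maps symbols via cuGetProcAddress without returning device data, the client likely isn't reaching the server.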