grgalex / nvshare

Practical GPU Sharing Without Memory Size Constraints
Apache License 2.0

Pytorch 2.1 GPU access not seen / managed by nvshare #11

Closed: t-arsicaud-catie closed this issue 10 months ago

t-arsicaud-catie commented 1 year ago

Hi,

I recently discovered that PyTorch code such as the following:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(5, 3).to('cuda')
criterion = nn.MSELoss().to('cuda')

inputs = torch.randn(10, 5).to('cuda')
targets = torch.randn(10, 3).to('cuda')

optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print('Epoch %d, Loss: %.3f' % (epoch+1, loss.item()))

is registered and managed by nvshare at execution time with torch==1.13.1 and torch==2.0.1, but not with torch==2.1.

The code runs as expected and accesses the GPU set in CUDA_VISIBLE_DEVICES, but it does so directly, bypassing nvshare's controls.

My test environment is the following:

Any idea why this happens? And, given that CUDA_VISIBLE_DEVICES and LD_PRELOAD are set correctly, is there a way to prevent it, either in nvshare or in the PyTorch code?
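A quick first check is whether libnvshare.so is mapped into the running process at all, which can be read from /proc/self/maps inside the script. The sketch below is Linux-only and the helper names `mapped_libraries` and `is_preloaded` are hypothetical, not part of nvshare or PyTorch. A mapped library is necessary but not sufficient: it can be loaded while CUDA calls still bypass it.

```python
from pathlib import Path


def mapped_libraries(maps_text):
    """Return the set of file paths that appear in a /proc/<pid>/maps dump."""
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; pathname is optional.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            libs.add(fields[5].strip())
    return libs


def is_preloaded(maps_text, lib_name="libnvshare.so"):
    """True if a mapped file has exactly the given base name."""
    return any(Path(p).name == lib_name for p in mapped_libraries(maps_text))


if __name__ == "__main__" and Path("/proc/self/maps").exists():
    # In practice, call this from the PyTorch script after CUDA is initialized
    # (e.g. after the first .to('cuda')), so libcuda.so.1 is already loaded.
    text = Path("/proc/self/maps").read_text()
    print("libnvshare.so mapped:", is_preloaded(text))
    print("libcuda.so.1 mapped:", is_preloaded(text, "libcuda.so.1"))
```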

grgalex commented 1 year ago

@t-arsicaud-catie Can you re-run with the environment variable NVSHARE_DEBUG=1 set when launching both the client (the PyTorch application) and the nvshare scheduler, and post the logs here?

(In other words, run with LD_PRELOAD=... NVSHARE_DEBUG=1 python3 ...)
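If the client is launched from Python rather than from a shell, the same environment can be prepared as below. This is a sketch under assumptions: `nvshare_env` and `run_client` are hypothetical helpers, and /usr/local/lib/libnvshare.so is the library path used elsewhere in this thread. NVSHARE_DEBUG=1 still has to be set separately when starting the scheduler.

```python
import os
import subprocess
import sys


def nvshare_env(lib_path="/usr/local/lib/libnvshare.so", debug=True, base=None):
    """Return a copy of `base` (default: os.environ) set up for an nvshare client."""
    env = dict(os.environ if base is None else base)
    # Prepend nvshare so any LD_PRELOAD entries that are already set are kept.
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    if debug:
        env["NVSHARE_DEBUG"] = "1"
    return env


def run_client(script, **kwargs):
    """Run a Python script as an nvshare client and return its exit code."""
    return subprocess.run([sys.executable, script], env=nvshare_env(**kwargs)).returncode
```

ld.so accepts colon- or space-separated LD_PRELOAD lists, so prepending with a colon preserves whatever was preloaded before.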

t-arsicaud-catie commented 1 year ago

Hi,

In fact, with torch==2.1.0 I don't get any debug output at all in the PyTorch app terminal.

And in the nvshare-scheduler terminal, only the following:

[NVSHARE][INFO]: nvshare-scheduler started in debug mode
[NVSHARE][INFO]: nvshare-scheduler listening on /var/run/nvshare/scheduler.sock

Whereas when I run the code with torch==2.0.1, I get:

[NVSHARE][DEBUG]: Found NVML
[NVSHARE][DEBUG]: NVSHARE_POD_NAME = none
[NVSHARE][DEBUG]: NVSHARE_POD_NAMESPACE = none
[NVSHARE][DEBUG]: Sent REGISTER
[NVSHARE][DEBUG]: Received SCHED_ON
[NVSHARE][INFO]: Successfully initialized nvshare GPU
[NVSHARE][INFO]: Client ID = 126fa39e39707cfa
[NVSHARE][DEBUG]: real_cuMemGetInfo returned free=14807.56 MiB, total=14930.56 MiB
[NVSHARE][DEBUG]: nvshare's cuMemGetInfo returning free=13394.56 MiB, total=14930.56 MiB
[NVSHARE][DEBUG]: cuMemAlloc requested 2097152 bytes
[NVSHARE][DEBUG]: cuMemAllocManaged allocated 2097152 bytes at 0x7f862a000000
[NVSHARE][DEBUG]: Total allocated memory on GPU is 2.00 MiB
[NVSHARE][DEBUG]: Received LOCK_OK
[NVSHARE][DEBUG]: cuMemAlloc requested 1024 bytes
...
...
...

...in the app terminal, and:

[NVSHARE][INFO]: nvshare-scheduler started in debug mode
[NVSHARE][INFO]: nvshare-scheduler listening on /var/run/nvshare/scheduler.sock
[NVSHARE][INFO]: Received REGISTER
[NVSHARE][INFO]: Sent SCHED_ON to client 126fa39e39707cfa
[NVSHARE][INFO]: Registered client 126fa39e39707cfa with Pod name = none, Pod namespace = none
[NVSHARE][INFO]: Received REQ_LOCK from 126fa39e39707cfa
[NVSHARE][INFO]: Sent LOCK_OK to client 126fa39e39707cfa
[NVSHARE][DEBUG]: Client 126fa39e39707cfa has closed the connection
[NVSHARE][INFO]: Removing client 126fa39e39707cfa
[NVSHARE][DEBUG]: try_schedule() called with no pending requests

...in the nvshare-scheduler terminal.

In both cases (torch==2.0.1 and torch==2.1.0), nvidia-smi shows that the app is accessing the GPU.
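Comparing the two runs comes down to a handful of log lines: with torch==2.0.1 the client prints "Successfully initialized nvshare GPU" and the scheduler prints "Received REGISTER", while with torch==2.1.0 neither appears. The hypothetical helpers below only automate that comparison on saved logs; the marker strings are copied from the logs above.

```python
import sys

# Marker strings copied from the nvshare debug logs in this thread.
CLIENT_MARKERS = ("Successfully initialized nvshare GPU", "Received LOCK_OK")
SCHEDULER_MARKERS = ("Received REGISTER", "Sent LOCK_OK to client")


def missing_markers(log_text, markers):
    """Return the markers that never appear in the log text."""
    return [m for m in markers if m not in log_text]


def registered_clients(scheduler_log):
    """Extract client IDs from 'Registered client <id> ...' scheduler lines."""
    ids = []
    for line in scheduler_log.splitlines():
        _, sep, rest = line.partition("Registered client ")
        if sep and rest:
            ids.append(rest.split()[0])
    return ids


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_logs.py client.log scheduler.log
    with open(sys.argv[1]) as c, open(sys.argv[2]) as s:
        client, sched = c.read(), s.read()
    print("client missing:", missing_markers(client, CLIENT_MARKERS))
    print("scheduler missing:", missing_markers(sched, SCHEDULER_MARKERS))
    print("registered clients:", registered_clients(sched))
```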

grgalex commented 1 year ago

This is weird.

We need to verify whether the PyTorch 2.1.0 application is indeed making the CUDA calls that nvshare hooks.

Can you run gdb python3 ... for the 2.1.0 application and add breakpoints for cuInit and cuMemAlloc?

You can do this with the break cuInit and break cuMemAlloc gdb commands.

Then, paste the logs here.
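The same gdb session can also be set up from the command line. The sketch below only builds the argument list; it assumes gdb is on PATH, and pt1.py is a placeholder script name. Two details matter: the CUDA libraries are not loaded when gdb starts, so the breakpoints must stay pending until they are, and cuda.h defines cuMemAlloc as a macro for cuMemAlloc_v2, so the exported symbol a hook actually provides is likely cuMemAlloc_v2. Adding that breakpoint too is a suggestion, not part of the original request.

```python
import shlex


def gdb_argv(script, preload="/usr/local/lib/libnvshare.so",
             symbols=("cuInit", "cuMemAlloc", "cuMemAlloc_v2"), python="python3"):
    """Build a gdb command line that breaks on CUDA driver entry points."""
    argv = [
        "gdb",
        # Give LD_PRELOAD to the debugged program only, not to gdb itself.
        "-ex", f"set environment LD_PRELOAD={preload}",
        # libcuda.so.1 is loaded after startup, so keep breakpoints pending.
        "-ex", "set breakpoint pending on",
    ]
    for sym in symbols:
        argv += ["-ex", f"break {sym}"]
    # gdb executes the -ex commands, then stays interactive at each breakpoint.
    argv += ["-ex", "run", "--args", python, script]
    return argv


if __name__ == "__main__":
    print(shlex.join(gdb_argv("pt1.py")))
```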

t-arsicaud-catie commented 1 year ago

I'm not used to gdb, but I suppose this is what you're asking for:

(running the script with torch==2.1.0)

with breakpoints on cuInit, cuMemAlloc and cudaMalloc:

[New Thread 0x7fff4bc39700 (LWP 24249)]
[New Thread 0x7fff4b438700 (LWP 24250)]
...
Thread 1 "python" hit Breakpoint 1, 0x00007fffba8e6660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
Thread 1 "python" hit Breakpoint 3, 0x00007fffbc91c500 in cudaMalloc () from /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12
Thread 1 "python" hit Breakpoint 1, 0x00007fffba8e6660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
Thread 1 "python" hit Breakpoint 1, 0x00007fffba8e6660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
Thread 1 "python" hit Breakpoint 3, 0x00007fffbc91c500 in cudaMalloc () from /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12
...

So there are no hits in /usr/local/lib/libnvshare.so.

With previous versions of torch, calls to cuInit and cudaMalloc appear in both /usr/local/lib/libnvshare.so and /lib/x86_64-linux-gnu/libcuda.so.1.

For the tests, I just switch from one virtual environment to another, keeping the LD_PRELOAD and CUDA_VISIBLE_DEVICES environment variables unchanged.

grgalex commented 1 year ago

Good job with gdb!

However, there is a little problem.

cuMemAlloc (i.e., the Driver API function) is the function we hook in nvshare.

You mistakenly added a breakpoint for cudaMalloc (i.e., the Runtime API function which internally calls cuMemAlloc), so it is natural that we don't see a hit for libnvshare.so.

Could you rerun the test with a breakpoint for cuMemAlloc instead of cudaMalloc?

> [...] With previous versions of torch, calls to cuInit and cudaMalloc appear in both /usr/local/lib/libnvshare.so

If you redo the initial test, you'll notice that cudaMalloc is not from libnvshare.so, only cuInit is.
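The reasoning above follows from how the dynamic linker resolves a symbol by name: it walks its search scope in order (the executable, then LD_PRELOAD objects, then the other loaded libraries) and binds to the first object that defines the symbol. The toy model below is a simplification for illustration; the export sets are abbreviated, and the nvshare entries are symbols that appear in this thread's logs. It shows why a cudaMalloc breakpoint can only hit libcudart, while cuInit resolves to libnvshare.so whenever it is looked up by name.

```python
def resolve(symbol, search_order, exports):
    """Return the first object in search_order that exports symbol, or None."""
    for obj in search_order:
        if symbol in exports.get(obj, set()):
            return obj
    return None


# Simplified global search scope: executable first, then LD_PRELOAD objects,
# then the remaining libraries in load order.
ORDER = [
    "python",
    "/usr/local/lib/libnvshare.so",
    "libcudart.so.12",
    "/lib/x86_64-linux-gnu/libcuda.so.1",
]

# Abbreviated, illustrative export sets (not complete symbol tables).
EXPORTS = {
    "/usr/local/lib/libnvshare.so": {"cuInit", "cuMemAlloc_v2"},
    "libcudart.so.12": {"cudaMalloc"},
    "/lib/x86_64-linux-gnu/libcuda.so.1": {"cuInit", "cuMemAlloc_v2"},
}

if __name__ == "__main__":
    for sym in ("cudaMalloc", "cuInit", "cuMemAlloc_v2"):
        print(f"{sym:>14} -> {resolve(sym, ORDER, EXPORTS)}")
```

The model does not cover function pointers obtained without a by-name lookup in this scope, for example dlsym on an explicit handle to libcuda.so.1, which returns libcuda's own definition even when another object interposes the same name.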

t-arsicaud-catie commented 1 year ago

Thank you for your answer, and sorry for the inconvenience.

Here is the gdb output with torch==2.1.0 and only the cuInit and cuMemAlloc breakpoints:

(gdb) break cuInit 
Breakpoint 1 at 0x7fffba8e6660 (2 locations)
(gdb) break cuMemAlloc 
Breakpoint 2 at 0x7fffba93d8a0
(gdb) run
Starting program: /home/tarsicaud/.virtualenvs/torch2.1.0/bin/python pt1.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fff4bc39700 (LWP 54157)]
[New Thread 0x7fff4b438700 (LWP 54158)]
[New Thread 0x7fff46c37700 (LWP 54159)]
[New Thread 0x7fff44436700 (LWP 54160)]
[New Thread 0x7fff43c35700 (LWP 54161)]
...
...
[New Thread 0x7ffecec07700 (LWP 54207)]
[New Thread 0x7ffecc406700 (LWP 54208)]
[New Thread 0x7ffec9c05700 (LWP 54209)]
[New Thread 0x7ffec7404700 (LWP 54210)]
[New Thread 0x7ffec4c03700 (LWP 54211)]
--Type <RET> for more, q to quit, c to continue without paging--c

Thread 1 "python" hit Breakpoint 1, 0x00007fffba8e6660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) c
Continuing.
[New Thread 0x7ffeb8773700 (LWP 54215)]
[New Thread 0x7ffeb6f8c700 (LWP 54216)]

Thread 1 "python" hit Breakpoint 1, 0x00007fffba8e6660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) c
Continuing.

Thread 1 "python" hit Breakpoint 1, 0x00007fffba8e6660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) c
Continuing.
[New Thread 0x7ffe97fff700 (LWP 54217)]
Finished
[Thread 0x7ffe97fff700 (LWP 54217) exited]
[Thread 0x7ffed8c0b700 (LWP 54203) exited]
[Thread 0x7ffec4c03700 (LWP 54211) exited]
[Thread 0x7ffec7404700 (LWP 54210) exited]
[Thread 0x7ffec9c05700 (LWP 54209) exited]
...
...
[Thread 0x7fff43c35700 (LWP 54161) exited]
[Thread 0x7fff44436700 (LWP 54160) exited]
[Thread 0x7fff46c37700 (LWP 54159) exited]
[Thread 0x7fff4b438700 (LWP 54158) exited]
[Thread 0x7fff4bc39700 (LWP 54157) exited]
--Type <RET> for more, q to quit, c to continue without paging--c
[Inferior 1 (process 54155) exited normally]

With the same breakpoints and torch==2.0.1, I get:

(gdb) break cuInit
Breakpoint 1 at 0x7fffc138b660 (2 locations)
(gdb) break cuMemAlloc 
Breakpoint 2 at 0x7fffc13e28a0
(gdb) run
Starting program: /home/tarsicaud/.virtualenvs/torch2.0.1/bin/python pt1.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fff5db35700 (LWP 53942)]
[New Thread 0x7fff5d334700 (LWP 53943)]
[New Thread 0x7fff5ab33700 (LWP 53944)]
[New Thread 0x7fff56332700 (LWP 53945)]
[New Thread 0x7fff55b31700 (LWP 53946)]
...
...
[New Thread 0x7ffee0b03700 (LWP 53992)]
[New Thread 0x7ffede302700 (LWP 53993)]
[New Thread 0x7ffedbb01700 (LWP 53994)]
[New Thread 0x7ffed9300700 (LWP 53995)]
[New Thread 0x7ffed6aff700 (LWP 53996)]
--Type <RET> for more, q to quit, c to continue without paging--c

Thread 1 "python" hit Breakpoint 1, 0x00007ffff7fc0d70 in cuInit () from /usr/local/lib/libnvshare.so
(gdb) c
Continuing.
[New Thread 0x7ffecf9c9700 (LWP 53997)]
[Switching to Thread 0x7ffecf9c9700 (LWP 53997)]

Thread 57 "python" hit Breakpoint 1, 0x00007fffc138b660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) c
Continuing.
[New Thread 0x7ffeceec7700 (LWP 54001)]
[NVSHARE][INFO]: Successfully initialized nvshare GPU
[NVSHARE][INFO]: Client ID = 8963afc6c067d18e
[New Thread 0x7ffece6c6700 (LWP 54002)]
[Switching to Thread 0x7ffff7be6b80 (LWP 53941)]

Thread 1 "python" hit Breakpoint 1, 0x00007fffc138b660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) c
Continuing.
[New Thread 0x7ffecdec5700 (LWP 54003)]

Thread 1 "python" hit Breakpoint 1, 0x00007ffff7fc0d70 in cuInit () from /usr/local/lib/libnvshare.so
(gdb) c
Continuing.

Thread 1 "python" hit Breakpoint 1, 0x00007fffc138b660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) c
Continuing.

Thread 1 "python" hit Breakpoint 1, 0x00007ffff7fc0d70 in cuInit () from /usr/local/lib/libnvshare.so
(gdb) c
Continuing.

Thread 1 "python" hit Breakpoint 1, 0x00007fffc138b660 in cuInit () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) c
Continuing.
[New Thread 0x7ffec1fff700 (LWP 54009)]
Finished
[Thread 0x7ffec1fff700 (LWP 54009) exited]
[Thread 0x7fff01310700 (LWP 53979) exited]
[Thread 0x7ffef9b0d700 (LWP 53982) exited]
[Thread 0x7ffed6aff700 (LWP 53996) exited]
[Thread 0x7ffed9300700 (LWP 53995) exited]
...
...
[Thread 0x7fff55b31700 (LWP 53946) exited]
[Thread 0x7fff56332700 (LWP 53945) exited]
[Thread 0x7fff5ab33700 (LWP 53944) exited]
[Thread 0x7fff5d334700 (LWP 53943) exited]
[Thread 0x7fff5db35700 (LWP 53942) exited]
--Type <RET> for more, q to quit, c to continue without paging--c
[Inferior 1 (process 53941) exited normally]

And yes, of course you are right: when I put a breakpoint on cudaMalloc in the last test, it only hits libcudart.so.11.0, not libnvshare.so.

grgalex commented 1 year ago

Hmm, this is strange...

Let's take a step back and verify that the dynamic linker/loader actually links libnvshare.so into the PyTorch 2.1.0 application.

Could you run LD_DEBUG=libs,symbols LD_PRELOAD=libnvshare.so python3 ... and paste the logs here?

LD_DEBUG is a special-purpose environment variable read by ld.so (https://man7.org/linux/man-pages/man8/ld.so.8.html); when it is set, ld.so prints additional debug information.

In this case we want to examine:

t-arsicaud-catie commented 1 year ago

Hi,

Thank you for your answer.

I'm a bit lost, as the output of LD_DEBUG=libs,symbols CUDA_VISIBLE_DEVICES=0 LD_PRELOAD=/usr/local/lib/libnvshare.so python torch_app.py &> log.txt is a very long file (~900 MB).

The beginning of the log file contains:

    183761:     symbol=__vdso_clock_gettime;  lookup in file=linux-vdso.so.1 [0]
    183761:     symbol=__vdso_gettimeofday;  lookup in file=linux-vdso.so.1 [0]
    183761:     symbol=__vdso_time;  lookup in file=linux-vdso.so.1 [0]
    183761:     symbol=__vdso_getcpu;  lookup in file=linux-vdso.so.1 [0]
    183761:     symbol=__vdso_clock_getres;  lookup in file=linux-vdso.so.1 [0]
    183761:     find library=libc.so.6 [0]; searching
    183761:      search cache=/etc/ld.so.cache
    183761:       trying file=/lib/x86_64-linux-gnu/libc.so.6
    183761:
    183761:     find library=libpthread.so.0 [0]; searching
    183761:      search cache=/etc/ld.so.cache
    183761:       trying file=/lib/x86_64-linux-gnu/libpthread.so.0
    183761:
    183761:     find library=libdl.so.2 [0]; searching
    183761:      search cache=/etc/ld.so.cache
    183761:       trying file=/lib/x86_64-linux-gnu/libdl.so.2
    183761:
    183761:     find library=libutil.so.1 [0]; searching
    183761:      search cache=/etc/ld.so.cache
    183761:       trying file=/lib/x86_64-linux-gnu/libutil.so.1
    183761:
    183761:     find library=libm.so.6 [0]; searching
    183761:      search cache=/etc/ld.so.cache
    183761:       trying file=/lib/x86_64-linux-gnu/libm.so.6
    183761:
    183761:     find library=libexpat.so.1 [0]; searching
    183761:      search cache=/etc/ld.so.cache
    183761:       trying file=/lib/x86_64-linux-gnu/libexpat.so.1
    183761:
    183761:     find library=libz.so.1 [0]; searching
    183761:      search cache=/etc/ld.so.cache
    183761:       trying file=/lib/x86_64-linux-gnu/libz.so.1
    183761:
    183761:     symbol=_res;  lookup in file=python [0]
    183761:     symbol=_res;  lookup in file=/usr/local/lib/libnvshare.so [0]
    183761:     symbol=_res;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
    183761:     symbol=stderr;  lookup in file=python [0]
    183761:     symbol=error_one_per_line;  lookup in file=python [0]
    183761:     symbol=error_one_per_line;  lookup in file=/usr/local/lib/libnvshare.so [0]
    183761:     symbol=error_one_per_line;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
    183761:     symbol=__morecore;  lookup in file=python [0]
    183761:     symbol=__morecore;  lookup in file=/usr/local/lib/libnvshare.so [0]
    183761:     symbol=__morecore;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
    183761:     symbol=__key_encryptsession_pk_LOCAL;  lookup in file=python [0]
    183761:     symbol=__key_encryptsession_pk_LOCAL;  lookup in file=/usr/local/lib/libnvshare.so [0]

Is there a way to filter the log, or otherwise extract the relevant information?

I've tried something like cat log.txt | grep 'cuInit', which gives only:

    183761: symbol=real_cuInit;  lookup in file=python [0]
    183761: symbol=real_cuInit;  lookup in file=/usr/local/lib/libnvshare.so [0]
    183761: symbol=cuInit;  lookup in file=python [0]
    183761: symbol=cuInit;  lookup in file=/usr/local/lib/libnvshare.so [0]

and cat log.txt | grep 'cuMemAlloc' gives:

    183761: symbol=real_cuMemAllocManaged;  lookup in file=python [0]
    183761: symbol=real_cuMemAllocManaged;  lookup in file=/usr/local/lib/libnvshare.so [0]
    183761: symbol=cuMemAlloc_v2;  lookup in file=python [0]
    183761: symbol=cuMemAlloc_v2;  lookup in file=/usr/local/lib/libnvshare.so [0]
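For a log of this size, a streaming filter avoids loading the whole file into memory, and matching whole words keeps helper symbols such as real_cuInit out of the results (grep -w behaves similarly). A minimal sketch; filter_log.py is a placeholder script name. With whole-word matching, cuMemAlloc does not match cuMemAlloc_v2, so pass the exact exported name.

```python
import re
import sys


def filter_log(lines, symbols):
    """Yield only the lines that mention one of `symbols` as a whole word."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, symbols)) + r")\b")
    for line in lines:
        if pattern.search(line):
            yield line


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python filter_log.py log.txt cuInit cuMemAlloc_v2
    with open(sys.argv[1], errors="replace") as fh:
        for line in filter_log(fh, sys.argv[2:]):
            sys.stdout.write(line)
```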

grgalex commented 1 year ago

Hmm, I didn't expect it to be this big. The problem is the symbols argument to LD_DEBUG; it should have been bindings instead.

To avoid having a single, huge log file, can you split the process in two steps?

For the PyTorch 2.1.0 and 2.0.1 applications:

  1. Run with LD_DEBUG=libs only. The log should be small. We want to ensure that libnvshare.so is loaded.
  2. Run with LD_DEBUG=bindings. The log will be bigger. We want to find out which shared library cuInit and cuMemAlloc are bound to. You can grep for cuInit and cuMemAlloc, as you did in your previous comment.

We're getting closer!
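For step 2, each LD_DEBUG=bindings line names the calling object, the providing object, and the symbol, so "which library is cuInit bound to?" can be answered by parsing those three fields. A sketch, assuming the line format seen in the bindings logs in this thread; `bindings_for` is a hypothetical helper. The interesting lines are those whose caller is not libnvshare.so itself.

```python
import re
import sys
from collections import defaultdict

# Matches ld.so lines of the form seen in this thread, e.g.
#   binding file <caller> [0] to <provider> [0]: normal symbol `cuInit'
BINDING = re.compile(
    r"binding file (\S+) \[\d+\] to (\S+) \[\d+\]: \w+ symbol `([^']+)'"
)


def bindings_for(lines, wanted):
    """Map each wanted symbol to the (caller, provider) pairs that bind it."""
    found = defaultdict(list)
    for line in lines:
        m = BINDING.search(line)
        if m and m.group(3) in wanted:
            caller, provider, symbol = m.groups()
            found[symbol].append((caller, provider))
    return dict(found)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python bindings.py bindings-2.1.0.txt cuInit cuMemAlloc_v2
    with open(sys.argv[1], errors="replace") as fh:
        for symbol, pairs in bindings_for(fh, set(sys.argv[2:])).items():
            for caller, provider in pairs:
                print(f"{symbol}: {caller} -> {provider}")
```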

t-arsicaud-catie commented 1 year ago

Hi,

The LD_DEBUG=libs... python torch_app_2.1.0.py run still generates a fairly long log (~1100 lines), which I can send you by email if you wish.

Regarding libnvshare, cat libs-2.1.0.txt | grep 'nvshare' gives:

    196531: calling init: /usr/local/lib/libnvshare.so
    196531: calling fini: /usr/local/lib/libnvshare.so [0]

Also, cat libs-2.1.0.txt | grep 'libcuda' gives:

    196531: find library=libcudart.so.12 [0]; searching
    196531:   trying file=/home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcudart.so.12
    196531:   trying file=/home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcudart.so.12
    196531:   trying file=/home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_nvrtc/lib/libcudart.so.12
    196531:   trying file=/home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12
    196531: calling init: /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12
    196531: find library=libcuda.so.1 [0]; searching
    196531:   trying file=/home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcuda.so.1
    196531:   trying file=/lib/x86_64-linux-gnu/libcuda.so.1
    196531: calling init: /lib/x86_64-linux-gnu/libcuda.so.1
    196531: calling fini: /lib/x86_64-linux-gnu/libcuda.so.1 [0]
    196531: calling fini: /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 [0]
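The "calling init" lines show which objects ld.so initialized and in what order. The hypothetical helper below extracts them from a saved LD_DEBUG=libs log. It can confirm that libnvshare.so was loaded, but not which library ends up serving cuInit or cuMemAlloc_v2; that requires the bindings log.

```python
import sys


def init_order(lines):
    """Return the objects ld.so initialized, in the order it called their init."""
    order = []
    for line in lines:
        _, sep, rest = line.partition("calling init: ")
        if sep:
            order.append(rest.strip())
    return order


def loaded(lines, name):
    """True if an object whose path ends with `name` was initialized."""
    return any(path.endswith(name) for path in init_order(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python init_order.py libs-2.1.0.txt
    with open(sys.argv[1], errors="replace") as fh:
        for path in init_order(fh):
            print(path)
```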

For bindings, cat bindings-2.1.0.txt | grep 'nvshare' outputs:

    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `sum_allocated'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_size_mem_allocatable'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuda_allocation_list'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `enable_single_oversub'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyAsync'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `client_fn'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoH'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoD'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoDAsync'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_check_interval'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `global_mutex'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `rsock'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `client_tid'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `own_lock'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemGetInfo'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoH_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `kern_since_sync'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvml_ok'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpy'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `scheduler_on'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuLaunchKernel'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_cv'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `need_lock'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemFree_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `message_type_string'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuLaunchKernel'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyHtoDAsync_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_client_id'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuInit'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemGetInfo_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyHtoD_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `own_lock_cv'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuda_ctx'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuGetErrorString'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `__debug'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuGetErrorName'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_nvmlInit'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoDAsync_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuCtxGetCurrent'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuGetProcAddress'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvscheduler_socket_path'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `initialize_client'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyAsync'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `kcount_mutex'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `pending_kernel_window'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemFree'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpy'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuCtxSetCurrent'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_nvmlDeviceGetUtilizationRates'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `did_work'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_fn'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_thread_tid'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoHAsync_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `req_lock_msg'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyHtoD'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `got_initial_sched_status'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemAllocManaged'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoHAsync'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemAlloc_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuInit'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_nvmlDeviceGetHandleByIndex'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuCtxSynchronize'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoD_v2'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuGetProcAddress'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyHtoDAsync'
    196669: binding file /usr/local/lib/libnvshare.so [0] to python [0]: normal symbol `stderr' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `getenv' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `free' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `pthread_create' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `pthread_sigmask' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__errno_location' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `unlink' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_broadcast' [GLIBC_2.3.2]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `clock_gettime' [GLIBC_2.17]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `write' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_wait' [GLIBC_2.3.2]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `pthread_once' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `write_whole'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fclose' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__stack_chk_fail' [GLIBC_2.4]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `accept4' [GLIBC_2.10]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `snprintf' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuda_driver_check_error'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `memset' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `close' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `read' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fgets' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `strcmp' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlvsym' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `read_whole'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `sem_wait' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `sigfillset' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_init' [GLIBC_2.3.2]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlopen' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_mutex_unlock' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `malloc' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__isoc99_sscanf' [GLIBC_2.7]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `listen' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `strlcpy'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `sem_post' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `continue_with_lock'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `bind' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_timedwait' [GLIBC_2.3.2]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `sem_init' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fopen' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_get_scheduler_path'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_connect'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `exit' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `connect' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fwrite' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__fprintf_chk' [GLIBC_2.3.4]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_receive_block'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `strerror' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_mutex_init' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_mutex_lock' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `rand' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlerror' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `socket' [GLIBC_2.2.5]
    196669: calling init: /usr/local/lib/libnvshare.so
    196669: binding file /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file python [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/nvtx/lib/libnvToolsExt.so.1 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.11 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/../../nvjitlink/lib/libnvJitLink.so.12 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.12 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/libc10_cuda.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /home/username/.virtualenvs/torch2.1.0/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_nvrtc/lib/libnvrtc.so.12 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /lib/x86_64-linux-gnu/libcrypto.so.1.1 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: binding file /lib/x86_64-linux-gnu/libnvidia-ml.so.1 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196669: calling fini: /usr/local/lib/libnvshare.so [0]

cat bindings-2.1.0.txt | grep 'cuInit' gives:

    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuInit'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuInit'

And cat bindings-2.1.0.txt | grep 'cuMemAlloc' gives:

    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemAllocManaged'
    196669: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemAlloc_v2'
t-arsicaud-catie commented 1 year ago

For comparison, when running the same tests with the torch 2.0.1 app, I get:

cat libs-2.0.1.txt | grep 'nvshare':

    196934: calling init: /usr/local/lib/libnvshare.so
[NVSHARE][INFO]: Successfully initialized nvshare GPU
    196934: calling fini: /usr/local/lib/libnvshare.so [0]

cat libs-2.0.1.txt | grep 'libcuda':


    196934: find library=libcudart.so.11.0 [0]; searching
    196934:   trying file=/home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcudart.so.11.0
    196934:   trying file=/home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcudart.so.11.0
    196934:   trying file=/home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_nvrtc/lib/libcudart.so.11.0
    196934:   trying file=/home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.11.0
    196934: calling init: /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.11.0
    196934: find library=libcuda.so.1 [0]; searching
    196934:   trying file=/home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcuda.so.1
    196934:   trying file=/lib/x86_64-linux-gnu/libcuda.so.1
    196934: calling init: /lib/x86_64-linux-gnu/libcuda.so.1
    196934: find library=libcuda.so [0]; searching
    196934:   trying file=/lib/x86_64-linux-gnu/libcuda.so
    196934: calling fini: /lib/x86_64-linux-gnu/libcuda.so.1 [0]
    196934: calling fini: /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.11.0 [0]

cat bindings-2.0.1.txt | grep 'nvshare':


    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `sum_allocated'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_size_mem_allocatable'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuda_allocation_list'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `enable_single_oversub'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyAsync'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `client_fn'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoH'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoD'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoDAsync'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_check_interval'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `global_mutex'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `rsock'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `client_tid'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `own_lock'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemGetInfo'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoH_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `kern_since_sync'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvml_ok'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpy'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `scheduler_on'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuLaunchKernel'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_cv'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `need_lock'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemFree_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `message_type_string'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuLaunchKernel'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyHtoDAsync_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_client_id'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuInit'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemGetInfo_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyHtoD_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `own_lock_cv'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuda_ctx'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuGetErrorString'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `__debug'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuGetErrorName'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_nvmlInit'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoDAsync_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuCtxGetCurrent'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuGetProcAddress'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvscheduler_socket_path'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `initialize_client'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyAsync'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `kcount_mutex'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `pending_kernel_window'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemFree'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpy'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuCtxSetCurrent'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_nvmlDeviceGetUtilizationRates'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `did_work'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_fn'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `release_early_thread_tid'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoHAsync_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `req_lock_msg'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyHtoD'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `got_initial_sched_status'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemAllocManaged'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyDtoHAsync'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemAlloc_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuInit'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_nvmlDeviceGetHandleByIndex'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuCtxSynchronize'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemcpyDtoD_v2'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuGetProcAddress'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemcpyHtoDAsync'
    196870: binding file /usr/local/lib/libnvshare.so [0] to python [0]: normal symbol `stderr' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `getenv' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `free' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `pthread_create' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `pthread_sigmask' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__errno_location' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `unlink' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_broadcast' [GLIBC_2.3.2]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `clock_gettime' [GLIBC_2.17]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `write' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_wait' [GLIBC_2.3.2]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `pthread_once' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `write_whole'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fclose' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__stack_chk_fail' [GLIBC_2.4]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `accept4' [GLIBC_2.10]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `snprintf' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuda_driver_check_error'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `memset' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `close' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `read' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fgets' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `strcmp' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlvsym' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `read_whole'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `sem_wait' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `sigfillset' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_init' [GLIBC_2.3.2]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlopen' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_mutex_unlock' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `malloc' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__isoc99_sscanf' [GLIBC_2.7]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `listen' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `strlcpy'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `sem_post' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `continue_with_lock'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `bind' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_cond_timedwait' [GLIBC_2.3.2]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libpthread.so.0 [0]: normal symbol `sem_init' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fopen' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_get_scheduler_path'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_connect'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `exit' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `connect' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fwrite' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__fprintf_chk' [GLIBC_2.3.4]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `nvshare_receive_block'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `strerror' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_mutex_init' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_mutex_lock' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `rand' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlerror' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `socket' [GLIBC_2.2.5]
    196870: calling init: /usr/local/lib/libnvshare.so
    196870: binding file /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file python [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /usr/local/lib/libnvshare.so [0] to /lib/x86_64-linux-gnu/libdl.so.2 [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/nvtx/lib/libnvToolsExt.so.1 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.11.0 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.11 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.10 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.11 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.11.7 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /home/username/.virtualenvs/torch2.0.1/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_nvrtc/lib/libnvrtc.so.11.2 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /lib/x86_64-linux-gnu/libcrypto.so.1.1 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `dlsym' [GLIBC_2.2.5]
[NVSHARE][INFO]: Successfully initialized nvshare GPU
    196870: calling fini: /usr/local/lib/libnvshare.so [0]

cat bindings-2.0.1.txt | grep 'cuInit':

    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuInit'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuInit'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuInit'

cat bindings-2.0.1.txt | grep 'cuMemAlloc':


    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `real_cuMemAllocManaged'
    196870: binding file /usr/local/lib/libnvshare.so [0] to /usr/local/lib/libnvshare.so [0]: normal symbol `cuMemAlloc_v2'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuMemAllocManaged'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuMemAllocPitch_v2'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuMemAllocAsync'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuMemAllocAsync_ptsz'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuMemAllocFromPoolAsync'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuMemAllocFromPoolAsync_ptsz'
    196870: binding file /lib/x86_64-linux-gnu/libcuda.so.1 [0] to /lib/x86_64-linux-gnu/libcuda.so.1 [0]: normal symbol `cuMemAllocManaged'
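Comparing the two bindings logs by eye is tedious, so it can help to group the LD_DEBUG=bindings lines by symbol. The following is a minimal Python sketch. It assumes every relevant line has the "binding file A [n] to B [n]: normal symbol `S'" shape seen above, and the helper names bindings_by_symbol and providers are made up for illustration. Applied to the greps above, it shows that in the 2.0.1 run cuInit is also bound inside libcuda.so.1, while in the 2.1.0 run it only shows up inside libnvshare.so.

```python
import re
from collections import defaultdict

# Matches glibc LD_DEBUG=bindings lines such as:
#   196870: binding file A [0] to B [0]: normal symbol `cuInit' [GLIBC_2.2.5]
# Assumption: only "normal symbol" lines are of interest here.
BINDING_RE = re.compile(
    r"binding file (?P<src>\S+) \[\d+\] to (?P<dst>\S+) \[\d+\]: "
    r"normal symbol `(?P<sym>[^']+)'"
)


def bindings_by_symbol(lines):
    """Map each symbol name to the set of (requesting object, providing object) pairs."""
    table = defaultdict(set)
    for line in lines:
        m = BINDING_RE.search(line)
        if m:
            table[m.group("sym")].add((m.group("src"), m.group("dst")))
    return table


def providers(table, symbol):
    """Return the set of objects that supplied the definition of `symbol`."""
    return {dst for _, dst in table.get(symbol, set())}
```

For example, with open("bindings-2.1.0.txt") as f: table = bindings_by_symbol(f), then providers(table, "cuInit") lists which shared objects actually resolved cuInit in that run.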
grgalex commented 1 year ago

Thanks for taking the time to run these tests.

I'd like to take a look at the full logs (both for libs and bindings), so if you could mail them to me, or upload them to a public place, I'll happily take a look.

grgalex commented 1 year ago

Also, I'd like you to rerun the gdb tests on 2.1.0 and 2.0.1 with the following breakpoints set:

  1. cuInit
  2. cuMemAlloc
  3. cuMemAlloc_v2
  4. cuGetProcAddress
  5. cuGetProcAddress_v2

I noticed that Pytorch 2.1.0 (from PyPI -- the one you have installed) comes with CUDA 12.x, while Pytorch 2.0.1 comes with CUDA 11.x. CUDA 12.0 introduced a new function, cuGetProcAddress_v2, which we don't hook in nvshare. We only hook the plain cuGetProcAddress from CUDA 11.

To verify this, could you uninstall Pytorch 2.1.0 and re-install it with CUDA 11.8, following the official instructions [1]? (If you are using Conda, there are also instructions for that in the same link.)

Then, rerun the Pytorch 2.1.0 example and my prediction is that it will work.

[1] https://pytorch.org/get-started/previous-versions/#linux-and-windows-1
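Before reinstalling, it can help to confirm which CUDA runtime the installed PyTorch wheel was built against: torch.version.cuda reports it as a string such as "11.8", or None for CPU-only builds. The helper below is a small hypothetical sketch that encodes the diagnosis above, namely that wheels built against CUDA 12.x or newer go through cuGetProcAddress_v2, which nvshare did not hook at the time of this thread.

```python
def cuda_major(version):
    """Return the major CUDA version from a string like '11.8' or '12.1'.

    Returns None for CPU-only builds, where torch.version.cuda is None.
    """
    if not version:
        return None
    return int(version.split(".")[0])


def needs_get_proc_address_v2_hook(version):
    """True if the wheel's CUDA runtime is 12.x or newer (hypothetical helper)."""
    major = cuda_major(version)
    return major is not None and major >= 12
```

Usage: import torch, then print(torch.version.cuda, needs_get_proc_address_v2_hook(torch.version.cuda)). A cu118 wheel should report False.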

t-arsicaud-catie commented 1 year ago

Yes, you are right!

In a cuda 12.2 environment, with torch installed with pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118, the nvshare manager is triggered as expected.

In the same cuda 12.2 environment, it is not triggered when torch is installed with plain pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0.

I've collected the full logs you requested for the cuda 12.2 / cuda 12.2 scenario and will send them to you by email.

For the gdb part, the symbols which are called in this scenario are, as you expected, cuInit, cuGetProcAddress_v2 and cuMemAlloc_v2.

grgalex commented 12 months ago

Great job!

In order to support CUDA >=12 applications, we must also hook cuGetProcAddress_v2().

I will prepare (and merge) a PR tackling this when I get some time.

In the meantime, you can use the cu118 (CUDA 11.8) variant for PyTorch 2.1.0.

pokerfaceSad commented 12 months ago

@grgalex

cuGetProcAddress should serve as an entrypoint for the hook lib, implying that both initialize_libnvshare() and initialize_client() should be executed when the cuGetProcAddress hook is called for the first time.

Meanwhile, the definition of cuGetProcAddress is different in CUDA 11 and CUDA 12, and some tricks may be needed to ensure compatibility.

grgalex commented 12 months ago

@pokerfaceSad

Currently, we use cuInit() as the trigger for initializing libnvshare.

According to the CUDA documentation, it is the only function that applications must necessarily call before using a GPU. In the case of applications that use the CUDA Runtime API, it internally calls cuInit() for them.

cuGetProcAddress and the _v2 variant are only called in apps that use the CUDA Runtime API, as it uses these functions to obtain the Driver API symbols.

Therefore, cuInit() serves as a better entrypoint imo, and we are keeping it as such.

Regarding the differences between cuGetProcAddress and the _v2 variant, I've taken a quick look at the docs, and indeed the function prototype and usage are a bit different.

Do you want to point out something specific about the approach we should take regarding the last argument of _v2?

Perhaps you can experiment a bit with a CUDA 12.x Runtime API application and see how it uses the function.

pokerfaceSad commented 12 months ago

@grgalex When a user program calls the CUDA 11.4+ Runtime API for the first time, the runtime calls cuGetProcAddress() to obtain the function pointers for cuInit() and the other Driver API functions. It then calls cuInit() through the pointer it obtained.

It means that cuGetProcAddress() will be called before cuInit().

Therefore, cuGetProcAddress() should also serve as an entrypoint. Otherwise, real_cuGetProcAddress may be a NULL pointer when it is called.

cuGetProcAddress_v2 should also be defined in libnvshare with the additional argument for compatibility.
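
To make the ordering argument concrete, here is a minimal C sketch of lazy initialization shared by two entrypoints. It is a simulation under stated assumptions, not nvshare's actual code: the simplified CUDA types, the mock_driver_* functions, init_runs, and simulate_runtime_startup() are hypothetical, true_or_exit() is re-declared as a hypothetical macro, and the real library would resolve real_cuInit / real_cuGetProcAddress from libcuda.so instead of binding mocks. Because both hooks go through the same pthread_once, whichever one the runtime calls first performs the initialization, and it runs exactly once.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for cuda.h (assumption, not the real header). */
typedef int CUresult;
#define CUDA_SUCCESS 0
#define MOCK_ERROR_NOT_FOUND 1 /* hypothetical error code */

/* Hypothetical version of nvshare's true_or_exit(): exit if cond is false. */
#define true_or_exit(cond)                                \
    do {                                                  \
        if (!(cond)) {                                    \
            fprintf(stderr, "check failed: %s\n", #cond); \
            exit(EXIT_FAILURE);                           \
        }                                                 \
    } while (0)

typedef CUresult (*cuInit_fn)(unsigned int);
typedef CUresult (*cuGetProcAddress_fn)(const char *, void **, int, unsigned long long);

/* Mock "driver"; the real library would resolve these from libcuda.so. */
static CUresult mock_driver_cuInit(unsigned int flags) { (void)flags; return CUDA_SUCCESS; }
static CUresult mock_driver_cuDeviceGet(int *device, int ordinal) {
    *device = ordinal;
    return CUDA_SUCCESS;
}
static CUresult mock_driver_cuGetProcAddress(const char *symbol, void **pfn,
                                             int cudaVersion, unsigned long long flags) {
    (void)cudaVersion;
    (void)flags;
    if (strcmp(symbol, "cuInit") == 0) {
        *pfn = (void *)mock_driver_cuInit;
        return CUDA_SUCCESS;
    }
    if (strcmp(symbol, "cuDeviceGet") == 0) {
        *pfn = (void *)mock_driver_cuDeviceGet;
        return CUDA_SUCCESS;
    }
    *pfn = NULL;
    return MOCK_ERROR_NOT_FOUND;
}

/* Pointers to the real functions stay NULL until initialization runs. */
static cuInit_fn real_cuInit = NULL;
static cuGetProcAddress_fn real_cuGetProcAddress = NULL;
static pthread_once_t init_libnvshare_done = PTHREAD_ONCE_INIT;
static int init_runs = 0; /* how many times initialization actually executed */

static void initialize_libnvshare(void) {
    init_runs++;
    real_cuInit = mock_driver_cuInit;
    real_cuGetProcAddress = mock_driver_cuGetProcAddress;
}

/* Hooked cuInit: initialize on first use, then forward to the real function. */
CUresult cuInit(unsigned int flags) {
    true_or_exit(pthread_once(&init_libnvshare_done, initialize_libnvshare) == 0);
    return real_cuInit(flags);
}

/* Hooked cuGetProcAddress: also an entrypoint, since the runtime calls it
 * before cuInit. Without this pthread_once, real_cuGetProcAddress is NULL here. */
CUresult cuGetProcAddress(const char *symbol, void **pfn, int cudaVersion,
                          unsigned long long flags) {
    true_or_exit(pthread_once(&init_libnvshare_done, initialize_libnvshare) == 0);
    if (strcmp(symbol, "cuInit") == 0) { /* hand out our hook, not the real one */
        *pfn = (void *)cuInit;
        return CUDA_SUCCESS;
    }
    return real_cuGetProcAddress(symbol, pfn, cudaVersion, flags);
}

/* Mimic the CUDA 11.4+ runtime: resolve symbols first, then call cuInit via its pointer. */
static CUresult simulate_runtime_startup(void) {
    void *dev_pfn = NULL, *init_pfn = NULL;
    CUresult rc;
    /* Forwarded lookup: uses real_cuGetProcAddress before any cuInit call. */
    if ((rc = cuGetProcAddress("cuDeviceGet", &dev_pfn, 11040, 0)) != CUDA_SUCCESS) return rc;
    if ((rc = cuGetProcAddress("cuInit", &init_pfn, 11040, 0)) != CUDA_SUCCESS) return rc;
    return ((cuInit_fn)init_pfn)(0);
}
```

Calling simulate_runtime_startup() from a main() leaves init_runs at 1, even after a later direct cuInit() call. If the pthread_once line is removed from cuGetProcAddress(), the first forwarded lookup dereferences a NULL real_cuGetProcAddress, which is exactly the failure described above. On older glibc versions, compile with -pthread.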

grgalex commented 12 months ago

@pokerfaceSad

You are right. I had missed that!

By the way, do you want to prepare and send a PR for this?

I'm kinda busy at the moment, so I would really appreciate any help!

The suggested changes are (correct me if I'm wrong):

  1. Add a hook for cuGetProcAddress_v2. No #define, as it's a distinct symbol we want to hook.
    • For the hooked functions, set symbolStatus to 0 (or define the enum in our header file and set it to CU_GET_PROC_ADDRESS_SUCCESS).
    • In other cases (unrelated symbols), the invocation of real_cuGetProcAddress_v2 will handle it.
  2. Add a true_or_exit(pthread_once(&init_libnvshare_done, initialize_libnvshare) == 0); call to both cuGetProcAddress and *_v2.

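Change 1 could look roughly like the following C sketch. It assumes simplified stand-ins for the cuda.h types and for the CUdriverProcAddressQueryResult enum (check them against your CUDA 12 headers). nvshare_hook_for(), mock_driver_cuGetProcAddress_v2(), mock_cuDeviceGet(), and the placeholder hook bodies are hypothetical, and the pthread_once initialization from change 2 is left out to keep the focus on symbol dispatch. Hooked names get our own function pointer with symbolStatus set to CU_GET_PROC_ADDRESS_SUCCESS; everything else is forwarded to real_cuGetProcAddress_v2.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for cuda.h types (assumption; verify against CUDA 12 headers). */
typedef int CUresult;
typedef unsigned long long cuuint64_t;
typedef unsigned long long CUdeviceptr;
#define CUDA_SUCCESS 0
#define MOCK_ERROR_NOT_FOUND 1 /* hypothetical error code used by the mock driver */

/* Assumed to mirror the CUDA 12 query-result enum. */
typedef enum {
    CU_GET_PROC_ADDRESS_SUCCESS = 0,
    CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND = 1,
    CU_GET_PROC_ADDRESS_VERSION_NOT_SUFFICIENT = 2
} CUdriverProcAddressQueryResult;

typedef CUresult (*cuGetProcAddress_v2_fn)(const char *, void **, int, cuuint64_t,
                                           CUdriverProcAddressQueryResult *);

/* Hooked driver functions; the bodies are placeholders for nvshare's logic. */
CUresult cuInit(unsigned int flags) { (void)flags; return CUDA_SUCCESS; }
CUresult cuMemAlloc_v2(CUdeviceptr *dptr, size_t bytesize) {
    (void)bytesize;
    *dptr = 0;
    return CUDA_SUCCESS;
}

/* Mock of the driver's own cuGetProcAddress_v2: it only knows cuDeviceGet. */
static CUresult mock_cuDeviceGet(int *device, int ordinal) {
    *device = ordinal;
    return CUDA_SUCCESS;
}
static CUresult mock_driver_cuGetProcAddress_v2(const char *symbol, void **pfn, int cudaVersion,
                                                cuuint64_t flags,
                                                CUdriverProcAddressQueryResult *symbolStatus) {
    (void)cudaVersion;
    (void)flags;
    int found = strcmp(symbol, "cuDeviceGet") == 0;
    *pfn = found ? (void *)mock_cuDeviceGet : NULL;
    if (symbolStatus != NULL)
        *symbolStatus = found ? CU_GET_PROC_ADDRESS_SUCCESS : CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND;
    return found ? CUDA_SUCCESS : MOCK_ERROR_NOT_FOUND;
}

/* In nvshare this would be resolved from libcuda.so; here it points at the mock. */
static cuGetProcAddress_v2_fn real_cuGetProcAddress_v2 = mock_driver_cuGetProcAddress_v2;

/* Return our replacement for a hooked symbol, or NULL if we do not hook it.
 * Per the driver API docs (as I read them), callers pass base names such as
 * "cuMemAlloc", and cudaVersion selects which ABI variant (e.g. _v2) is wanted. */
static void *nvshare_hook_for(const char *symbol) {
    if (strcmp(symbol, "cuInit") == 0) return (void *)cuInit;
    if (strcmp(symbol, "cuMemAlloc") == 0) return (void *)cuMemAlloc_v2;
    return NULL;
}

/* Hook for the distinct CUDA 12 symbol cuGetProcAddress_v2 (five arguments). */
CUresult cuGetProcAddress_v2(const char *symbol, void **pfn, int cudaVersion, cuuint64_t flags,
                             CUdriverProcAddressQueryResult *symbolStatus) {
    void *hook = nvshare_hook_for(symbol);
    if (hook != NULL) {
        *pfn = hook;
        if (symbolStatus != NULL) /* guard against a NULL symbolStatus */
            *symbolStatus = CU_GET_PROC_ADDRESS_SUCCESS;
        return CUDA_SUCCESS;
    }
    /* Unrelated symbols: let the real driver fill in pfn and symbolStatus. */
    return real_cuGetProcAddress_v2(symbol, pfn, cudaVersion, flags, symbolStatus);
}
```

For simplicity the sketch ignores cudaVersion and always hands out cuMemAlloc_v2; a real hook may need to honor the requested version. CUDA 11.x callers would still go through the 4-argument cuGetProcAddress, which needs its own hook with the same dispatch logic.
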
pokerfaceSad commented 11 months ago

@grgalex

OK, I will submit a PR for this :)

grgalex commented 10 months ago

@t-arsicaud-catie

We just merged support for CUDA 12.

Feel free to deploy from the main branch and hopefully it will work out of the box :)

t-arsicaud-catie commented 9 months ago

Hi,

Sorry, I was unavailable for a while and could not test until now.

It's done, and I confirm that it works well, at least with the latest versions of PyTorch / CUDA.

Thank you both for your work and for improving nvshare!