grgalex / nvshare

Practical GPU Sharing Without Memory Size Constraints
Apache License 2.0

aarch64 - No GLIBC_2.2.5 #21

Closed. feuler closed this issue 3 months ago.

feuler commented 3 months ago

The Ubuntu 22.04 aarch64 system I use doesn't have GLIBC_2.2.5. I get this error when trying to run with LD_PRELOAD:
[NVSHARE][FATAL]: libnvshare.so: undefined symbol: dlsym, version GLIBC_2.2.5

Is it possible to make it work on aarch64?

strings /usr/lib/aarch64-linux-gnu/libc.so.6 | grep GLIBC
GLIBC_2.17
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
GLIBC_2.24
GLIBC_2.25
GLIBC_2.26
GLIBC_2.27
GLIBC_2.28
GLIBC_2.29
GLIBC_2.30
GLIBC_2.31
GLIBC_2.32
GLIBC_2.33
GLIBC_2.34
GLIBC_2.35
GLIBC_PRIVATE
GNU C Library (Ubuntu GLIBC 2.35-0ubuntu3.8) stable release version 2.35
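The root cause is that GLIBC_2.2.5 is the oldest (baseline) symbol version glibc uses on x86_64, while on aarch64 the baseline is GLIBC_2.17, which is the first entry in the list above. A minimal sketch for checking this on a given machine follows; it is not part of nvshare, and `find_dlsym_version` is a hypothetical helper. It asks the dynamic linker, via the GNU extension `dlvsym`, which of a few candidate versions of `dlsym` it can resolve:

```c
#define _GNU_SOURCE /* needed for dlvsym() and RTLD_DEFAULT */
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns the first version string in `candidates`
 * under which the dynamic linker can resolve the symbol "dlsym", or NULL
 * if none of those versions exists in the loaded glibc. */
static const char *find_dlsym_version(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* dlvsym() looks up one specific symbol version and returns NULL
         * when no loaded object defines "dlsym" at that version. */
        if (dlvsym(RTLD_DEFAULT, "dlsym", candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

Called with the candidates {"GLIBC_2.2.5", "GLIBC_2.17"}, this should report "GLIBC_2.2.5" on x86_64 and "GLIBC_2.17" on aarch64. With glibc older than 2.34, `dlvsym` lives in libdl, so link with `-ldl`.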

feuler commented 3 months ago

Never mind. It works when I use GLIBC_2.17 instead of GLIBC_2.2.5.

grgalex commented 3 months ago

@feuler Glad you found a solution!

Did you amend nvshare's code or did the problem stem from your configuration?

feuler commented 3 months ago

I changed some lines, since the default libc.so.6 on my aarch64 Linux system doesn't have GLIBC_2.2.5, as mentioned in my first post.

I changed line 390 in src/hook.c to:
r_dlsym = (dlsym_t*)dlvsym(RTLD_NEXT, "dlsym", "GLIBC_2.17");

I changed line 974 in src/hook.c to:
asm(".symver dlsym_225, dlsym@@GLIBC_2.17");

I also changed every "GLIBC_2.2.5" in src/libnvshare-symbols.ld to "GLIBC_2.17".
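Instead of hard-coding one version string, the edits above could select the baseline version at compile time. The following is a hedged sketch, not nvshare's actual code: `NVSHARE_GLIBC_BASE_VER` and `resolve_real_dlsym` are hypothetical names, the `dlsym_t` typedef is declared locally to mirror the cast in the edited line 390, and only the x86_64 and aarch64 baselines are filled in. The `.symver` line stays in a comment because it only links inside a shared library whose version script (here src/libnvshare-symbols.ld) defines the matching version node, so that file would still need the same per-architecture string.

```c
#define _GNU_SOURCE /* needed for dlvsym() and RTLD_NEXT */
#include <dlfcn.h>

/* Hypothetical macro: baseline glibc symbol version per architecture. */
#if defined(__x86_64__)
#define NVSHARE_GLIBC_BASE_VER "GLIBC_2.2.5"
#elif defined(__aarch64__)
#define NVSHARE_GLIBC_BASE_VER "GLIBC_2.17"
#else
#error "Baseline glibc symbol version unknown for this architecture"
#endif

/* Function type of dlsym, matching the (dlsym_t*) cast in hook.c. */
typedef void *dlsym_t(void *handle, const char *symbol);

/* Hypothetical helper mirroring the edited line 390: fetch the real
 * dlsym from the next object in search order, at the baseline version. */
static dlsym_t *resolve_real_dlsym(void)
{
    return (dlsym_t *)dlvsym(RTLD_NEXT, "dlsym", NVSHARE_GLIBC_BASE_VER);
}

/* The export at line 974 could use the same macro; adjacent string
 * literals are concatenated before the assembler sees them:
 *   asm(".symver dlsym_225, dlsym@@" NVSHARE_GLIBC_BASE_VER);
 */
```

Keeping the string in one macro means the `dlvsym` lookup and the `.symver` export cannot drift apart when porting to another architecture.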

Anyway, the scheduler and LD_PRELOAD were working, but nvshare failed to intercept all cudaMalloc calls and turn them into cudaMallocManaged calls (I got CUDA out-of-memory errors again). In the meantime, I was able to solve my problem by patching the application to use cudaMallocManaged.

grgalex commented 3 months ago

I think you were getting out-of-memory errors because, by default, nvshare limits the memory each process can allocate to the physical GPU memory size:

https://github.com/grgalex/nvshare/blob/8a15dd1094a91678bf9cdf07e5951f6b28c01cf0/src/hook.c#L663

So your application may have been trying to allocate more GPU memory than the GPU physically has.

You can try running your application again with the environment variable NVSHARE_ENABLE_SINGLE_OVERSUB=1 set.
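As a rough illustration of the limit described above, here is a minimal sketch of a per-process allocation budget that an environment variable can lift. It is not nvshare's code (the linked hook.c line shows the real check); `oversub_enabled` and `allocation_allowed` are hypothetical names, and treating only the exact value "1" as enabling oversubscription is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: oversubscription counts as enabled only when the
 * variable is set to exactly "1" (an assumption about the parsing). */
static bool oversub_enabled(void)
{
    const char *v = getenv("NVSHARE_ENABLE_SINGLE_OVERSUB");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Hypothetical budget check: `allocated` is what the process already
 * holds, `request` is the new allocation, `limit` is the physical GPU
 * memory size. Without oversubscription the total may not exceed it. */
static bool allocation_allowed(size_t allocated, size_t request,
                               size_t limit, bool oversub)
{
    if (oversub)
        return true;
    /* Compare without computing allocated + request, which could overflow. */
    return allocated <= limit && request <= limit - allocated;
}
```

Under this model, a process holding 10 GiB on a 16 GiB GPU could still get 6 GiB more, a 7 GiB request would fail with the default setting, and it would succeed once the variable is set to 1.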