Shadow memory range interleaves with an existing memory mapping

jason-infra commented 1 year ago

I have an ASAN test suite that gives me the following error:

==25==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==25==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==25==This might be related to ELF_ET_DYN_BASE change in Linux 4.12.
==25==See https://github.com/google/sanitizers/issues/856 for possible workarounds.
==25==Process memory map follows:
    0x00007fff7000-0x00008fff7000   
    0x000091ff6000-0x004091ff7000   
    0x02008fff7000-0x10007fff8000   
         ...
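
The abort above fires because the range ASan wants for its shadow collides with something already mapped. As a rough illustration only (this is not ASan's actual code; `Range`, `Overlaps`, and `CollidesWithAny` are hypothetical names), the check amounts to an interval-overlap test, treating every range as half-open:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open address range [begin, end). Hypothetical helper type.
struct Range {
    std::uint64_t begin;
    std::uint64_t end;
};

// Two half-open ranges overlap when each one starts before the other ends.
bool Overlaps(const Range& a, const Range& b) {
    assert(a.begin <= a.end && b.begin <= b.end);  // ranges must be well-formed
    return a.begin < b.end && b.begin < a.end;
}

// Returns true if `wanted` collides with any entry of the memory map.
bool CollidesWithAny(const Range& wanted, const std::vector<Range>& mappings) {
    for (const Range& m : mappings) {
        if (Overlaps(wanted, m)) return true;
    }
    return false;
}
```

Fed with the shadow range from the message (written half-open as 0x00007fff7000-0x10007fff8000) and the three mappings listed in the process memory map, `CollidesWithAny` returns true, since the first mapping already starts at the shadow base.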

I am running

Nvidia 525.60.13 drivers
Ubuntu 20.04.5
Linux kernel 5.15.0-1027-gcp

Importantly, all ASan tests pass with no errors when I run them with the NVIDIA 470 drivers:

Nvidia 470.82.01 drivers
Ubuntu 20.04.5
Linux kernel 5.15.0-1027-gcp

Other Info

I have kept all variables the same, including the ASan test suite, OS, kernel, and GPU (NVIDIA T4). The only difference is the NVIDIA driver version.
I do not believe the -fsanitize=address failure here is the one described in #856, because the test suite runs flawlessly with the NVIDIA 470 drivers.
All my other tests pass, including tsan.

Why am I still getting the "Shadow memory range interleaves" error?

mjj48 commented 1 year ago

Standalone command to repro this:

#!/bin/bash
set -e 

# Things you must change:
LIB_ASAN=/usr/lib/x86_64-linux-gnu/libasan.so.5
LIB_CUDART=cuda-11.2/targets/x86_64-linux/lib/libcudart.so.11.0
NVCC_PATH=/home/michael.johnson/cuda-11.2/bin/nvcc
CLANG10_PATH=/home/michael.johnson/clang/bin/clang
# End of things you must change.

# Create .h
cat <<EOT > hello.h
void call_hello();
EOT
# Create .cu
cat <<EOT > hello.cu
#include <cstdio>
__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}
void call_hello() {
    cuda_hello<<<1,1>>>(); 
}
EOT
# Create .cpp
cat <<EOT > hello_bin.cpp
#include "hello.h"
int main() {
    call_hello(); 
    return 1;
}
EOT
mkdir -p needed_libs/
cp $LIB_ASAN needed_libs/
cp $LIB_CUDART needed_libs/

$NVCC_PATH  --objdir-as-tempdir  --compiler-options "-fPIC" --compiler-bindir=$CLANG10_PATH  -x cu  -O2 -c hello.cu -o hello.pic.o

$CLANG10_PATH -shared -o libhello.so hello.pic.o -fsanitize=address -stdlib=libstdc++ -Lneeded_libs -lstdc++

cp libhello.so needed_libs/

$CLANG10_PATH  -fPIC  -nostdinc '-std=c++17' -nostdinc++ -c hello_bin.cpp -o hello_bin.pic.o

$CLANG10_PATH -o hello_bin -Lneeded_libs hello_bin.pic.o -lhello -l:libcudart.so.11.0 -l:libstdc++.so.6 -pie -fsanitize=address -fuse-ld=gold  -stdlib=libstdc++ -lstdc++

LD_LIBRARY_PATH=needed_libs/ ./hello_bin 2>&1 | head

noaxp commented 8 months ago

It seems the problem only happens when you use both clang and the gold linker. I also hit this issue with gcc at one point, but oddly I cannot reproduce it now. As a temporary workaround, you could try gcc or a different linker.

To reproduce: clang demo.cc -fsanitize=address -I/usr/local/cuda/include -lcuda -o demo -fuse-ld=gold

If you remove -fuse-ld=gold, the program works fine.

// demo.cc
#include <cuda.h>
int main() {
  cuInit(0);
  return 0;
}

noaxp commented 8 months ago

It's caused by InitializeShadowMemory being invoked twice. The first invocation happens before main(), the second before cuInit(0).

The address of the variable asan_inited differs between the two invocations, so the second one considers ASan uninitialized and calls InitializeShadowMemory again. It then tries to allocate shadow memory at the same address range, i.e. 0x00007fff7000-0x10007fff7fff, which causes the error.

But I'm still confused about why there are two asan_inited objects, and why the CUDA driver and the linker could affect this.
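
The double initialization described above can be pictured with a toy model. This is only a sketch under loud assumptions: `RuntimeCopy` and `Reserved` are hypothetical stand-ins, and the real runtime reserves memory through the kernel rather than a `std::set`. The point is that an "already initialized" flag only guards the copy it lives in, while the fixed shadow range is shared by the whole process:

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Stand-in for the process address space: the set of reserved [begin, end)
// ranges. Hypothetical; a real runtime would ask the kernel instead.
using Reserved = std::set<std::pair<std::uint64_t, std::uint64_t>>;

// Hypothetical model of one loaded copy of a sanitizer runtime.
struct RuntimeCopy {
    bool inited = false;  // per-copy flag, like each separate asan_inited object

    // Returns false when the fixed shadow range is already taken.
    bool Init(Reserved& address_space) {
        if (inited) return true;  // only protects against re-entry in this copy
        const std::pair<std::uint64_t, std::uint64_t> shadow{
            0x00007fff7000ULL, 0x10007fff8000ULL};
        if (!address_space.insert(shadow).second) {
            return false;  // range already reserved by another copy
        }
        inited = true;
        return true;
    }
};
```

In this model, calling `Init` twice on the same `RuntimeCopy` is harmless, but calling it on a second `RuntimeCopy` that shares the same `Reserved` set fails, which mirrors the second InitializeShadowMemory call finding the range already mapped.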

noaxp commented 8 months ago

I found the root cause. libcuda.so contains code equivalent to the following:

Dl_info attr[2];
dladdr((void*)&pthread_join, attr);
dlopen(attr[0].dli_fname, 1);

So the program tries to dlopen itself, and calling dlopen on a PIE executable is undefined behavior.
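
To check this on your own build, a small helper around dladdr can report which loaded object a symbol address resolves to. This is a minimal sketch, not code from libcuda.so; `ObjectContaining` is a hypothetical name, and older glibc versions may need -ldl at link time:

```cpp
#include <dlfcn.h>
#include <string>

// Returns the path of the loaded object that contains `addr`,
// or an empty string if dladdr cannot resolve the address.
std::string ObjectContaining(const void* addr) {
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return {};
    }
    return info.dli_fname;
}
```

Printing `ObjectContaining(reinterpret_cast<const void*>(&pthread_join))` (with `<pthread.h>` included) from both a gold-linked and a default-linked build would show whether the address resolves to a shared library or to the executable itself. The latter is the case in which the dlopen call above ends up targeting the program.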

google / sanitizers

Shadow memory range interleaves with an existing memory mapping #1630
