Describe the bug:
When torchdistx is built from source with the option -DTORCHDIST_SANITIZERS=tsan, importing either of the following modules hangs:
torch.testing._internal.common_distributed
torch.testing._internal.common_utils
Describe how to reproduce:
Build torchdistx from source with -DTORCHDIST_SANITIZERS=tsan
Link the torchdistx build against an existing PyTorch build from source
Run LD_PRELOAD=/path/to/libtsan.so.0 python -c "import torch.testing._internal.common_utils"
The command hangs
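To make the hang easier to check in a script rather than by watching a terminal, the reproduction step can be wrapped in a subprocess with a timeout. This is only a minimal sketch: the helper name `run_with_timeout`, the 60 second default and the result strings are my own choices for illustration, and `/path/to/libtsan.so.0` is the same placeholder as above.

```python
import os
import subprocess
import sys


def run_with_timeout(code, timeout_s=60.0, preload=None):
    """Run `python -c code` in a child process.

    Returns "ok" if the child exits with status 0, "error" if it exits
    with a non-zero status, and "hang" if it is still running after
    timeout_s seconds (the child is killed in that case).
    """
    env = dict(os.environ)
    if preload:
        # Same effect as prefixing the command with LD_PRELOAD=... in a shell.
        env["LD_PRELOAD"] = preload
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            env=env,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if proc.returncode == 0 else "error"


if __name__ == "__main__":
    # Placeholder path, as in the reproduction steps above.
    status = run_with_timeout(
        "import torch.testing._internal.common_utils",
        timeout_s=60.0,
        preload="/path/to/libtsan.so.0",
    )
    print(status)
```

A "hang" result here only means the import did not finish within the chosen timeout. Imports under tsan can be slow, so the timeout may need to be generous.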
Describe the expected behavior:
LD_PRELOAD=/path/to/libtsan.so.0 python -c "import torch.testing._internal.common_utils" should not hang
Environment:
I use a PyTorch build from source, but I suspect the same issue occurs with the nightly version
AWS cluster: CPU-only environment, i.e. no GPUs were used for these commands
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
OS: Ubuntu 18.04.6 LTS
Version: 0.3.0.dev0
Additional context:
Originally, the problems with the tsan sanitizer were noticed when this pull request failed the CI CPU-TSAN check, i.e. it hung there. I then compiled torchdistx with the tsan sanitizer and confirmed that the existing tests work as expected by executing:
LD_PRELOAD=/path/to/libtsan.so.0 pytest tests/python/test_deferred_init.py
LD_PRELOAD=/path/to/libtsan.so.0 pytest tests/python/test_fake.py
I then tried to run my own tests with:
LD_PRELOAD=/path/to/libtsan.so.0 pytest tests/python/test_slowmo_fsdp.py
but without success: it hung just as in the aforementioned CI check. After that, I narrowed down the potential cause to the import issues.
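For anyone who wants to narrow the hang down further, one option might be Python's standard faulthandler module, which can dump the Python stack of all threads if an import takes too long. This is a sketch of an approach I am suggesting, not what I used above: the helper name `import_with_watchdog` and the timeout are hypothetical, and I have not verified how faulthandler behaves when libtsan is preloaded.

```python
import faulthandler
import importlib
import sys


def import_with_watchdog(module_name, timeout_s=60.0, file=None):
    """Import module_name; if the import is still running after timeout_s
    seconds, dump the tracebacks of all threads to `file` (default: stderr).

    The dump shows the Python frame the import is stuck in. The process is
    not terminated, so a truly hung import still has to be interrupted
    by hand.
    """
    out = file if file is not None else sys.stderr
    # Arm a watchdog thread that writes the tracebacks after timeout_s.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=out)
    try:
        return importlib.import_module(module_name)
    finally:
        # Disarm the watchdog if the import finished (or raised) in time.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    import_with_watchdog("torch.testing._internal.common_utils", timeout_s=60.0)
```

The script would be run with the same LD_PRELOAD prefix as the commands above. A Python-level traceback only shows which import statement is stuck. If the hang is inside native code, a native debugger would be needed to see more.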