pytorch / torchdistx

Torch Distributed Experimental
BSD 3-Clause "New" or "Revised" License

torchdistx compiled with `tsan` sanitizer hangs on specific imports #39

Closed aovladi closed 2 years ago

aovladi commented 2 years ago

Describe the bug:

When torchdistx is built from source with the option -DTORCHDIST_SANITIZERS=tsan, Python hangs on certain imports, such as the one shown in the reproduction steps.

Describe how to reproduce:

  1. Build torchdistx from source with -DTORCHDIST_SANITIZERS=tsan
  2. Link the torchdistx build against an existing PyTorch build from source
  3. Run LD_PRELOAD=/path/to/libtsan.so.0 python -c "import torch.testing._internal.common_utils"
  4. It hangs
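
The reproduction above can be automated so that a hang shows up as a timeout instead of a stuck terminal. This is a minimal sketch, not part of torchdistx: the helper name `import_times_out` and the 60-second default are assumptions chosen for illustration, and the preload path is the same placeholder used in the steps above.

```python
import os
import subprocess
import sys


def import_times_out(module, timeout_s=60.0, preload=None):
    """Return True if `import <module>` in a fresh interpreter exceeds timeout_s.

    `preload` is an optional value for LD_PRELOAD (for example the path to
    the tsan runtime); it is passed through unchanged.
    """
    env = dict(os.environ)
    if preload is not None:
        env["LD_PRELOAD"] = preload
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            env=env,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return True
    # The import finished (successfully or with an error) before the deadline.
    return False
```

For the case in this issue, the call would be `import_times_out("torch.testing._internal.common_utils", preload="/path/to/libtsan.so.0")`. A return value of `False` only means the interpreter exited in time; it does not distinguish a clean import from an import error.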

Describe the expected behavior: LD_PRELOAD=/path/to/libtsan.so.0 python -c "import torch.testing._internal.common_utils" should not hang

Environment:

Additional context: The problems with the tsan sanitizer were first noticed when this pull request did not pass the CI CPU-TSAN check, i.e. the check hung. I then compiled torchdistx with the tsan sanitizer and confirmed that the existing tests work as expected by executing:

  LD_PRELOAD=/path/to/libtsan.so.0 pytest tests/python/test_deferred_init.py
  LD_PRELOAD=/path/to/libtsan.so.0 pytest tests/python/test_fake.py

I then tried to run my own tests with LD_PRELOAD=/path/to/libtsan.so.0 pytest tests/python/test_slowmo_fsdp.py, but without success: the run hung in the same way as the aforementioned CI check. After that, I narrowed the potential cause down to the import issue described above.
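
One way to narrow a hang like this down further is to ask the child interpreter to print its Python stack once it has been stuck for a while, using the standard library's `faulthandler.dump_traceback_later`. The sketch below is an illustration only: the helper name `traceback_if_stuck` and its defaults are assumptions, and whether the dump is useful under tsan depends on where the process is actually blocked.

```python
import os
import subprocess
import sys


def traceback_if_stuck(statement, after_s=30, preload=None):
    """Run `statement` in a fresh interpreter with a faulthandler watchdog.

    If the statement is still running after `after_s` seconds, the watchdog
    dumps the stacks of all threads to stderr and exits the child. Returns
    that dump as a string, or None if the child finished before the deadline.
    """
    code = (
        "import faulthandler\n"
        f"faulthandler.dump_traceback_later({after_s}, exit=True)\n"
        f"{statement}\n"
    )
    env = dict(os.environ)
    if preload is not None:
        env["LD_PRELOAD"] = preload
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        # Safety net in case the watchdog itself never fires.
        timeout=after_s + 30,
    )
    # faulthandler prefixes its dump with "Timeout (h:mm:ss)!".
    return proc.stderr if "Timeout (" in proc.stderr else None
```

Calling `traceback_if_stuck("import torch.testing._internal.common_utils", preload="/path/to/libtsan.so.0")` would then show which nested import the interpreter was executing when the deadline passed, provided the watchdog thread is able to run.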