Open satheeshxolo opened 3 years ago
We don't officially support pytest so some features for it may be missing. Can you elaborate on the use case for --forked? Isn't pytest already able to run tests in isolation?
Use case: My team is integrating PyTorch with our backend hardware, so we write kernels and hook into the dispatched PyTorch op APIs. We run the PyTorch test suite in pytest's "boxed"/forked mode to isolate failing test cases from passing ones. If we run the tests in series (without --forked), one failing test can cause "cascaded false failures" in later tests (our stack is still under development). Since I hit this issue when running with pytest --forked, I was thinking there might be a way to support it, at least behind an environment variable. For example, something like:
    if noncontiguous and numel > 1:
        if os.environ.get('PYTORCH_TEST_WITH_PYTEST_FORKED', 'OFF') == 'ON':
            pass  # single-threaded noncontiguous handling that avoids at::parallel_for
        else:
            result = torch.repeat_interleave(result, 2, dim=-1)  # current implementation
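With something like that in place, one could opt in per run, e.g. PYTORCH_TEST_WITH_PYTEST_FORKED=ON python -m pytest test_ops.py --forked (the variable name is only the proposal made here, not an existing PyTorch option).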
It sounds like the issue is the use of multiple threads in this mode, however, which suggests that just replacing one repeat_interleave call is unlikely to have the desired effect. What about disabling parallelism and using only a single thread?
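For example, one way to keep ATen on a single thread for the whole run would be a small conftest.py next to the tests (just a sketch assuming an OpenMP/MKL build; this is not something the test suite ships):

    # conftest.py (hypothetical) -- force single-threaded execution before torch initializes
    import os

    # These must be set before the OpenMP/MKL runtimes are initialized.
    os.environ.setdefault("OMP_NUM_THREADS", "1")
    os.environ.setdefault("MKL_NUM_THREADS", "1")

    import torch

    # Run at::parallel_for work on the calling thread only.
    torch.set_num_threads(1)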
If I disable multiple workers in pytest's forked mode (-n1, i.e. a single worker), there is no hang. But it would be nice if make_tensor() also had a single-threaded implementation that doesn't end up calling at::parallel_for().
@satheeshxolo But is make_tensor() being multithreaded the only reason the test suite doesn't work while doing this? That seems very unlikely.
@mruberry - In my observations, the noncontiguous handling based on torch.repeat_interleave() (which in turn uses at::parallel_for()) is the point where execution deadlocks when running with pytest's --forked.
@satheeshxolo Yes but if you fix that what breaks next?
@mruberry - I am not familiar with an alternative that uses an op other than repeat_interleave for noncontiguous tensors. If there is one, please point me to it so I can check whether it solves the issue.
You can probably just change make_tensor to ignore the noncontiguous kwarg while debugging
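A rough way to do that while debugging, without editing PyTorch itself, would be to wrap make_tensor from a conftest.py (a sketch only; the module path assumes the 1.9.0 layout mentioned below, where make_tensor lives in torch.testing._internal.common_utils):

    # conftest.py (hypothetical) -- drop the noncontiguous kwarg for debugging only
    import torch.testing._internal.common_utils as common_utils

    _orig_make_tensor = common_utils.make_tensor

    def _contiguous_only_make_tensor(*args, **kwargs):
        kwargs.pop("noncontiguous", None)  # skip the repeat_interleave-based path
        return _orig_make_tensor(*args, **kwargs)

    common_utils.make_tensor = _contiguous_only_make_tensor

Whether this takes effect depends on import order: modules that have already done "from ... import make_tensor" keep their own reference, so the patch may need to run before those modules are imported.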
🐛 Bug
While trying to run the tests under pytorch/test/ with pytest's boxing option "--forked" (to isolate a few crashing tests), I see that many tests hang.
To Reproduce
Steps to reproduce the behavior:
python -u -m pytest test_ops.py --forked -svk test_out_cos_cpu_float32
Expected behavior
I expect there to be some supported way to run the tests in pytest's boxed mode. Tests shouldn't hang with pytest's --forked option.
Environment
Collecting environment information...
PyTorch version: 1.9.0a0+git09dfd6d
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.19.6
Libc version: glibc-2.17

Python version: 3.7 (64-bit runtime)
Python platform: Linux-4.15.0-145-generic-x86_64-with-debian-buster-sid
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] torch==1.0.0+767d490.dirty
[pip3] torch-dataloader==1.0.0+767d490.dirty
[pip3] numpy==1.21.2
[pip3] torch==1.9.0a0+git09dfd6d
[conda] torch 1.0.0+767d490.dirty pypi_0 pypi
[conda] torch-dataloader 1.0.0+767d490.dirty pypi_0 pypi
[conda] mkl 2019.0 118
[conda] mkl-include 2019.0 118
[conda] numpy 1.21.2 pypi_0 pypi
[conda] numpy-base 1.20.3 py37h39b7dee_0
[conda] torch 1.9.0a0+git09dfd6d pypi_0 pypi
Additional context
I did a first-level triage of the issue. The problem seems to originate in the handling of "noncontiguous" tensors in the make_tensor() utility (torch/testing/_creation.py on latest, but torch/testing/_internal/common_utils.py in 1.9.0). For "noncontiguous" tensors there is an additional conditional call to torch.repeat_interleave(result, 2, dim=-1). The kernel implementation of repeat_interleave in aten/src/ATen/native/Repeat.cpp uses at::parallel_for(), which can execute on multiple threads. My guess is that this use of repeat_interleave for "noncontiguous" causes the hang under pytest's "--forked" option.
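If that guess is right, the interaction should be reproducible without pytest at all: once a parallel_for-backed op has started ATen's intra-op thread pool in the parent process, a child created by fork() inherits the memory but not the worker threads, and the next parallel op in the child can block. A minimal sketch of that hypothesis (whether it actually hangs will depend on the threading backend and build):

    # hypothetical standalone reproducer for the suspected fork + at::parallel_for deadlock
    import os
    import torch

    t = torch.randn(8, 8)
    _ = torch.repeat_interleave(t, 2, dim=-1)  # warms up the intra-op thread pool in the parent

    pid = os.fork()
    if pid == 0:
        # Child process: only the forking thread survives, so a parallel_for-backed
        # op may wait forever on worker threads that no longer exist.
        _ = torch.repeat_interleave(t, 2, dim=-1)
        os._exit(0)
    os.waitpid(pid, 0)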
cc @mruberry