pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org
Other
83.97k stars 22.63k forks source link

DISABLED test_comprehensive_cross_cuda_float32 (__main__.TestInductorOpInfoCUDA) #140355

Open pytorch-bot[bot] opened 3 days ago

pytorch-bot[bot] commented 3 days ago

Platforms: inductor

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 3 hours, it has been determined flaky in 4 workflow(s) with 4 failures and 4 successes.

Debugging instructions (after clicking on the recent samples link): DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers so CI will thus be green but it will be harder to parse the logs. To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_comprehensive_cross_cuda_float32
  4. There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
Sample error message ``` Traceback (most recent call last): File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1152, in test_wrapper return test(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1434, in only_fn return fn(self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2199, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn return fn(slf, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn return fn(slf, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn return fn(slf, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 1592, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 1528, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/unittest/mock.py", line 1395, in patched return func(*newargs, **newkeywargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/contextlib.py", line 81, in inner return func(*args, **kwds) ^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 955, in inner raise e File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 947, in inner fn(self, device, dtype, op) File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 1193, in test_comprehensive raise e File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 1153, in test_comprehensive self.check_model_gpu( File "/opt/conda/envs/py_3.12/lib/python3.12/contextlib.py", line 81, in inner return func(*args, **kwds) ^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 613, in check_model_gpu check_model( File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 564, in check_model actual_grad = compute_grads(example_inputs, kwargs, actual, grads) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 351, in compute_grads return torch.autograd.grad( ^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 496, in grad result = _engine_run_backward( ^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 307, in apply return user_fn(self, *args) ^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1707, in backward return impl_fn() ^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1697, in impl_fn out = CompiledFunction._backward_impl(ctx, all_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2068, in _backward_impl out = call_func_at_runtime_with_args( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 135, in call_func_at_runtime_with_args out = normalize_as_list(f(*args)) ^^^^^^^^ TypeError: 'NoneType' object is not callable The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3057, in wrapper method(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3057, in wrapper method(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 460, in instantiated_test result = test(self, **param_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 1592, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1164, in test_wrapper raise e_tracked from e Exception: Caused by sample input at index 1: SampleInput(input=Tensor[size=(5, 3, 5), device="cuda:0", dtype=torch.float32], args=TensorList[Tensor[size=(5, 3, 5), device="cuda:0", dtype=torch.float32]], kwargs={'dim': '1'}, broadcasts_input=False, name='') To execute this test, run the following from the base repo dir: PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=1 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCUDA.test_comprehensive_cross_cuda_float32 This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ```

Test file path: inductor/test_torchinductor_opinfo.py

ConnectionTimeoutError: Connect timeout for 5000ms, GET https://raw.githubusercontent.com/pytorch/pytorch/main/test/inductor/test_torchinductor_opinfo.py -2 (connected: false, keepalive socket: false, socketHandledRequests: 1, socketHandledResponses: 0) headers: {}

cc @clee2000 @ezyang @chauhang @penguinwu

pytorch-bot[bot] commented 3 days ago
Hello there! From the DISABLED prefix in this issue title, it looks like you are attempting to disable a test in PyTorch CI. The information I have parsed is below: * Test name: `test_comprehensive_cross_cuda_float32 (__main__.TestInductorOpInfoCUDA)` * Platforms for which to skip the test: inductor * Disabled by `pytorch-bot[bot]` Within ~15 minutes, `test_comprehensive_cross_cuda_float32 (__main__.TestInductorOpInfoCUDA)` will be disabled in PyTorch CI for these platforms: inductor. Please verify that your test name looks correct, e.g., `test_cuda_assert_async (__main__.TestCuda)`. To modify the platforms list, please include a line in the issue body, like below. The default action will disable the test for all platforms if no platforms list is specified. ``` Platforms: case-insensitive, list, of, platforms ``` We currently support the following platforms: asan, dynamo, inductor, linux, mac, macos, rocm, slow, win, windows. ### How to re-enable a test To re-enable the test globally, close the issue. To re-enable a test for only a subset of platforms, remove the platforms from the list in the issue body. This may take some time to propagate. To re-enable a test only for a PR, put `Fixes #140355` in the PR body and rerun the test jobs. Note that if a test is flaky, it maybe be difficult to tell if the test is still flaky on the PR.
shunting314 commented 2 days ago

similar to https://github.com/pytorch/pytorch/issues/140484