pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org
Other
84.73k stars 22.82k forks source link

DISABLED test_comprehensive_nn_functional_cosine_similarity_cuda_float16 (__main__.TestInductorOpInfoCUDA) #139725

Open pytorch-bot[bot] opened 4 weeks ago

pytorch-bot[bot] commented 4 weeks ago

Platforms: inductor

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 3 hours, it has been determined flaky in 13 workflow(s) with 13 failures and 13 successes.

Debugging instructions (after clicking on the recent samples link): DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers so CI will thus be green but it will be harder to parse the logs. To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_comprehensive_nn_functional_cosine_similarity_cuda_float16
  4. There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
Sample error message ``` Traceback (most recent call last): File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1152, in test_wrapper return test(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1434, in only_fn return fn(self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2194, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn return fn(slf, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn return fn(slf, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1229, in dep_fn return fn(slf, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 1587, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 1523, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/unittest/mock.py", line 1395, in patched return func(*newargs, **newkeywargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/contextlib.py", line 81, in inner return func(*args, **kwds) ^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 955, in inner raise e File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 947, in inner fn(self, device, dtype, op) File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 1193, in test_comprehensive raise e File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor_opinfo.py", line 1153, in test_comprehensive self.check_model_gpu( File "/opt/conda/envs/py_3.12/lib/python3.12/contextlib.py", line 81, in inner return func(*args, **kwds) ^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 606, in check_model_gpu check_model( File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 557, in check_model actual_grad = compute_grads(example_inputs, kwargs, actual, grads) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 344, in compute_grads return torch.autograd.grad( ^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 496, in grad result = _engine_run_backward( ^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 307, in apply return user_fn(self, *args) ^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1705, in backward return impl_fn() ^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1695, in impl_fn out = CompiledFunction._backward_impl(ctx, all_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2006, in _backward_impl CompiledFunction.compiled_bw = aot_config.bw_compiler( ^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 51, in _wrapped_bw_compiler return disable(disable(bw_compiler)(*args, **kwargs)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 721, in _fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_utils_internal.py", line 95, in wrapper_function return function(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1650, in bw_compiler return inner_compile( ^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 587, in compile_fx_inner return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 100, in debug_wrapper inner_compiled_fn = compiler_fn(gm, example_inputs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 744, in _compile_fx_inner compiled_graph = FxGraphCache.load( ^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 1507, in load compiled_graph = compile_fx_fn( ^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 651, in codegen_and_compile compiled_graph = fx_codegen_and_compile(gm, example_inputs, **fx_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 962, in fx_codegen_and_compile compiled_fn = graph.compile_to_fn() ^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2036, in compile_to_fn return self.compile_to_module().call ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1955, in compile_to_module return self._compile_to_module() ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1990, in _compile_to_module mod = PyCodeCache.load_by_key_path( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3030, in load_by_key_path mod = _reload_python_module(key, path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module exec(code, mod.__dict__, mod.__dict__) File "/tmp/tmpuq4txwx_/mo/cmojtbf6j7jseavhamvawgvixgt2gmrecmn4ubsjkk7hlaoucf7s.py", line 244, in async_compile.wait(globals()) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 312, in wait scope[key] = result.result() ^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3509, in result self.kernel.precompile() File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 274, in precompile compiled_binary, launcher = self._precompile_config( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 489, in _precompile_config binary = triton.compile(*compile_args, **compile_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/triton/python/triton/compiler/compiler.py", line 276, in compile module = src.make_ir(options, codegen_fns, context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/jenkins/triton/python/triton/compiler/compiler.py", line 113, in make_ir return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/inspect.py", line 1267, in getsourcelines lines, lnum = findsource(object) ^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/inspect.py", line 1096, in findsource raise OSError('could not get source code') OSError: could not get source code The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3052, in wrapper method(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3052, in wrapper method(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 460, in instantiated_test result = test(self, **param_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 1587, in wrapper fn(*args, **kwargs) File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1164, in test_wrapper raise e_tracked from e Exception: Caused by sample input at index 0: SampleInput(input=Tensor[size=(5, 5), device="cuda:0", dtype=torch.float16], args=TensorList[Tensor[size=(5, 5), device="cuda:0", dtype=torch.float16]], kwargs={'dim': '1'}, broadcasts_input=False, name='') To execute this test, run the following from the base repo dir: PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=0 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCUDA.test_comprehensive_nn_functional_cosine_similarity_cuda_float16 This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ```

Test file path: inductor/test_torchinductor_opinfo.py

cc @clee2000 @ezyang @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov

pytorch-bot[bot] commented 4 weeks ago
Hello there! From the DISABLED prefix in this issue title, it looks like you are attempting to disable a test in PyTorch CI. The information I have parsed is below: * Test name: `test_comprehensive_nn_functional_cosine_similarity_cuda_float16 (__main__.TestInductorOpInfoCUDA)` * Platforms for which to skip the test: inductor * Disabled by `pytorch-bot[bot]` Within ~15 minutes, `test_comprehensive_nn_functional_cosine_similarity_cuda_float16 (__main__.TestInductorOpInfoCUDA)` will be disabled in PyTorch CI for these platforms: inductor. Please verify that your test name looks correct, e.g., `test_cuda_assert_async (__main__.TestCuda)`. To modify the platforms list, please include a line in the issue body, like below. The default action will disable the test for all platforms if no platforms list is specified. ``` Platforms: case-insensitive, list, of, platforms ``` We currently support the following platforms: asan, dynamo, inductor, linux, mac, macos, rocm, slow, win, windows. ### How to re-enable a test To re-enable the test globally, close the issue. To re-enable a test for only a subset of platforms, remove the platforms from the list in the issue body. This may take some time to propagate. To re-enable a test only for a PR, put `Fixes #139725` in the PR body and rerun the test jobs. Note that if a test is flaky, it maybe be difficult to tell if the test is still flaky on the PR.
pytorch-bot[bot] commented 4 weeks ago

Another case of trunk flakiness has been found here. The list of platforms [inductor] appears to contain all the recently affected platforms [inductor]. Either the change didn't propogate fast enough or disable bot might be broken.

pytorch-bot[bot] commented 4 weeks ago

Another case of trunk flakiness has been found here. The list of platforms [inductor] appears to contain all the recently affected platforms [inductor]. Either the change didn't propogate fast enough or disable bot might be broken.

pytorch-bot[bot] commented 4 weeks ago

Another case of trunk flakiness has been found here. The list of platforms [inductor] appears to contain all the recently affected platforms [inductor]. Either the change didn't propogate fast enough or disable bot might be broken.

pytorch-bot[bot] commented 4 weeks ago

Another case of trunk flakiness has been found here. The list of platforms [inductor] appears to contain all the recently affected platforms [inductor]. Either the change didn't propogate fast enough or disable bot might be broken.

pytorch-bot[bot] commented 4 weeks ago

Another case of trunk flakiness has been found here. The list of platforms [inductor] appears to contain all the recently affected platforms [inductor]. Either the change didn't propogate fast enough or disable bot might be broken.

pytorch-bot[bot] commented 4 weeks ago

Another case of trunk flakiness has been found here. The list of platforms [inductor] appears to contain all the recently affected platforms [inductor]. Either the change didn't propogate fast enough or disable bot might be broken.