pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Catalogue of flaky tests under `test/dynamo` #112678

Open Chillee opened 1 year ago

Chillee commented 1 year ago

🐛 Describe the bug

  1. predispatch + export + out_dtype fails when run twice. Somewhat mysteriously, on the second invocation make_fx bakes in a constant as a fake tensor. Repro: `pytest test_export.py test_export.py -k test_predispatch_with_for_out_dtype`. The tests are run twice due to test_dynamic

    test/dynamo/test_export.py::ExportTests::test_predispatch_with_for_out_dtype - torch._subclasses.fake_tensor.DataDependentOutputException: aten.allclose.default
    test/dynamo/test_export.py::ExportTests::test_predispatch_with_for_out_dtype_nested - torch._subclasses.fake_tensor.DataDependentOutputException: aten.allclose.default
  2. Library registration fails when done twice. Repro: `pytest test_misc.py test_misc.py -k test_non_pt2_compliant_ops_graph_break`

    test/dynamo/test_misc.py::MiscTests::test_non_pt2_compliant_ops_graph_break - AttributeError: '_OpNamespace' object has no attribute 'bar2'
  3. Not sure about the cause of this failure yet, but the repro is `pytest test_logging.py test_recompile_ux.py`.

    test/dynamo/test_recompile_ux.py::RecompileUxTests::test_drop_cache_on_skip - AssertionError: False is not true
  4. This test also fails if test_logging is run beforehand. Repro: `pytest test_logging.py test_repros.py`

    test/dynamo/test_repros.py::ReproTests::test_optim_state_references_cleared - AssertionError: tensor([[0.1000, 0.1000, 0.1000,  ..., 0.1000, 0.1000, 0.1000],
  5. This test fails if test_modules is run beforehand. Repro: `pytest test_modules.py test_repros.py`

    test/dynamo/test_repros.py::ReproTests::test_reformer_train - AssertionError: '3' != '1'
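
The repros above all have the same shape: two test files passed to pytest in a fixed order, sometimes the same file twice. A minimal sketch of automating that search follows. The helper names (`ordered_pairs`, `pytest_command`, `find_failing_pairs`) are hypothetical and not part of the PyTorch test tooling, and it assumes the script is run from the directory holding the test files with pytest installed.

```python
import itertools
import subprocess
import sys


def ordered_pairs(test_files):
    """Every ordered pair of files, including a file paired with itself."""
    return list(itertools.product(test_files, repeat=2))


def pytest_command(first, second):
    """Command line that runs `first` and then `second` in one pytest process."""
    return [sys.executable, "-m", "pytest", first, second]


def find_failing_pairs(test_files, run=subprocess.call):
    """Return the (first, second) pairs whose pytest run exits non-zero.

    `run` is injectable so the search logic can be exercised without
    actually invoking pytest.
    """
    return [
        (first, second)
        for first, second in ordered_pairs(test_files)
        if run(pytest_command(first, second)) != 0
    ]
```

Calling `find_failing_pairs(["test_logging.py", "test_recompile_ux.py", "test_repros.py"])` would run nine pytest invocations. A non-zero exit only says that something failed in that run, so a pair is only evidence of order dependence if each file also passes when run on its own.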

Non-flaky failing local tests (i.e. they fail when run individually)

 test/dynamo/test_dynamic_shapes.py::DynamicShapesFunctionTests::test_numpy_random_dynamic_shapes - torch._dynamo.exc.Unsupported: call_function UserDefinedObjectVariable(randn) [ConstantVariable(int), ConstantVariable(int)] {}
 test/dynamo/test_functions.py::FunctionTests::test_numpy_random - torch._dynamo.exc.Unsupported: call_function UserDefinedObjectVariable(randn) [ConstantVariable(int), ConstantVariable(int)] {}
 test/dynamo/test_replay_record.py::ReplayRecordTests::test_fn_call_args - AssertionError: no logs of level ERROR or higher triggered on torch._dynamo
 test/dynamo/test_replay_record.py::ReplayRecordTests::test_local_module - AssertionError: no logs of level ERROR or higher triggered on torch._dynamo
 test/dynamo/test_replay_record.py::ReplayRecordTests::test_nonlocal_fn_call - AssertionError: no logs of level ERROR or higher triggered on torch._dynamo
 test/dynamo/test_replay_record.py::ReplayRecordTests::test_nonlocal_module_class - AssertionError: no logs of level ERROR or higher triggered on torch._dynamo
 test/dynamo/test_replay_record.py::ReplayRecordTests::test_nonlocal_module_fn_call - AssertionError: no logs of level ERROR or higher triggered on torch._dynamo
 test/dynamo/test_replay_record.py::ReplayRecordTests::test_successful_inline - AssertionError: no logs of level ERROR or higher triggered on torch._dynamo
 test/dynamo/test_replay_record.py::ReplayRecordTests::test_unsuccessful_inline - AssertionError: no logs of level ERROR or higher triggered on torch._dynamo

Uncatalogued failing local tests

 test/dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance - KeyError: '140089192996896\n\nTo execute this test, run the following from the base repo dir:\n     python test/dynamo/test_repros.py -k test_graph_break_on_ji...
 test/dynamo/test_repros.py::ReproTests::test_sigmoid_out - KeyError: '140089192996896\n\nTo execute this test, run the following from the base repo dir:\n     python test/dynamo/test_repros.py -k test_sigmoid_out\n\nTh...
 test/dynamo/test_repros.py::ReproTests::test_sort_out - KeyError: '140089192996896\n\nTo execute this test, run the following from the base repo dir:\n     python test/dynamo/test_repros.py -k test_sort_out\n\nThis ...
 test/dynamo/test_repros.py::ReproTests::test_tokenization - KeyError: '140089192996896\n\nTo execute this test, run the following from the base repo dir:\n     python test/dynamo/test_repros.py -k test_tokenization\n\nT...
 test/dynamo/test_subclasses.py::SubclassTests::test_recompile_with_symbool_inputs - AssertionError
 test/dynamo/test_trace_rules.py::TraceRuleTests::test_torch_name_rule_map - AssertionError: False is not true : New torch objects: {<class 'torch._decomp.decompositions_for_rng.PhiloxStateTracker'>, <class 'torch.utils.tensorboard.writ...

cc: @voznesenskym @zou3519 @ezyang

Versions

N/A

cc @ezyang @msaroufim @bdhirsh @anijain2305 @zou3519 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @wconstab

jon-chuang commented 1 year ago

Activation checkpointing tests should be fixed by: https://github.com/pytorch/pytorch/pull/111139

Fixes at least these tests (others may have been skipped on my local machine):

FAILED [0.0709s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_decomps - torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised:
FAILED [0.0567s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_dropout - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
FAILED [0.0302s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_function - torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised:
FAILED [0.0294s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_function_via_global_checkpoint - torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised:
FAILED [0.0591s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_function_with_kwargs - AssertionError: In graph GraphModule()
FAILED [0.0445s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_module - torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised:
FAILED [0.0499s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_multiple_checkpoints - torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised:
FAILED [0.0343s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_rand - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
FAILED [0.0615s] test/dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTests::test_tags_recomputed_rand - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:

yf225 commented 11 months ago

https://github.com/pytorch/pytorch/pull/112672 should fix the activation checkpointing tests

yf225 commented 11 months ago

Activation checkpointing tests are fixed by https://github.com/pytorch/pytorch/pull/112672. Q: how to re-enable these tests? cc. @Chillee

Chillee commented 11 months ago

Just close all the relevant issues: https://github.com/pytorch/pytorch/issues?q=is%3Aissue+is%3Aopen+ActivationCheckpointingViaTagsTests+

Mrwhite132613 commented 11 months ago

Are the test cases in test_replay_record.py deprecated?

zou3519 commented 10 months ago

Interestingly, many of these cases pass under Python unittest but fail under pytest, which makes debugging more difficult.
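
One way to check a single test for this discrepancy is to run it under both runners and compare the results. The sketch below assumes the direct-script form that appears in the error messages earlier in this issue (`python test/dynamo/test_repros.py -k <name>`) for the unittest side and `python -m pytest <file> -k <name>` for the pytest side. The helper names (`runner_commands`, `compare_runners`) are hypothetical.

```python
import subprocess
import sys


def runner_commands(test_file, test_name):
    """Build the two command lines for the same test, one per runner."""
    return {
        # Direct script invocation, as suggested by the test error messages.
        "unittest": [sys.executable, test_file, "-k", test_name],
        # The same test selected through pytest's -k filter.
        "pytest": [sys.executable, "-m", "pytest", test_file, "-k", test_name],
    }


def compare_runners(test_file, test_name, run=subprocess.call):
    """Return {"unittest": passed, "pytest": passed} for one test.

    `run` is injectable so the comparison can be exercised without
    launching real test processes.
    """
    commands = runner_commands(test_file, test_name)
    return {name: run(cmd) == 0 for name, cmd in commands.items()}
```

For example, `compare_runners("test/dynamo/test_repros.py", "test_sort_out")` returns a dict with one boolean per runner. A result where the two values differ marks the test as runner-dependent.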