psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License

Split cases data files into smaller tests and introduce more subdirectories #3078

Open saroad2 opened 2 years ago

saroad2 commented 2 years ago

Description

We've made a lot of progress improving the testing process in #3034 and #3062.

A problem that still remains is that the existing cases data files are ENORMOUS. Those data files can contain 300 lines or more, which is way too much for a test. Case files, and tests in general, should be short and concise so that we can track down errors as easily as possible.

I would like to suggest scanning all of our cases data files and splitting them into small cases, each one testing only a single behavior. That way, when a test fails, we'll see exactly which case failed and on which line, without searching through hundreds of lines of code.

Why now?

Up until now, every case data file had to be added manually to a list of cases. That is no longer true after #3062: whenever you add a test file to a directory, it is picked up automatically in the relevant cases list.

Now we can afford to have as many case data files as we want without maintaining huge, hard-coded lists of them. All of our data files are read and processed automatically according to the directory they're in.
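
For reference, here is a minimal sketch of how that discovery can work. The real `all_data_cases` and `read_data` helpers live in the test utilities and certainly differ in their details; this is just an illustration:

```python
# Sketch only: an illustration of directory-based case discovery,
# not the actual implementation in tests/util.py.
from pathlib import Path
from typing import List, Tuple

import pytest

DATA_DIR = Path(__file__).parent / "data"  # assumed location of the case files


def all_data_cases(subdir: str) -> List[str]:
    """Collect every case name under tests/data/<subdir> -- no hard-coded list."""
    return sorted(path.stem for path in (DATA_DIR / subdir).glob("*.py"))


def read_data(subdir: str, name: str) -> Tuple[str, str]:
    """Split a case file into (source, expected) on the '# output' separator."""
    text = (DATA_DIR / subdir / f"{name}.py").read_text(encoding="utf-8")
    source, _, expected = text.partition("# output")
    # Files without a separator are expected to format idempotently.
    return source, expected or source


@pytest.mark.parametrize("filename", all_data_cases("simple_cases"))  # directory name assumed
def test_simple_format(filename: str) -> None:
    source, expected = read_data("simple_cases", filename)
    # assert_format(source, expected, black.Mode()) would go here in the real suite
```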

How can we do it?

Very simple:

If we keep each case file in the correct directory, all of the tests should keep passing without any changes to the test code.
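
For illustration, the end result could look something like this. The `py_36` directory and the `numeric_literals_skip_underscores` case appear in the test output later in this thread; the other file names are made up:

```
tests/data/
├── py_36/
│   ├── numeric_literals.py
│   └── numeric_literals_skip_underscores.py
└── preview/
    ├── long_strings__dict_values.py
    ├── long_strings__nested_calls.py
    └── long_strings__return_annotations.py
```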

Additional context

This is a continuation of a discussion between @felix-hilden and me over at #3035.

saroad2 commented 2 years ago

@JelleZijlstra, @ichard26, @cooperlees, I would love to hear your opinions on this so I can start implementing it.

This is a direct continuation of all the other work we've done on unit testing.

cooperlees commented 2 years ago

Howdy - this sounds great to me, and I have no objections to lots of smaller modules and test data, since removing the overhead of adding each file to all_data_cases allows easier scaling etc.

I'm happy to review and help get these improvements merged. I just ask that we do lots of smaller PRs as we move there, and I'll try to do faster reviews so you don't get blocked :)

JelleZijlstra commented 2 years ago

Honestly I don't see a big problem with the current size of the data files. Having too many small files makes it harder to navigate.

felix-hilden commented 2 years ago

The size is less of a bother to me too, but I do dislike that the expected output can be a long scroll away from the input. Another option to fix that could be to change the file format to allow multiple test blocks.
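
For example, a multi-block format could keep the existing `# output` separator and add a per-case marker, something like this (just a sketch; the `# case:` marker syntax is invented):

```python
# case: normalizes_string_quotes
x = 'hello'
# output
x = "hello"

# case: adds_spaces_after_commas
x = {1,2,3}
# output
x = {1, 2, 3}
```

That would keep each input right next to its expected output without multiplying the number of files.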

saroad2 commented 2 years ago

Hey @JelleZijlstra, thanks for the honest opinion.

There are a few reasons why we would want smaller test cases. Take a look at the long_strings.py test case: it has 661 lines and a total of 67 statements. When we check those statements as part of test_preview_format, we are actually running 67 individual tests all at once, one for each formatted statement.

This is fine when all the tests pass, but it can be hell when the test fails: the error output is humongous and confusing, and searching for a single error in a 661-line test file is needlessly painful.

If we manage to split this case into 67 smaller cases, one for each statement, we get a few advantages:

  1. If one statement fails, the others will still pass, which helps us narrow down exactly where the problem is.
  2. If we add the exact location of the case file to the error message, people won't get lost trying to find where the error is.
  3. When reading a test case, one won't need to run through hundreds of lines of code; they would read a case of 10-20 lines at most. This will make all of our test cases more readable and understandable.
  4. Adding and removing cases would be as easy as writing a new short file, not editing huge case files.

I hope this convinces you that splitting the test cases is the right move. If you have any caveats, I would love to hear them.

JelleZijlstra commented 2 years ago

Why would we need to read through the whole file? That sounds like it can be solved by making the test machinery output better line numbers.

I tried making a small change that breaks tests (formatting 1e6 as 1e+6). Here's the output I got:

```
_________________________ test_python_36[numeric_literals_skip_underscores] _________________________

filename = 'numeric_literals_skip_underscores'

    @pytest.mark.parametrize("filename", all_data_cases("py_36"))
    def test_python_36(filename: str) -> None:
        source, expected = read_data("py_36", filename)
        mode = black.Mode(target_versions=PY36_VERSIONS)
>       assert_format(source, expected, mode, minimum_version=(3, 6))

tests/test_format.py:108:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/util.py:74: in assert_format
    _assert_format_equal(expected, actual)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

expected = '#!/usr/bin/env python3.6\n\nx = 123456789\nx = 1_2_3_4_5_6_7\nx = 1e1\nx = 0xB1ACC\nx = 0.00_00_006\nx = 12_34_567j\nx = 0.1_2\nx = 1_2.0\n'
actual = '#!/usr/bin/env python3.6\n\nx = 123456789\nx = 1_2_3_4_5_6_7\nx = 1e+1\nx = 0xB1ACC\nx = 0.00_00_006\nx = 12_34_567j\nx = 0.1_2\nx = 1_2.0\n'

    def _assert_format_equal(expected: str, actual: str) -> None:
        if actual != expected and not os.environ.get("SKIP_AST_PRINT"):
            bdv: DebugVisitor[Any]
            out("Expected tree:", fg="green")
            try:
                exp_node = black.lib2to3_parse(expected)
                bdv = DebugVisitor()
                list(bdv.visit(exp_node))
            except Exception as ve:
                err(str(ve))
            out("Actual tree:", fg="red")
            try:
                exp_node = black.lib2to3_parse(actual)
                bdv = DebugVisitor()
                list(bdv.visit(exp_node))
            except Exception as ve:
                err(str(ve))
        if actual != expected:
            out(diff(expected, actual, "expected", "actual"))
>       assert actual == expected
E       AssertionError

tests/util.py:56: AssertionError
--------------------------------------- Captured stderr call ----------------------------------------
Expected tree:
file_input
  simple_stmt expr_stmt NAME '#!/usr/bin/env python3.6\n\n' 'x' EQUAL ' ' '=' NUMBER ' ' '123456789' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '1_2_3_4_5_6_7' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '1e1' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '0xB1ACC' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '0.00_00_006' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '12_34_567j' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '0.1_2' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '1_2.0' /expr_stmt NEWLINE '\n' /simple_stmt
  ENDMARKER ''
/file_input
Actual tree:
file_input
  simple_stmt expr_stmt NAME '#!/usr/bin/env python3.6\n\n' 'x' EQUAL ' ' '=' NUMBER ' ' '123456789' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '1_2_3_4_5_6_7' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '1e+1' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '0xB1ACC' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '0.00_00_006' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '12_34_567j' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '0.1_2' /expr_stmt NEWLINE '\n' /simple_stmt
  simple_stmt expr_stmt NAME 'x' EQUAL ' ' '=' NUMBER ' ' '1_2.0' /expr_stmt NEWLINE '\n' /simple_stmt
  ENDMARKER ''
/file_input
--- expected
+++ actual
@@ -1,10 +1,10 @@
 #!/usr/bin/env python3.6
 
 x = 123456789
 x = 1_2_3_4_5_6_7
-x = 1e1
+x = 1e+1
 x = 0xB1ACC
 x = 0.00_00_006
 x = 12_34_567j
 x = 0.1_2
 x = 1_2.0
```

To fix the test I need two things: the data filename and the approximate line where it's failing. The line is pretty easy to find, because we print a diff with line numbers, but the filename is harder because we print so much junk before it.
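
For example (a rough sketch, not what `tests/util.py` does today; the `data_path` parameter and the message format are invented), the assertion helper could lead with exactly those two facts before any tree dump or diff:

```python
import difflib


def _first_diff_line(expected: str, actual: str) -> int:
    """Return the 1-based line number of the first line that differs."""
    for lineno, (exp, act) in enumerate(
        zip(expected.splitlines(), actual.splitlines()), start=1
    ):
        if exp != act:
            return lineno
    # One output is a prefix of the other; the first extra line differs.
    return min(len(expected.splitlines()), len(actual.splitlines())) + 1


def _assert_format_equal(expected: str, actual: str, data_path: str = "") -> None:
    # data_path is hypothetical: the path of the data file that produced `expected`.
    if actual != expected:
        lineno = _first_diff_line(expected, actual)
        header = f"{data_path or '<unknown data file>'}: first mismatch at line {lineno}"
        diff = "\n".join(
            difflib.unified_diff(
                expected.splitlines(),
                actual.splitlines(),
                fromfile="expected",
                tofile="actual",
                lineterm="",
            )
        )
        raise AssertionError(f"{header}\n{diff}")
```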

Here are some concrete things that would help me fix this test failure faster:

felix-hilden commented 2 years ago

True, having better test output would be half the battle. Maybe we could add a pytest option for those who want to see the verbose test output (no experience with that, though).
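
Something along these lines, perhaps (a sketch using pytest's standard `pytest_addoption` hook; the option name is invented, and today the tree dump is gated on the `SKIP_AST_PRINT` environment variable instead, as seen in the output above):

```python
# conftest.py -- sketch of an opt-in flag for the verbose AST dump
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--print-full-tree",
        action="store_true",
        default=False,
        help="print the expected/actual lib2to3 trees when a formatting test fails",
    )


@pytest.fixture
def print_full_tree(request):
    # Tests (or the assert helpers) could check this flag before dumping the trees.
    return bool(request.config.getoption("--print-full-tree"))
```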

The other half is writing tests. It's inconvenient to scroll around the file needlessly, although in fairness it has worked well so far! So: the inconvenience of scrolling vs. the inconvenience of making a large set of changes and having more files or more test blocks... Honestly, I'm not sure 🤷‍♂️

saroad2 commented 2 years ago

In my humble opinion, keeping these long case files is a mistake, for all the reasons I mentioned above.

In the end, it's up to you. If you choose to split the cases, I would happily do the work.