pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Refactor tests into small test suites #466

Closed mthrok closed 4 years ago

mthrok commented 4 years ago

🚀 Refactor tests into small test suites.

As suggested/discussed in some PRs, I would like to clean up how unit tests are organized, parameterized, and run.

Motivation

Functionalities in torchaudio share many common test patterns, such as comparison against librosa, jit-ability, and batch consistency. If a function has a nicely organized test suite, adding a similar function becomes easier. (E.g. "I added this new SoX filter which is very similar to A, and there is a suite of tests defined for A, so those are the tests I should be adding.") That way, it becomes easier for new contributors to write tests.

Let's take the ./test/test_functional.py module as an example. There are a lot of test methods under the TestFunctional class.

The difficulty I had with this structure when I worked on https://github.com/pytorch/audio/pull/448 is that it was not immediately clear what kinds of tests were typical to add.

The current organization of test_functional.py:

```
$ grep '\bclass\b\|def test' -r ./test/test_functional.py
class TestFunctional(unittest.TestCase):
    def test_torchscript_spectrogram(self):
    def test_torchscript_griffinlim(self):
    def test_griffinlim(self):
    def test_batch_griffinlim(self):
    def test_compute_deltas_onechannel(self):
    def test_compute_deltas_twochannel(self):
    def test_compute_deltas_randn(self):
    def test_batch_pitch(self):
    def test_jit_pitch(self):
    def test_istft_is_inverse_of_stft1(self):
    def test_istft_is_inverse_of_stft2(self):
    def test_istft_is_inverse_of_stft3(self):
    def test_istft_is_inverse_of_stft4(self):
    def test_istft_is_inverse_of_stft5(self):
    def test_istft_of_ones(self):
    def test_istft_of_zeros(self):
    def test_istft_requires_overlap_windows(self):
    def test_istft_requires_nola(self):
    def test_istft_requires_non_empty(self):
    def test_istft_of_sine(self):
    def test_linearity_of_istft1(self):
    def test_linearity_of_istft2(self):
    def test_linearity_of_istft3(self):
    def test_linearity_of_istft4(self):
    def test_batch_istft(self):
    def test_create_fb(self):
    def test_gain(self):
    def test_dither(self):
    def test_vctk_transform_pipeline(self):
    def test_pitch(self):
    def test_torchscript_create_fb_matrix(self):
    def test_torchscript_amplitude_to_DB(self):
    def test_torchscript_create_dct(self):
    def test_torchscript_mu_law_encoding(self):
    def test_torchscript_mu_law_decoding(self):
    def test_torchscript_complex_norm(self):
    def test_mask_along_axis(self):
    def test_mask_along_axis_iid(self):
    def test_torchscript_gain(self):
    def test_torchscript_dither(self):
def test_phase_vocoder(complex_specgrams, rate, hop_length):
def test_complex_norm(complex_tensor, power):
def test_mask_along_axis(specgram, mask_param, mask_value, axis):
def test_mask_along_axis_iid(specgrams, mask_param, mask_value, axis):
```

Pitch

Continuing with the test_functional.py example above, we can start by breaking down test cases and composing test suites.

# This is just an illustration
class TestSpectrogram:
    """Test suite for `spectrogram`"""

    def test_accuracy(self):
        """Produces expected results"""

    def test_jit_consistency(self):
        """is jit-able and returns consistent result"""

    def test_batch_consistency(self):
        """returns consistent results for batched input"""

    def test_comparison_against_librosa(self):
        """should yield results very close to librosa's implementation"""
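As a minimal, self-contained sketch of this per-function layout (using a toy moving_average function in place of a real torchaudio functional; all names here are hypothetical, not torchaudio's API):

```python
import unittest


def moving_average(xs, window=2):
    """Toy stand-in for a torchaudio functional such as spectrogram."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


class TestMovingAverage(unittest.TestCase):
    """Per-function suite: all common test patterns for one functional live
    side by side, so a similar new functional can copy the whole class."""

    def test_accuracy(self):
        """Produces expected results."""
        self.assertEqual(moving_average([1, 3, 5, 7]), [2.0, 4.0, 6.0])

    def test_batch_consistency(self):
        """Mapping over a batch matches calling the function item by item."""
        batch = [[1, 3, 5, 7], [2, 4, 6, 8]]
        self.assertEqual(list(map(moving_average, batch)),
                         [moving_average(batch[0]), moving_average(batch[1])])

# Run with: python -m unittest <this_module>
```

A new contributor adding a similar functional would copy the class, swap in the new function, and immediately see which test patterns (accuracy, batch consistency, etc.) still need filling in.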

Alternatives

An alternative way to break down tests is by the type of test.

class TestJit:
    """Test suite for jit-ability and consistency"""
    def test_spectrogram(self):
        """`spectrogram` should be jit-able"""

    def test_griffinlim(self):
        """`griffinlim` should be jit-able"""

class TestSoxConsistency:
    """Test suite for SoX compatibility of filters"""
    def test_allpass(self):
        """`allpass` produces result close to SoX"""

    def test_highpass(self):
        """`highpass` produces result close to SoX"""

Pro: with this approach, it is easy to answer questions like "Which functions are jit-able?"
Con: the suite of tests for the same function is scattered across classes.
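A by-type suite also lends itself to looping over many functionals in one test method. A sketch using unittest's subTest (the functions and registry here are toy stand-ins, not torchaudio code):

```python
import unittest


# Toy stand-ins for torchaudio functionals; names are hypothetical.
def double(x):
    return 2 * x


def square(x):
    return x * x


# One registry drives the type-of-test suite, so registering a new
# functional automatically enrolls it in every common check.
FUNCTIONALS = {"double": double, "square": square}


class TestBatchConsistency(unittest.TestCase):
    """Type-of-test suite: one method covers every registered functional."""

    def test_all_functionals(self):
        batch = [1, 2, 3]
        for name, fn in FUNCTIONALS.items():
            with self.subTest(functional=name):
                # Mapping over the batch must match calling item by item.
                self.assertEqual(list(map(fn, batch)),
                                 [fn(item) for item in batch])
```

subTest keeps each functional's failure reported separately while the test body is written once, which is one way the "generated automatically" idea could look in plain unittest.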

Related stuff

In addition to the above, when we discuss testing, we can also talk about

tomassosorio commented 4 years ago

I had the same problem: even in my first PR I missed some tests, and some transformations are still missing tests. It is also difficult to understand how the methods work, since the tests are sometimes all mixed up (especially when comparing with librosa).

Coverage tests would also be useful, as would documentation tests, such as using pydocstyle.

Given the breadth of checks that would need to run (unit tests, coverage, static analysis such as mypy, documentation tests, style tests such as flake8), it could be useful to create a Makefile that runs them progressively. That's what we did in some of the repositories where I currently work; it is very useful and ensures that everyone uses the same parameters for each check.
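A minimal Makefile sketch along these lines (the target names, tool choices, and paths are assumptions for illustration, not the repo's actual setup):

```
# Hypothetical Makefile: each target runs one class of checks,
# and `make check` runs them in sequence.
.PHONY: check style lint docs test

check: style lint docs test

style:
	flake8 torchaudio test

lint:
	mypy torchaudio

docs:
	pydocstyle torchaudio

test:
	python -m unittest discover -s test
```

With this layout, CI and every contributor invoke the same commands with the same parameters, which is the consistency benefit described above.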

vincentqb commented 4 years ago

I'm leaning toward organizing by type of test. As @cpuhrsch mentioned offline, some types of tests can essentially be generated automatically.

vincentqb commented 4 years ago

Also, for now, since most of our tests are based on unittest and PyTorch uses unittest, I'd lean toward unittest over pytest.

vincentqb commented 4 years ago

My dream would be to use black on all files, but that would lead to quite a few merge conflicts :) and it would make the code somewhat harder to git blame.

I'd consider cleaning with flake8 a good first step. By the way, fbcode complains about some flake8 violations that we should address eventually, see internal :)

cpuhrsch commented 4 years ago

As a word of caution on code formatting: git blame can be very useful for quickly figuring out why something went wrong. Therefore, I would not just apply some formatter to all files and commit it for the sake of code formatting. Instead, we could set up a system that enforces formatting on new lines, or at the very least on new files. In general, it makes sense to set a code formatting standard through an automated tool to, at the very least, avoid discussion and time spent on the topic; it's easy to start bike-shedding whitespace.

On code coverage: it's a good way to find gaping blind spots in your tests, but a bad way to feel good about them. There is also some code (for example, codegen) that won't necessarily be covered and will therefore have to be explicitly excluded. I use code coverage for my own personal dev, but having it as part of CI can cause confusion and extra overhead.

vincentqb commented 4 years ago

Offline discussion with @mthrok. There are a few things that could be done:

  1. flake8
  2. black
  3. replace things like assert with torch.assert
  4. rewrite a few tests to make their intentions clearer (e.g. batching)
  5. switch all to pytest, or all to unittest
  6. move tests

About flake8: there are a small number of minor errors reported by fbcode. These should be fixed, but otherwise this should be a small change.

About black: as much as I love formatting code with it, it would mess up git blame too much just for code formatting, so I would be against it.

About pytest vs unittest,

About moving tests,

All that being said, @mthrok did specify that this particular issue is only for the last step, "move tests" without modifications. Thoughts?

mthrok commented 4 years ago

@vincentqb I created PR #480, which re-organizes tests by category in a small scope (only the test_functional module). Once we agree on the pros and cons of the re-org, I will proceed with the rest of the test modules.

Regarding git blame: it is true that information about the original author will be lost from the latest commit, but if the intention of a test is correctly documented (either as a descriptive test function name or a docstring), then I think it is overall more beneficial to make the tests more readable and intuitively comprehensible while the whole code base is still somewhat small.