guilgautier / DPPy

Python toolbox for sampling Determinantal Point Processes
https://dppy.readthedocs.io
MIT License
219 stars 53 forks source link

Split big tests into small tests? #43

Closed guillep closed 5 years ago

guillep commented 5 years ago

Right now we have a single test that tests over multiple data

    sequences = [list(range(1, 7)),
                 list(range(6, 0, -1)),
                 [3, 6, 1, 4, 5, 2],
                 [6, 2, 3, 5, 1, 4],
                 [5, 2, 6, 1, 4, 3],
                 [4, 2, 6, 5, 1, 3]]

    PQ_tableaux = [
        ([[1, 2, 3, 4, 5, 6]], [[1, 2, 3, 4, 5, 6]]),
        ([[1], [2], [3], [4], [5], [6]], [[1], [2], [3], [4], [5], [6]]),
        ([[1, 2, 5], [3, 4], [6]], [[1, 2, 5], [3, 4], [6]]),
        ([[1, 3, 4], [2, 5], [6]], [[1, 3, 4], [2, 6], [5]]),
        ([[1, 3], [2, 4], [5, 6]], [[1, 3], [2, 5], [4, 6]]),
        ([[1, 3], [2, 5], [4, 6]], [[1, 3], [2, 4], [5, 6]])]

    def test_RSK_output(self):

        for seq, tab in zip(self.sequences, self.PQ_tableaux):

            self.assertTrue(RSK(seq) == tab)

The problem with this approach is that it's more complicated to know what datapoint/combination caused the test to fail, and also, some points can mask errors in the subsequent points.

For example, in this case, imagine that the 3rd and 4th cases fail. The test will only report red with the 3rd case! So you may not be aware that the 4th case is breaking too.

Usually I refactor this into something like:

sequences = [list(range(1, 7)),
                 list(range(6, 0, -1)),
                 [3, 6, 1, 4, 5, 2],
                 [6, 2, 3, 5, 1, 4],
                 [5, 2, 6, 1, 4, 3],
                 [4, 2, 6, 5, 1, 3]]

    PQ_tableaux = [
        ([[1, 2, 3, 4, 5, 6]], [[1, 2, 3, 4, 5, 6]]),
        ([[1], [2], [3], [4], [5], [6]], [[1], [2], [3], [4], [5], [6]]),
        ([[1, 2, 5], [3, 4], [6]], [[1, 2, 5], [3, 4], [6]]),
        ([[1, 3, 4], [2, 5], [6]], [[1, 3, 4], [2, 6], [5]]),
        ([[1, 3], [2, 4], [5, 6]], [[1, 3], [2, 5], [4, 6]]),
        ([[1, 3], [2, 5], [4, 6]], [[1, 3], [2, 4], [5, 6]])]

    def test_RSK_output(self, index):
            self.assertTrue(RSK(seq[index]) == tab[index])

    def test_RSK_output1(self):
            self.test_RSK_output(self, 1)
    def test_RSK_output2(self):
            self.test_RSK_output(self, 2)
    def test_RSK_output3(self):
            self.test_RSK_output(self, 3)
    def test_RSK_output4(self):
            self.test_RSK_output(self, 4)
    def test_RSK_output5(self):
            self.test_RSK_output(self, 5)
    def test_RSK_output6(self):
            self.test_RSK_output(self, 6)

And then, this brings me to a last point: why were those examples chosen to be tested (and not others?). I see the examples were taken from a paper, but still it's a bit cryptic whether those examples are necessary or sufficient :). One thing I try to do (maybe it's not applicable here) is to name the tests with their purpose. For example, I could have renamed test_RSK_output2 into test_RSK_simple_nested_arrays.

guillep commented 5 years ago

In https://github.com/guilgautier/DPPy/blob/master/tests/test_multivariate_jacobi_ope.py https://github.com/guilgautier/DPPy/blob/master/tests/test_RSK.py

guillep commented 5 years ago

Another way to split tests is to split tests that have multiple assertions into multilple tests that have one assertion each. Again, if a test with multiple assertions fails, you don't know exactly which assertion made it fail (and this makes debugging difficult). Also, again, if the first assertion fails, it will mask all the following failing ones.

One assertion per test makes it super clear what was the cause (and if there were many).

  def test_kernel_gram_matrix(self):
            ...
            self.assertTrue(np.allclose(dpp.K(x),
                                        inner1d(phi_x)))

            self.assertTrue(np.allclose(dpp.K(X),
                                        inner1d(phi_X)))

            self.assertTrue(np.allclose(dpp.K(x, y),
                                        inner1d(phi_x, phi_y)))

            self.assertTrue(np.allclose(dpp.K(x, Y),
                                        phi_x.dot(phi_Y)))

            self.assertTrue(np.allclose(dpp.K(X, Y),
                                        inner1d(phi_X, phi_Y)))
  def test_kernel_gram_matrix1(self):
            ...
            self.assertTrue(np.allclose(dpp.K(x),
                                        inner1d(phi_x)))
  def test_kernel_gram_matrix1(self):
            ...
            self.assertTrue(np.allclose(dpp.K(X),
                                        inner1d(phi_X)))
  def test_kernel_gram_matrix1(self):
            ...
            self.assertTrue(np.allclose(dpp.K(x, y),
                                        inner1d(phi_x, phi_y)))
  def test_kernel_gram_matrix1(self):
            ...
            self.assertTrue(np.allclose(dpp.K(x, Y),
                                        phi_x.dot(phi_Y)))
  def test_kernel_gram_matrix1(self):
            ...
            self.assertTrue(np.allclose(dpp.K(X, Y),
                                        inner1d(phi_X, phi_Y)))
guilgautier commented 5 years ago

Hi @guillep,

Thanks for your help, I do agree with

Right now we have a single test that tests over multiple data The problem with this approach is that it's more complicated to know what datapoint/combination caused the test to fail, and also, some points can mask errors in the subsequent points.

One assertion per test makes it super clear what was the cause (and if there were many).

Very true!

Usually I refactor this into something like:

That's what we should do then 👍

And then, this brings me to a last point: why were those examples chosen to be tested (and not others?). I see the examples were taken from a paper, but still it's a bit cryptic whether those examples are necessary or sufficient :). One thing I try to do (maybe it's not applicable here) is to name the tests with their purpose. For example, I could have renamed test_RSK_output2 into test_RSK_simple_nested_arrays.

In DPPy, RSK is only applied to permutations, but it applies to words in general. In this case of permutation, RSK is a one-to-one correspondence between a permutation p and a pair of tableaux (P, Q) with certain properties, e.g., strictly increasing row-wise and column-wise. These examples were taken from a paper, for which I double checked the computations on paper. Obviously, one could test whether this works for other permutations with various sizes and more generally on words.