Closed: kliem closed this ticket 3 years ago
At least the following need fixing:
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 7 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_lattice.py # 9 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_polynomial.py # 6 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/hmm/chmm.pyx # 8 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/hmm/distributions.pyx # 3 doctests failed
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
Branch: public/29972
Changed dependencies from #29962 to none
Author: Jonathan Kliem
I have tried 6 different seeds and got one failure:
sage -t --long --warn-long 134.7 --random-seed=2002 src/sage/stats/distributions/discrete_gaussian_lattice.py
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_lattice.py", line 210, in sage.stats.distributions.discrete_gaussian_lattice.DiscreteGaussianDistributionLatticeSampler._normalisation_factor_zz
Failed example:
l.count(v) # abs tol 20
Expected:
64
Got:
87
Tolerance exceeded:
64 vs 87, tolerance 3e1 > 2e1
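For context, Sage's `# abs tol` / `# rel tol` doctest markers accept a numeric output when it lies within the given tolerance of the expected value. A minimal plain-Python sketch of the comparison (illustrative only, not the actual doctest-framework code):

```python
def within_tol(expected, got, tol, relative=False):
    """Accept `got` if it is within `tol` of `expected`.

    Illustrative sketch of the idea behind Sage's ``# abs tol`` /
    ``# rel tol`` doctest markers, not the actual framework code.
    """
    err = abs(expected - got)
    if relative:
        # a relative tolerance is scaled by the magnitude of the expected value
        return err <= tol * abs(expected)
    return err <= tol

# The failure above: |64 - 87| = 23 exceeds the absolute tolerance of 20.
print(within_tol(64, 87, 20))                          # False
print(within_tol(13298, 13355, 5e-2, relative=True))   # True
```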
Branch pushed to git repo; I updated commit sha1. New commits:
a15025d | more tolerance |
Ok, I made it a bit more flexible.
These doctests are really difficult to fix. As you noticed, they are far from stable.
You replaced:
- sage: x=0; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (13355, 13298)
- sage: x=4; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (5479, 5467)
- sage: x=-10; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (53, 51)
+ sage: x=0; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 13298
+ sage: l.count(x) # rel tol 5e-2
+ 13298
+ sage: x=4; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 5467
+ sage: l.count(x) # rel tol 5e-2
+ 5467
+ sage: x=-10; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 51
+ sage: l.count(x) # rel tol 5e-1
+ 51
I would further rewrite that as:
+ sage: expected = lambda x: ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ sage: observed = lambda x: l.count(x)
+ sage: expected(0)
+ 13298
+ sage: observed(0) # rel tol 5e-2
+ 13298
+ sage: expected(4)
+ 5467
+ sage: observed(4) # rel tol 5e-2
+ 5467
+ sage: expected(-10)
+ 51
+ sage: observed(-10) # rel tol 5e-1
+ 51
You replaced:
- sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.0, 1.00...)
- sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.32..., 1.36...)
+ sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.36, 1.36)
I would further rewrite that as:
+ sage: expected = lambda x, y: (
+ ....: exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n())
+ sage: observed = lambda x, y: float(l.count(x))/l.count(y)
+ sage: expected(0, 1), observed(0, 1) # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: expected(0, -100), observed(0, -100) # long time # abs tol 2e-1
+ (1.36, 1.36)
In src/sage/stats/hmm/chmm.pyx you add set_random_seed(0) in front of some tests.
Is there an explanation somewhere of when to do that? I did not find one at #29935.
Description changed:
---
+++
@@ -1,7 +1,9 @@
+Part of #29935.
+
This ticket makes
sage -t --long --random-seed=n src/sage/stats/
-pass for different values `n` than just `0`.
+pass for `n` more general than just `0`.
Replying to @slel:
Thanks for the above suggestions.
Yes, we agreed to avoid set_random_seed(0) if at all possible.
It is harder to fix this for doctests that you don't understand.
I removed this now.
This still fails too frequently (8 times out of 50). I understand the difficulty in making this work and I do not have a good solution either. Personally, I have the impression that we should not write tests that are guaranteed to fail sporadically at all, especially considering the vast number of doctests in Sage. Every false positive adds noise to the development process, which can make real issues go unnoticed. Besides that, including a test in the documentation making false claims about a probability distribution is a bit confusing.
Setting the seed to 0 may be the only reliable solution, but I welcome other ideas and opinions. Or we just increase the tolerances enough for the tests to pass (almost) always.
When I was running a patchbot client, I fixed several flaky doctests just to get the patchbot to report usable results, which is a time-consuming process. That is why I am wary of introducing new such issues.
sage -t --long --warn-long 134.9 --random-seed=13008 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13008 src/sage/stats/hmm/chmm.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13016 src/sage/stats/hmm/chmm.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13032 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13034 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13042 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13047 src/sage/stats/hmm/chmm.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13049 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
I agree. See a similar discussion at
Yes, 8 out of 50 is definitely too much.
My idea was to have meaningful tests that fail about 1 in 10000 or less. That is how I did the tests in #29976. Run the corresponding function a couple thousand times and take the maximum and minimum.
E.g. there is a doctest in matrices that tests that random GF(2) 1000 x 1000 matrices have high rank (because that was an issue that was fixed at some point). Theoretically such a matrix could still have a very low rank, but I don't think this will happen. The new doctest is rather meaningful, I think, because it documents that such a matrix will very probably (at least 99.9%) have rank 995 or higher. Also, the doctest is now actually being run: it doesn't just tell us that things work out for set_random_seed(0), but that they usually work out.
Here is an example of something I feel comfortable with:
sage: def foo():
....: A = random_matrix(GF(2), 1000, 1000)
....: return float(A.density())
....:
sage: max(foo() for _ in range(100))
0.50165
sage: max(foo() for _ in range(1000))
0.501586
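The same experiment can be reproduced outside Sage in plain Python (a sketch with a smaller 200 x 200 matrix so it runs quickly; the qualitative behaviour, density tightly concentrated around 1/2, is the same):

```python
import random

def density(nrows=200, ncols=200):
    """Fraction of nonzero entries of a uniform random 0/1 matrix."""
    ones = sum(random.getrandbits(1) for _ in range(nrows * ncols))
    return ones / (nrows * ncols)

# For 40000 entries the standard deviation of the density is
# 0.5 / sqrt(40000) = 0.0025, so even the extremes over 50 trials
# stay well within a tolerance of 0.01 around 1/2.
samples = [density() for _ in range(50)]
print(min(samples), max(samples))
```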
Then I decided on a tolerance of 0.01 to make it almost never fail.
Of course there should
sage: A = matrix(GF(2), 5, 5, 0)
- sage: A.randomize(0.5); A
- [0 0 0 1 1]
- [0 1 0 0 1]
- [1 0 0 0 0]
- [0 1 0 0 0]
- [0 0 0 1 0]
- sage: A.randomize(); A
- [0 0 1 1 0]
- [1 1 0 0 1]
- [1 1 1 1 0]
- [1 1 1 1 1]
- [0 0 1 1 0]
+ sage: len(A.nonzero_positions())
+ 0
+ sage: A.randomize(0.5)
+ sage: 0 <= len(A.nonzero_positions()) < 12
+ True
+ sage: A.randomize()
+ sage: 1 <= len(A.nonzero_positions()) < 24
+ True
Here the last one apparently fails something like 1 in a million times. Let's say there are a thousand tests like this in Sage and it takes 2 hours for a patchbot to run all tests. Then apparently every 83 days your patchbot will report such an error. Is this ok? Of course it will only die if this happens right after a beta, which will happen something like every twenty years.
Maybe tests failing 1 in 100 000 runs are still acceptable, but anything significantly below that would be annoying. You don't want to restart your patchbot every 50 days because of spurious failures.
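The back-of-the-envelope arithmetic above can be written out explicitly (the figures are the ones assumed in the comment, not measured values):

```python
# Assumed figures from the comment above, not measured values.
p_test = 1e-6          # chance a single flaky doctest fails in one run
n_tests = 1000         # number of such doctests in the library
runs_per_day = 24 / 2  # a full --long test run takes about 2 hours

# Chance that at least one of them fails during one full test run.
p_run = 1 - (1 - p_test) ** n_tests   # about 1e-3

days_between_failures = 1 / (p_run * runs_per_day)
print(round(days_between_failures))   # roughly 83
```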
And of course, if the test becomes meaningless by that kind of tolerance, we might want to go with set_random_seed(0).
Maybe the far easier and more reasonable solution is to have the patchbots run at random seed 0 when they verify that they are not broken. I mean, this is our reference seed. We can live with bots reporting incorrect failures once in a while (it happens anyway).
So we could add this to the patchbot scripts?
Sorry for all my comments.
Just forget what I said earlier. I still think that tests with set_random_seed(0) are pretty worthless, because they don't test anything.
However, doctests should be mathematically correct. So rethinking:
sage: A = matrix(GF(2), 5, 5, 0)
sage: A.randomize()
sage: 1 <= len(A.nonzero_positions()) < 24
True
is NOT a good test. Not because it is likely to fail, but because it boils down to something like:
sage: randint(0, 2^24) != 0
True
It probably won't fail, but it is terrible.
There are two reasons for doctests that I think should be kept in mind.
Here is my proposal:
I think the way to combine those things is to use while statements in those tests for "randomness". Those tests succeed almost certainly, in a mathematically correct way:
sage: while GF(3).random_element().is_zero(): pass
Or, to document that random GF(2) matrices can have different ranks:
sage: def random_rank():
....: A = matrix(GF(2), 5, 5, 0)
....: A.randomize()
....: return A.rank()
sage: while random_rank() < 5: pass
sage: while random_rank() >= 4: pass
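The same retry pattern can be tried in plain Python; the rank computation over GF(2) below is a standard bit-twiddling elimination (my own illustrative helper, not Sage code):

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as bitmask ints."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as the pivot column
            # eliminate the pivot column from all remaining rows
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def random_rank(n=5):
    """Rank of a uniform random n x n matrix over GF(2)."""
    return gf2_rank(random.getrandbits(n) for _ in range(n))

# Both loops terminate with probability 1: full rank (about 30% for 5x5)
# and rank <= 3 (rarer, but positive probability) each occur eventually.
while random_rank() < 5:
    pass
while random_rank() >= 4:
    pass
```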
sage: from collections import defaultdict
sage: counter = defaultdict(Integer)
sage: def throw_dice(): counter[randint(1, 6)] += 1
sage: throw_dice()
sage: def test(): return all(abs(counter[i]*1.0/sum(counter.values()) - 1/6) < 0.01 for i in counter)
sage: while not test(): throw_dice()
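In plain Python the same law-of-large-numbers loop looks like this (a sketch; with a tolerance of 0.01 the loop typically needs a few thousand throws before every observed frequency is that close to 1/6):

```python
import random
from collections import defaultdict

counter = defaultdict(int)

def throw_dice():
    counter[random.randint(1, 6)] += 1

def test():
    """All observed face frequencies within 0.01 of the true 1/6."""
    total = sum(counter.values())
    return all(abs(counter[i] / total - 1 / 6) < 0.01 for i in counter)

throw_dice()  # seed the counter so `total` is nonzero
while not test():
    throw_dice()

# By construction, the loop only exits once the frequencies have converged.
assert test()
```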
sage: from sage.stats.distributions.discrete_gaussian_integer import DiscreteGaussianDistributionIntegerSampler
sage: sigma = 3.0
sage: D = DiscreteGaussianDistributionIntegerSampler(sigma=sigma)
sage: bound = (6*sigma).floor()
sage: norm_factor = sum([exp(-x^2/(2*sigma^2)) for x in range(-bound,bound+1)])
sage: from collections import defaultdict
sage: counter = defaultdict(Integer)
sage: n = 0
sage: def add_sample():
....: global counter, n
....: counter[D()] += 1
....: n += 1
....:
sage: for _ in range(1000): add_sample()
sage: expected = lambda x : ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
sage: observed = lambda x : counter[x]
sage: while not counter[0]: add_sample()
sage: while abs(expected(0)*1.0/observed(0) - 1.0) > 5e-2: add_sample() # long time
sage: while not counter[4]: add_sample()
sage: while abs(expected(4)*1.0/observed(4) - 1.0) > 5e-2: add_sample() # long time
sage: while not counter[-10]: add_sample()
sage: while abs(expected(-10)*1.0/observed(-10) - 1.0) > 5e-2: add_sample() # long time
Am I making any sense?
IMO this would actually be a valuable doctest, and mathematically correct, because the probability of it not terminating is indeed zero (at least if things are implemented correctly).
One can maybe speed this up by increasing the initial sample size and by adding a batch of samples instead of just one at a time.
Yes, these are very good suggestions. Thanks for the detailed proposal.
Branch pushed to git repo; I updated commit sha1. New commits:
90ed4c6 | replace non-safe tests by correct tests that terminate by law of large numbers |
Branch pushed to git repo; I updated commit sha1. New commits:
d17384b | fix fishy doctest in libs |
I could only find such a doctest in one other place, from a closed ticket, and decided to fix it here as well.
Branch pushed to git repo; I updated commit sha1. New commits:
b0c02b3 | more meaningful replacement in libs |
Reviewer: Samuel Lelièvre, Markus Wageringel
Very nice. This works well.
I have tried the while loops manually to make sure they do not take too much time. Only this test in discrete_gaussian_lattice is sometimes a bit slow. This only happens rarely, but in one case it created more than 2 million samples.
sage: while abs(m*f(v)*1.0/c/counter[v] - 1.0) >= 0.2: add_samples(1000) # long time
As this does not happen often, I think we can set this ticket to positive review, but you can change the bound of this test if you prefer.
Changed branch from public/29972 to b0c02b3
Part of #29935.
This ticket makes
sage -t --long --random-seed=n src/sage/stats/
pass for `n` more general than just `0`.

CC: @slel
Component: doctest framework
Author: Jonathan Kliem
Branch/Commit:
b0c02b3
Reviewer: Samuel Lelièvre, Markus Wageringel
Issue created by migration from https://trac.sagemath.org/ticket/29972