Closed: kliem closed this ticket 3 years ago
At least the following need fixing:
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 7 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_lattice.py # 9 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_polynomial.py # 6 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/hmm/chmm.pyx # 8 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/hmm/distributions.pyx # 3 doctests failed
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
Branch: public/29972
Changed dependencies from #29962 to none
Author: Jonathan Kliem
I have tried 6 different seeds and got one failure:
sage -t --long --warn-long 134.7 --random-seed=2002 src/sage/stats/distributions/discrete_gaussian_lattice.py
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_lattice.py", line 210, in sage.stats.distributions.discrete_gaussian_lattice.DiscreteGaussianDistributionLatticeSampler._normalisation_factor_zz
Failed example:
l.count(v) # abs tol 20
Expected:
64
Got:
87
Tolerance exceeded:
64 vs 87, tolerance 3e1 > 2e1
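For context, Sage's `# abs tol` / `# rel tol` doctest markers accept a numeric output when it lies within the given tolerance of the expected value. A minimal plain-Python sketch of the comparison (illustrative only, not the actual doctest-framework code):

```python
def within_tol(expected, got, tol, relative=False):
    """Accept `got` if it is within `tol` of `expected`.

    Illustrative sketch of the idea behind Sage's ``# abs tol`` /
    ``# rel tol`` doctest markers, not the actual framework code.
    """
    err = abs(expected - got)
    if relative:
        # a relative tolerance is scaled by the magnitude of the expected value
        return err <= tol * abs(expected)
    return err <= tol

# The failure above: |64 - 87| = 23 exceeds the absolute tolerance of 20.
print(within_tol(64, 87, 20))                          # False
print(within_tol(13298, 13355, 5e-2, relative=True))   # True
```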
Branch pushed to git repo; I updated commit sha1. New commits:
a15025d | more tolerance |
Ok, I made it a bit more flexible.
These doctests are really difficult to fix. As you noticed, they are far from stable.
You replaced:
- sage: x=0; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (13355, 13298)
- sage: x=4; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (5479, 5467)
- sage: x=-10; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (53, 51)
+ sage: x=0; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 13298
+ sage: l.count(x) # rel tol 5e-2
+ 13298
+ sage: x=4; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 5467
+ sage: l.count(x) # rel tol 5e-2
+ 5467
+ sage: x=-10; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 51
+ sage: l.count(x) # rel tol 5e-1
+ 51
I would further rewrite that as:
+ sage: expected = lambda x: ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ sage: observed = lambda x: l.count(x)
+ sage: expected(0)
+ 13298
+ sage: observed(0) # rel tol 5e-2
+ 13298
+ sage: expected(4)
+ 5467
+ sage: observed(4) # rel tol 5e-2
+ 5467
+ sage: expected(-10)
+ 51
+ sage: observed(-10) # rel tol 5e-1
+ 51
You replaced:
- sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.0, 1.00...)
- sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.32..., 1.36...)
+ sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.36, 1.36)
I would further rewrite that as:
+ sage: expected = lambda x, y: (
+ ....: exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n())
+ sage: observed = lambda x, y: float(l.count(x))/l.count(y)
+ sage: expected(0, 1), observed(0, 1) # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: expected(0, -100), observed(0, -100) # long time # abs tol 2e-1
+ (1.36, 1.36)
In src/sage/stats/hmm/chmm.pyx you add set_random_seed(0) in front of some tests.
Is there an explanation somewhere of when to do that? I did not find one at #29935.
Description changed:
---
+++
@@ -1,7 +1,9 @@
+Part of #29935.
+
This ticket makes
sage -t --long --random-seed=n src/sage/stats/
-pass for different values `n` than just `0`.
+pass for `n` more general than just `0`.
Replying to @slel:
Thanks for the above suggestions.
Yes, we agreed to avoid set_random_seed(0) if at all possible.
It is harder to fix this for doctests that you don't understand.
I removed this now.
This still fails too frequently (8 times out of 50). I understand the difficulty in making this work and I do not have a good solution either. Personally, I have the impression that we should not write tests that are guaranteed to fail sporadically at all, especially considering the vast number of doctests in Sage. Every false positive adds noise to the development process, which can make real issues go unnoticed. Besides that, including a test in the documentation making false claims about a probability distribution is a bit confusing.
Setting the seed to 0 may be the only reliable solution, but I welcome other ideas and opinions. Or we just increase the tolerances enough for the tests to pass (almost) always.
When I was running a patchbot client, I fixed several flaky doctests just to get the patchbot to report usable results, which is a time-consuming process. That is why I am wary of introducing new such issues.
sage -t --long --warn-long 134.9 --random-seed=13008 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13008 src/sage/stats/hmm/chmm.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13016 src/sage/stats/hmm/chmm.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13032 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13034 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13042 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13047 src/sage/stats/hmm/chmm.pyx # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13049 src/sage/stats/distributions/discrete_gaussian_integer.pyx # 1 doctest failed
I agree. See a similar discussion at
Yes, 8 out of 50 is definitely too much.
My idea was to have meaningful tests that fail about 1 in 10000 or less. That is how I did the tests in #29976. Run the corresponding function a couple thousand times and take the maximum and minimum.
E.g. there is a doctest in matrices that tests that random GF(2) 1000 x 1000 matrices have high rank (because that was an issue that was fixed at some point). Theoretically such a matrix could still have a very low rank, but I don't think this will happen. The new doctest is rather meaningful, I think, because it documents that such a matrix will very probably (at least 99.9%) have rank 995 or higher. Also, the doctest is now actually being run: it doesn't just tell us that things work out for set_random_seed(0), but that they usually work out.
Here is an example of something I feel comfortable with:
sage: def foo():
....: A = random_matrix(GF(2), 1000, 1000)
....: return float(A.density())
....:
sage: max(foo() for _ in range(100))
0.50165
sage: max(foo() for _ in range(1000))
0.501586
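The same experiment can be reproduced outside Sage in plain Python (a sketch with a smaller 200 x 200 matrix so it runs quickly; the qualitative behaviour, density tightly concentrated around 1/2, is the same):

```python
import random

def density(nrows=200, ncols=200):
    """Fraction of nonzero entries of a uniform random 0/1 matrix."""
    ones = sum(random.getrandbits(1) for _ in range(nrows * ncols))
    return ones / (nrows * ncols)

# For 40000 entries the standard deviation of the density is
# 0.5 / sqrt(40000) = 0.0025, so even the extremes over 50 trials
# stay well within a tolerance of 0.01 around 1/2.
samples = [density() for _ in range(50)]
print(min(samples), max(samples))
```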
Then I decided on a tolerance of 0.01 to make it almost never fail.
Of course there should
sage: A = matrix(GF(2), 5, 5, 0)
- sage: A.randomize(0.5); A
- [0 0 0 1 1]
- [0 1 0 0 1]
- [1 0 0 0 0]
- [0 1 0 0 0]
- [0 0 0 1 0]
- sage: A.randomize(); A
- [0 0 1 1 0]
- [1 1 0 0 1]
- [1 1 1 1 0]
- [1 1 1 1 1]
- [0 0 1 1 0]
+ sage: len(A.nonzero_positions())
+ 0
+ sage: A.randomize(0.5)
+ sage: 0 <= len(A.nonzero_positions()) < 12
+ True
+ sage: A.randomize()
+ sage: 1 <= len(A.nonzero_positions()) < 24
+ True
Here the last one apparently fails something like 1 in a million times. Let's say there are a thousand tests like this in Sage and it takes 2 hours for a patchbot to run all tests. Then apparently every 83 days your patchbot will report such an error. Is this ok? Of course it will only die if this happens right after a beta, which will happen something like every twenty years.
Maybe tests failing 1 in 100 000 runs are still acceptable, but anything significantly below that would be annoying. You don't want to restart your patchbot every 50 days because of spurious failures.
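The back-of-the-envelope arithmetic above can be written out explicitly (the figures are the ones assumed in the comment, not measured values):

```python
# Assumed figures from the comment above, not measured values.
p_test = 1e-6          # chance a single flaky doctest fails in one run
n_tests = 1000         # number of such doctests in the library
runs_per_day = 24 / 2  # a full --long test run takes about 2 hours

# Chance that at least one of them fails during one full test run.
p_run = 1 - (1 - p_test) ** n_tests   # about 1e-3

days_between_failures = 1 / (p_run * runs_per_day)
print(round(days_between_failures))   # roughly 83
```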
And of course, if the test becomes meaningless by that kind of tolerance, we might want to go with set_random_seed(0).
Maybe the far easier and more reasonable solution is to have the patchbots run at random seed 0 when they verify that they are not broken. I mean, this is our reference seed. We can live with bots reporting incorrect failures once in a while (it happens anyway).
So we could add this to the patchbot scripts?
Sorry for all my comments.
Just forget what I said earlier. I still think that tests with set_random_seed(0) are pretty worthless, because they don't test anything.
However, doctests should be mathematically correct. So rethinking:
sage: A = matrix(GF(2), 5, 5, 0)
sage: A.randomize()
sage: 1 <= len(A.nonzero_positions()) < 24
True
is NOT a good test. Not because it is likely to fail, but because it boils down to something like:
sage: randint(0, 2^24) != 0
True
It probably won't fail, but it is terrible.
There are two reasons for doctests that I think should be kept in mind.
Here is my proposal:
I think the way to combine those things is to use while statements in those tests for "randomness". Those tests succeed almost certainly, in a mathematically correct way:
sage: while GF(3).random_element().is_zero(): pass
Or, to document that random GF(2) matrices can have different ranks:
sage: def random_rank():
....: A = matrix(GF(2), 5, 5, 0)
....: A.randomize()
....: return A.rank()
sage: while random_rank() < 5: pass
sage: while random_rank() >= 4: pass
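The same retry pattern can be tried in plain Python; the rank computation over GF(2) below is a standard bit-twiddling elimination (my own illustrative helper, not Sage code):

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as bitmask ints."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as the pivot column
            # eliminate the pivot column from all remaining rows
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def random_rank(n=5):
    """Rank of a uniform random n x n matrix over GF(2)."""
    return gf2_rank(random.getrandbits(n) for _ in range(n))

# Both loops terminate with probability 1: full rank (about 30% for 5x5)
# and rank <= 3 (rarer, but positive probability) each occur eventually.
while random_rank() < 5:
    pass
while random_rank() >= 4:
    pass
```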
sage: from collections import defaultdict
sage: counter = defaultdict(Integer)
sage: def throw_dice(): counter[randint(1, 6)] += 1
sage: throw_dice()
sage: def test(): return all(abs(counter[i]*1.0/sum(counter.values()) - 1/6) < 0.01 for i in counter)
sage: while not test(): throw_dice()
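In plain Python the same law-of-large-numbers loop looks like this (a sketch; with a tolerance of 0.01 the loop typically needs a few thousand throws before every observed frequency is that close to 1/6):

```python
import random
from collections import defaultdict

counter = defaultdict(int)

def throw_dice():
    counter[random.randint(1, 6)] += 1

def test():
    """All observed face frequencies within 0.01 of the true 1/6."""
    total = sum(counter.values())
    return all(abs(counter[i] / total - 1 / 6) < 0.01 for i in counter)

throw_dice()  # seed the counter so `total` is nonzero
while not test():
    throw_dice()

# By construction, the loop only exits once the frequencies have converged.
assert test()
```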
sage: from sage.stats.distributions.discrete_gaussian_integer import DiscreteGaussianDistributionIntegerSampler
sage: sigma = 3.0
sage: D = DiscreteGaussianDistributionIntegerSampler(sigma=sigma)
sage: bound = (6*sigma).floor()
sage: norm_factor = sum([exp(-x^2/(2*sigma^2)) for x in range(-bound,bound+1)])
sage: from collections import defaultdict
sage: counter = defaultdict(Integer)
sage: n = 0
sage: def add_sample():
....: global counter, n
....: counter[D()] += 1
....: n += 1
....:
sage: for _ in range(1000): add_sample()
sage: expected = lambda x : ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
sage: observed = lambda x : counter[x]
sage: while not counter[0]: add_sample()
sage: while abs(expected(0)*1.0/observed(0) - 1.0) > 5e-2: add_sample() # long time
sage: while not counter[4]: add_sample()
sage: while abs(expected(4)*1.0/observed(4) - 1.0) > 5e-2: add_sample() # long time
sage: while not counter[-10]: add_sample()
sage: while abs(expected(-10)*1.0/observed(-10) - 1.0) > 5e-2: add_sample() # long time
Am I making any sense?
IMO this would actually be a valuable doctest, and mathematically correct, because the probability of it not terminating is indeed zero (at least if things are implemented correctly).
One can maybe speed this up by increasing the initial sample size and by adding a batch of samples instead of just one at a time.
Yes, these are very good suggestions. Thanks for the detailed proposal.
Branch pushed to git repo; I updated commit sha1. New commits:
90ed4c6 | replace non-safe tests by correct tests that terminate by law of large numbers |
Branch pushed to git repo; I updated commit sha1. New commits:
d17384b | fix fishy doctest in libs |
I could only find such a doctest in one other place, from a closed ticket, and decided to fix it here as well.
Branch pushed to git repo; I updated commit sha1. New commits:
b0c02b3 | more meaningful replacement in libs |
Reviewer: Samuel Lelièvre, Markus Wageringel
Very nice. This works well.
I have tried the while loops manually to make sure they do not take too much time. Only this test in discrete_gaussian_lattice is sometimes a bit slow. This only happens rarely, but in one case it created more than 2 million samples.
sage: while abs(m*f(v)*1.0/c/counter[v] - 1.0) >= 0.2: add_samples(1000) # long time
As this does not happen often, I think we can set this ticket to positive review, but you can change the bound of this test if you prefer.
Changed branch from public/29972 to b0c02b3
Part of #29935.
This ticket makes
sage -t --long --random-seed=n src/sage/stats/
pass for `n` more general than just `0`.

CC: @slel
Component: doctest framework
Author: Jonathan Kliem
Branch/Commit:
b0c02b3
Reviewer: Samuel Lelièvre, Markus Wageringel
Issue created by migration from https://trac.sagemath.org/ticket/29972