Spurious test failures

jacg commented 5 years ago

Every now and then our Travis builds (or our local tests) fail spuriously. This sometimes wastes a lot of time and causes confusion. It's difficult to understand why these spurious failures occur, because they happen rarely, across different tests, and usually while we are in a hurry to do something else.

In order to gather information about these failures, so that we might be able to eliminate them, please add a comment to this issue every time you encounter such a spurious test failure.

Please mention the name of the test that failed, and information about exactly how it failed. If it happened in a Travis build, then provide a link to the build. If it happened on your local machine, then copy and paste a relevant portion of the traceback.

Also, every time you add a failure, please edit this list, so that we keep a summary of the failures we have seen.

test_mean_for_pmts_fee_is_unbiased: 1
test_inverse_cdf_hypothesis_generated: 2 [fixed in #575]
test_pmap_event_id_selection: 1
test_scaled_dark_pedestal_pedestal: 1
test_fill_kdst_var_1d: 1

jacg commented 5 years ago

test_mean_for_pmts_fee_is_unbiased

https://travis-ci.org/nextic/IC/jobs/443937686

jacg commented 5 years ago

Hmm, restarting the job makes the original build, which contains the failure, disappear. Does anybody know whether Travis keeps the original build somewhere?

If not, we'll need a copy-paste rather than a link.

jacg commented 5 years ago

=================================== FAILURES ===================================
____ test_inverse_cdf_hypothesis_generated ____
[gw3] linux -- Python 3.7.0 /home/travis/miniconda/envs/IC-3.7-2018-08-29/bin/python

    @given(valid_distributions(),
           floats(min_value = 0.1, max_value = 0.9))
    def test_inverse_cdf_hypothesis_generated(distribution, percentile):

invisible_cities/core/random_sampling_test.py:140:

../../../miniconda/envs/IC-3.7-2018-08-29/lib/python3.7/site-packages/hypothesis/core.py:587: in execute
    result = self.test_runner(data, run)
../../../miniconda/envs/IC-3.7-2018-08-29/lib/python3.7/site-packages/hypothesis/executors.py:58: in default_new_style_executor
    return function(data)
../../../miniconda/envs/IC-3.7-2018-08-29/lib/python3.7/site-packages/hypothesis/core.py:578: in run
    return test(*args, **kwargs)
invisible_cities/core/random_sampling_test.py:140: in test_inverse_cdf_hypothesis_generated
    floats(min_value = 0.1, max_value = 0.9))
../../../miniconda/envs/IC-3.7-2018-08-29/lib/python3.7/site-packages/hypothesis/core.py:525: in test
    result = self.test(*args, **kwargs)

distribution = ([-2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, ...], array([1.4552137e-07... 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07], dtype=float32))
percentile = 0.8999999761581423

    @given(valid_distributions(),
           floats(min_value = 0.1, max_value = 0.9))
    def test_inverse_cdf_hypothesis_generated(distribution, percentile):
        domain, freq = distribution
        cdf = cdf_from_freq(freq)
        for i, (d, cp) in enumerate(zip(domain, cdf)):
            if cp >= percentile:
                true_value = d
                break
        icdf = inverse_cdf(domain, cdf, percentile)

        assert true_value == icdf
E       assert 0.0 == -2.220446e-14

invisible_cities/core/random_sampling_test.py:149: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_inverse_cdf_hypothesis_generated(distribution=([-2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, -2.220446e-14, 0.0], array([1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07, 1.4552137e-07], dtype=float32)), percentile=0.8999999761581423)

jjgomezcadenas commented 5 years ago

@gonzaponte why the assert?? It should be a numpy_equal or similar, with a tolerance; notice the failure, 0 not equal to 10**-14!!! This should be easy to fix

gonzaponte commented 5 years ago

> why the assert??

I guess you mean why the ==

> It should be a numpy_equal or similar, with a tolerance; notice the failure, 0 not equal to 10**-14!!!

This is an unfortunate copy&paste from a similar test, I will fix it.

> This should be easy to fix

It is!
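
For illustration, a minimal sketch of the difference between `==` and a tolerance-based comparison, using the values from the failing run above. It relies only on `math.isclose` from the standard library; the tolerances (`rel_tol=1e-6`, `abs_tol=1e-9`) are assumptions chosen for the example, not values taken from the IC test suite.

```python
import math

true_value = 0.0
icdf = -2.220446e-14  # value reported in the failing run

# Exact equality fails on a difference of order 1e-14.
exact_match = (true_value == icdf)

# A relative tolerance alone cannot help when one side is exactly 0.0,
# because the allowed difference scales with the magnitude of the values.
rel_only_match = math.isclose(icdf, true_value, rel_tol=1e-6)

# Adding an absolute tolerance (assumed value) accepts the tiny difference.
close_match = math.isclose(icdf, true_value, rel_tol=1e-6, abs_tol=1e-9)

print(exact_match, rel_only_match, close_match)
```

Only the last comparison passes, which is why a comparison against zero needs an absolute tolerance and not just a relative one.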

mmkekic commented 5 years ago

=================================== FAILURES ===================================
____ test_inverse_cdf_hypothesis_generated ____
[gw3] linux -- Python 3.7.0 /home/travis/miniconda/envs/IC-3.7-2018-10-20/bin/python

    @given(valid_distributions(),
           floats(min_value = 0.1, max_value = 0.9))
    def test_inverse_cdf_hypothesis_generated(distribution, percentile):

invisible_cities/core/random_sampling_test.py:141:

../../../miniconda/envs/IC-3.7-2018-10-20/lib/python3.7/site-packages/hypothesis/core.py:610: in execute
    result = self.test_runner(data, run)
../../../miniconda/envs/IC-3.7-2018-10-20/lib/python3.7/site-packages/hypothesis/executors.py:58: in default_new_style_executor
    return function(data)
../../../miniconda/envs/IC-3.7-2018-10-20/lib/python3.7/site-packages/hypothesis/core.py:601: in run
    return test(*args, **kwargs)
invisible_cities/core/random_sampling_test.py:141: in test_inverse_cdf_hypothesis_generated
    floats(min_value = 0.1, max_value = 0.9))
../../../miniconda/envs/IC-3.7-2018-10-20/lib/python3.7/site-packages/hypothesis/core.py:548: in test
    result = self.test(*args, **kwargs)

distribution = ([-1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, ...], array([1.46089...-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07], dtype=float32))
percentile = 0.8999999761581423

    @given(valid_distributions(),
           floats(min_value = 0.1, max_value = 0.9))
    def test_inverse_cdf_hypothesis_generated(distribution, percentile):
        domain, freq = distribution
        cdf = cdf_from_freq(freq)
        for i, (d, cp) in enumerate(zip(domain, cdf)):
            if cp >= percentile:
                true_value = d
                break
        icdf = inverse_cdf(domain, cdf, percentile)

        assert icdf == approx(true_value)
E       assert -1.0214052e-12 == 0.0 ± 1.0e-12
E        +  where 0.0 ± 1.0e-12 = approx(0.0)

invisible_cities/core/random_sampling_test.py:150: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_inverse_cdf_hypothesis_generated(distribution=([-1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, -1.0214052e-12, 0.0], array([1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07, 1.460898e-07], dtype=float32)), percentile=0.8999999761581423)
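
This second failure shows that switching to `approx` alone is not enough when the expected value is exactly 0.0. By default `pytest.approx` accepts a difference up to the larger of a relative tolerance (1e-6 times the expected value) and an absolute tolerance of 1e-12, so against 0.0 only the 1e-12 floor applies (the `0.0 ± 1.0e-12` in the message). The sketch below reproduces that rule with a hypothetical helper, `within_tolerance`, which is not part of pytest or IC; the looser `abs_tol=1e-9` is an assumed value.

```python
def within_tolerance(actual, expected, rel=1e-6, abs_tol=1e-12):
    """Mimic the default comparison rule of pytest.approx (illustrative helper)."""
    # The allowed difference is the larger of the relative and absolute bounds.
    tolerance = max(rel * abs(expected), abs_tol)
    return abs(actual - expected) <= tolerance

icdf = -1.0214052e-12  # value reported in the failing run
true_value = 0.0

# With the defaults, the bound is 1e-12 and the difference is slightly larger.
default_ok = within_tolerance(icdf, true_value)

# With a looser absolute tolerance (assumed value), the comparison passes.
looser_ok = within_tolerance(icdf, true_value, abs_tol=1e-9)

print(default_ok, looser_ok)
```

In the test itself the equivalent change would be passing an explicit `abs` argument to `approx`, with a value suited to the scale of `domain`.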

mmkekic commented 5 years ago

____ test_pmap_event_id_selection ____
[gw1] linux -- Python 3.7.0 /home/travis/miniconda/envs/IC-3.7-2018-10-20/bin/python

    @given(dictionaries(keys=integers(min_value=-1e5, max_value=1e5), values=pmaps(), max_size=5),
           lists(integers(min_value=-1e5, max_value=1e5)))
    def test_pmap_event_id_selection(pmaps, events):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 4 valid examples in 1.17 seconds (2 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

invisible_cities/reco/pmaps_functions_test.py:83: FailedHealthCheck
---------------------------------- Hypothesis ----------------------------------
You can add @seed(320050419280151807443131340231084161863) to this test or run pytest with --hypothesis-seed=320050419280151807443131340231084161863 to reproduce this failure.
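
As the message suggests, the health check can be suppressed for a single test through `settings`. A minimal sketch, assuming a Hypothesis version that provides `HealthCheck.too_slow`; the test name, the `integers()` stand-in for the `pmaps()` strategy, and the test body are hypothetical and are not the IC test.

```python
from hypothesis import HealthCheck, given, settings
from hypothesis.strategies import dictionaries, integers, lists

@settings(suppress_health_check=[HealthCheck.too_slow], deadline=None)
@given(dictionaries(keys=integers(min_value=-10**5, max_value=10**5),
                    values=integers(),  # stand-in for the pmaps() strategy
                    max_size=5),
       lists(integers(min_value=-10**5, max_value=10**5)))
def test_event_id_selection_sketch(mapping, events):
    # Keep only the entries whose key appears in the list of events.
    selected = {k: v for k, v in mapping.items() if k in events}
    assert set(selected) <= set(events)
    assert set(selected) <= set(mapping)
```

Suppressing the check only hides slow data generation on a loaded CI machine; reducing the size of the generated data, as the message also proposes, addresses the cause.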

gonzaponte commented 5 years ago

[gw2] linux -- Python 3.7.0 /home/travis/miniconda/envs/IC-3.7-2018-11-07/bin/python

dark_spectrum_local = ((array([-5.        , -4.8989899 , -4.7979798 , -4.6969697 , -4.5959596 ,
       -4.49494949, -4.39393939, -4.29292929....56286665e+01, 3.25217578e+01, 2.30167373e+01, 1.62137485e+01,
       1.14155084e+01, 8.08971455e+00, 5.83604597e+00]))

    def test_scaled_dark_pedestal_pedestal(dark_spectrum_local):
        (bins, nsamples, scale, poisson_mean,
         pedestal_mean, pedestal_sigma,
         gain, gain_sigma, min_integral), expected_spectrum = dark_spectrum_local

        xs       = shift_to_bin_centers(bins)
        pedestal = spe.binned_gaussian_spectrum(pedestal_mean, pedestal_sigma, nsamples, bins)
        f        = spe.scaled_dark_pedestal(pedestal,
                                            pedestal_mean, pedestal_sigma,
                                            min_integral)
        actual_spectrum = f(xs, scale, poisson_mean,
                            gain, gain_sigma)

        x0, s0    = pedestal_mean, pedestal_sigma
        selection = in_range(shift_to_bin_centers(bins),   x0 - 5 * s0,    x0 + 5 * s0)
        pull      = expected_spectrum[selection]   -  actual_spectrum[selection]
        pull     /= expected_spectrum[selection]**0.5
>       assert np.all(in_range(pull, -2.5, 2.5))
E       assert False
E        +  where False = <function all at 0x7f8f21448620>(array([ True,  True,  True,  True,  True, False,  True,  True,  True,\n        True,  True,  True,  True,  True,  True,... True,  True,  True,  True,  True,  True,  True,  True,\n        True,  True,  True,  True,  True,  True,  True,  True]))
E        +    where <function all at 0x7f8f21448620> = np.all
E        +    and   array([ True,  True,  True,  True,  True, False,  True,  True,  True,\n        True,  True,  True,  True,  True,  True,... True,  True,  True,  True,  True,  True,  True,  True,\n        True,  True,  True,  True,  True,  True,  True,  True]) = in_range(array([ 0.24724279, -0.63518977, -0.34419627,  0.51071258, -0.29183231,\n       -2.54954483, -0.18808316, -0.19375065, ...245982,  0.26678097, -0.7749531 ,  0.41749303, -0.18162616,\n       -0.73069211, -0.18234846, -1.00615231, -0.18408912]), -2.5, 2.5)

gonzaponte commented 5 years ago

____________________________ test_fill_kdst_var_1d _____________________________
[gw2] linux -- Python 3.7.0 /home/travis/miniconda/envs/IC-3.7-2018-11-07/bin/python

    @given(data_frames(columns=columns(kdst_variables, elements=floats(allow_nan=False))))
>   @settings(deadline=None)
    def test_fill_kdst_var_1d(kdst):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 7 valid examples in 1.05 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

invisible_cities/reco/monitor_functions_test.py:420: FailedHealthCheck

next-exp / IC

Spurious test failures #551