simonw / datasette-statistics

SQL statistics functions for Datasette

Tests failing on 3.7 and 3.6 #3

Closed simonw closed 3 years ago

simonw commented 3 years ago

From https://github.com/simonw/datasette-statistics/actions/runs/1347047939 in #2

statistics.geometric_mean (https://docs.python.org/3/library/statistics.html#statistics.geometric_mean) was added in Python 3.8, so it is not available on 3.6 or 3.7.

simonw commented 3 years ago

cc @ambv - options here are:

simonw commented 3 years ago

Backporting is definitely feasible, since the full implementation is: https://github.com/python/cpython/blob/00ffc4513df7b89a168e88da4d1e3ac367f7682f/Lib/statistics.py#L372-L388

def geometric_mean(data):
    """Convert data to floats and compute the geometric mean.

    Raises a StatisticsError if the input dataset is empty,
    if it contains a zero, or if it contains a negative value.

    No special efforts are made to achieve exact results.
    (However, this may change in the future.)

    >>> round(geometric_mean([54, 24, 36]), 9)
    36.0
    """
    try:
        return exp(fmean(map(log, data)))
    except ValueError:
        raise StatisticsError('geometric mean requires a non-empty dataset '
                              ' containing positive numbers') from None
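
One wrinkle with copying the function verbatim: it calls fmean, which was also added in Python 3.8, so a backport for 3.6 and 3.7 would need to avoid it. Below is a minimal sketch of one way to do that, using math.fsum instead. The name geometric_mean_fallback and the try/except import are illustrative assumptions, not code from this repository.

```python
import math
from statistics import StatisticsError


def geometric_mean_fallback(data):
    """Hypothetical backport of statistics.geometric_mean for Python < 3.8."""
    try:
        # math.log raises ValueError for zero and for negative inputs.
        logs = [math.log(x) for x in data]
    except ValueError:
        raise StatisticsError(
            "geometric mean requires a non-empty dataset "
            "containing positive numbers"
        ) from None
    if not logs:
        # An empty dataset never reaches math.log, so check it explicitly.
        raise StatisticsError(
            "geometric mean requires a non-empty dataset "
            "containing positive numbers"
        )
    # fsum keeps the summation accurate without needing fmean.
    return math.exp(math.fsum(logs) / len(logs))


try:
    # Prefer the standard library version where it exists (Python 3.8+).
    from statistics import geometric_mean
except ImportError:
    geometric_mean = geometric_mean_fallback
```

With this in place, round(geometric_mean_fallback([54, 24, 36]), 9) gives 36.0, matching the doctest in the upstream docstring.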

simonw commented 3 years ago

For the moment I'm going to remove geometric_mean to get the tests passing again.

simonw commented 3 years ago

Two surprising test failures on 3.6:

function = 'statistics_stdev'
E           assert 2.0062794503020767 == 2.003581 ± 2.0e-06
E            +  where 2.0062794503020767 = <bound method Results.single_value of <datasette.database.Results object at 0x1044cc240>>()
E            +    where <bound method Results.single_value of <datasette.database.Results object at 0x1044cc240>> = <datasette.database.Results object at 0x1044cc240>.single_value

And

function = 'statistics_variance'
E           assert 4.006259780907668 == 4.0 ± 4.0e-06
E            +  where 4.006259780907668 = <bound method Results.single_value of <datasette.database.Results object at 0x104521e48>>()
E            +    where <bound method Results.single_value of <datasette.database.Results object at 0x104521e48>> = <datasette.database.Results object at 0x104521e48>.single_value

I bodged a fix for these like so, but I'd like to understand what went wrong here: https://github.com/simonw/datasette-statistics/blob/5889af033ec35a11fc9a989426a7938120256ea8/tests/test_statistics.py#L30
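
To put numbers on how far off those results are, here is a small sketch that computes the relative error for each failure. The ± 2.0e-06 and ± 4.0e-06 bounds in the output look consistent with a relative tolerance of 1e-6. The looser rel_tol=1e-2 below is an arbitrary illustrative value and is not necessarily what the linked fix does.

```python
import math

# (actual, expected) pairs copied from the failure output above.
failures = {
    "statistics_stdev": (2.0062794503020767, 2.003581),
    "statistics_variance": (4.006259780907668, 4.0),
}


def relative_error(actual, expected):
    """Size of the mismatch as a fraction of the expected value."""
    return abs(actual - expected) / abs(expected)


for name, (actual, expected) in failures.items():
    # Strict check, assuming a relative tolerance of 1e-6.
    strict = math.isclose(actual, expected, rel_tol=1e-6)
    # Hypothetical looser check; the value 1e-2 is an assumption.
    loose = math.isclose(actual, expected, rel_tol=1e-2)
    print(name, relative_error(actual, expected), strict, loose)
```

Both mismatches are on the order of 0.1%, roughly a thousand times larger than the strict tolerance allows, so this is more than floating-point rounding noise.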

ambv commented 3 years ago

Thanks for dealing with this so quickly, Simon!