Open videlec opened 2 years ago
Description changed:
---
+++
@@ -1 +1 @@
-We implement a sage compatible version of the [statistics module](https://docs.python.org/3/library/statistics.html) in `stats/statistics.py`. In particular, it will provide the `mean` and `median` functions that were recently deprecated for removal in #29662. See also #33432.
+We implement a sage compatible version of the [Python statistics module](https://docs.python.org/3/library/statistics.html) in `stats/statistics.py`. In particular, it will provide the `mean` and `median` functions that were recently deprecated for removal in #29662. See also #33432.
Branch: u/vdelecroix/33453
Branch pushed to git repo; I updated commit sha1. New commits:
0abcc58 | 33453: add statistics module to doc |
Thank you, this looks like a great addition which should take care of the necessary problems but continue providing what is needed. I didn't see any obvious problems in the code but it would be good to get a second more detailed eye.
Your new function `mean` is not compatible with the built-in `statistics.mean` - that specifies "If data is empty, `StatisticsError` will be raised." https://docs.python.org/3/library/statistics.html#statistics.mean
(Also, the argument is called `data`, not `v`.)
Also, please include cross-references to the `numpy` functions in the documentation so that this information is not lost.
Replying to @mkoeppe:
> Your new function `mean` is not compatible with the built-in `statistics.mean` - that specifies "If data is empty, `StatisticsError` will be raised." https://docs.python.org/3/library/statistics.html#statistics.mean

True. I did that on purpose to follow the numpy behaviour. The current code that I wrote only emits a `RuntimeWarning`.
> (Also, the argument is called `data`, not `v`.)

Is the argument name really relevant? Though `data` is definitely much better.
Replying to @videlec:
> Replying to @mkoeppe:
> > Your new function `mean` is not compatible with the built-in `statistics.mean` - that specifies "If data is empty, `StatisticsError` will be raised." https://docs.python.org/3/library/statistics.html#statistics.mean
>
> True. I did that on purpose to follow the numpy behaviour. The current code that I wrote only emits a `RuntimeWarning`.

That makes no sense - the point of the module is to be compatible not with numpy but with the built-in `statistics` module.
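For reference, a minimal sketch of the built-in behaviour being discussed. It uses only Python's standard `statistics` module, not the Sage module from this branch, and `mean_or_error` is a hypothetical helper introduced just for illustration.

```python
import statistics

def mean_or_error(data):
    """Return (result, error_name) from the built-in statistics.mean."""
    try:
        return statistics.mean(data), None
    except statistics.StatisticsError as exc:
        # The built-in module raises on empty input instead of returning NaN.
        return None, type(exc).__name__

print(mean_or_error([1, 2, 3, 4, 4]))  # non-empty data: a number is returned
print(mean_or_error([]))               # empty data: the error name is reported
```

A replacement that only warns on empty input, as numpy does according to the comment above, would break callers that rely on catching `StatisticsError`.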
Replying to @videlec:
> > (Also, the argument is called `data`, not `v`.)
>
> Is the argument name really relevant? Though `data` is definitely much better.

Yes: because the signature is `mean(data)`, not `mean(data, /)`, users are allowed to call it as `mean(data=...)`.
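A small sketch of why the parameter name matters, in plain Python. The two wrapper functions are hypothetical and exist only to contrast a normal parameter with a positional-only one.

```python
import statistics

def mean_keyword_ok(data):
    """Same parameter kind as statistics.mean: positional or keyword."""
    return statistics.mean(data)

def mean_positional_only(data, /):
    """Hypothetical variant where the parameter name is not part of the API."""
    return statistics.mean(data)

print(statistics.mean(data=[1, 2, 3]))  # the built-in accepts the keyword
print(mean_keyword_ok(data=[1, 2, 3]))  # a compatible replacement must too

try:
    mean_positional_only(data=[1, 2, 3])
except TypeError as exc:
    print("TypeError:", exc)            # the keyword call is rejected
```

With a parameter named `v`, a call written as `mean(data=...)` against the built-in module would fail against the replacement with the same kind of `TypeError`.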
Note that Python argument naming is awful:
`statistics.variance(data, xbar=None)`
`statistics.pvariance(data, mu=None)`
Compatibility > beauty
Replying to @mkoeppe:
> Compatibility > beauty

This is not about beauty but coherence. However this is a very minor point. It is perfectly fine to keep as much of the Python world as we can.
More importantly
- if the user provides a numpy array then the appropriate numpy method is called. This is not the Python behaviour. Should I remove it?
- If `data` only contains builtin Python data (`int`, `float`, `Fraction`, `Decimal`, ...) should the code simply transfer to the Python `statistics` module? Usually, sage functions tend to use `py_scalar_to_element` which does convert `int -> Integer`, `float -> RealNumber`, etc

The distinction of `xbar` and `mu` is deliberate, see discussion in https://bugs.python.org/issue20389
Replying to @videlec:
> More importantly
> - if the user provides a numpy array then the appropriate numpy method is called. This is not the Python behaviour. Should I remove it?

Computing it via the numpy method I think is a good idea; but the result type / error handling must be compatible with the other types.

> - If `data` only contains builtin Python data (`int`, `float`, `Fraction`, `Decimal`, ...) should the code simply transfer to the Python `statistics` module? Usually, sage functions tend to use `py_scalar_to_element` which does convert `int -> Integer`, `float -> RealNumber`, etc

I think it would make sense to always make the result a Sage type, via `py_scalar_to_element` if necessary
Branch pushed to git repo; I updated commit sha1. New commits:
a60c096 | 33453: follow python specifications + improved doc |
Replying to @mkoeppe:
> Also, please include cross-references to the `numpy` functions in the documentation so that this information is not lost.

Not sure about what you meant here.
Can't just replace `sage.stats.basic_stats.mean` by `lazy_import('sage.stats.statistics', 'mean', deprecation=33453)`. They have a different specification.
I have a very minor observation. I think there are a few typos in `sage.stats.statistics` in the updated deprecation notices:
--- basic_stats.sage
+++ basic_stats.sage with small changes
@@ -76,7 +76,7 @@
sage: std([1..6], bias=True)
doctest:warning...
- DeprecationWarning: sage.stats.basic_stats.std is deprecated; use sage.stats.statstics.stdev or sage.stats.statistics.pstdev instead
+ DeprecationWarning: sage.stats.basic_stats.std is deprecated; use sage.stats.statistics.stdev or sage.stats.statistics.pstdev instead
See https://trac.sagemath.org/33453 for details.
1/2*sqrt(35/3)
sage: std([1..6], bias=False)
@@ -106,7 +106,7 @@
sage: std(data) # random
0.29487771726609185
"""
- deprecation(33453, 'sage.stats.basic_stats.std is deprecated; use sage.stats.statstics.stdev or sage.stats.statistics.pstdev instead')
+ deprecation(33453, 'sage.stats.basic_stats.std is deprecated; use sage.stats.statistics.stdev or sage.stats.statistics.pstdev instead')
if hasattr(v, 'standard_deviation'):
return v.standard_deviation(bias=bias)
Branch pushed to git repo; I updated commit sha1. New commits:
7cf846c | 33453: statstics -> statistics |
Replying to @videlec:
> Replying to @mkoeppe:
> > Also, please include cross-references to the `numpy` functions in the documentation so that this information is not lost.
>
> Not sure about what you meant here.

We define the mean of the empty list to be the (symbolic) NaN,
following the convention of MATLAB, Scipy, and R.
- This function is deprecated. Use ``numpy.mean`` or ``numpy.nanmean``
- instead.
+ This function is deprecated. Use ``sage.stats.statistics.mean`` instead. The
+ differences with this function are
+
+ - the code does not try to call ``v.mean()``
+ - raises an error on empty input
Stuff like this -- please don't remove cross-references to numpy.
In
+ if is_numpy_type(type(data)):
+     import numpy
+     if isinstance(data, numpy.ndarray):
+         return data.mean()
I think the result should be coerced to a Sage number type (comment:18)
Replying to @mkoeppe:
> In
>
>     + if is_numpy_type(type(data)):
>     +     import numpy
>     +     if isinstance(data, numpy.ndarray):
>     +         return data.mean()
>
> I think the result should be coerced to a Sage number type (comment:18)

I don't like the conversion afterwards so much. The mean of a list of integers is a floating point in numpy.
sage: import sage.stats.statistics as statistics
sage: data = [1,2,7,-11,15,23]
sage: statistics.mean(data)
37/6
sage: import numpy
sage: statistics.mean(numpy.array(data))
6.166666666666667
Maybe I should just remove these numpy shortcuts and simply document how to use the proper numpy methods in the documentation only?
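The same exact-versus-float split shows up in plain Python, which may help frame the choice. This sketch uses only the standard library (neither Sage nor numpy), with `Fraction` standing in for Sage's exact rationals as an assumption for illustration.

```python
from fractions import Fraction
import statistics

data = [1, 2, 7, -11, 15, 23]

# Plain ints with a non-integer mean: the built-in returns a float.
float_mean = statistics.mean(data)

# Exact rational input: the built-in keeps the result exact.
exact_mean = statistics.mean([Fraction(x) for x in data])

print(type(float_mean).__name__, float_mean)
print(type(exact_mean).__name__, exact_mean)
```

In the built-in module the result type follows the input type, so converting a float result afterwards cannot recover the exact value `37/6`.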
Branch pushed to git repo; I updated commit sha1. New commits:
9e82df1 | 33453: clean doc |
There are pyflakes errors. Once those are fixed, I'm happy to give this a positive review.
Reviewer: David Roe
Replying to @roed314:
> There are pyflakes errors. Once those are fixed, I'm happy to give this a positive review.

Please don't. comment:30 has to be sorted out first.
Also, I would like to mitigate the warnings in `basic_stats`. Currently, the only deprecated behaviours of `mean(v)` are when either
- the object has a `v.mean()` method, or
- `v` is empty.

In all other cases, we can suppress the deprecation. And similarly for all other functions in `basic_stats`.
Changed reviewer from David Roe to Matthias Koeppe, David Roe
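The mitigation proposed for `basic_stats` could look roughly like the following plain-Python sketch. `legacy_mean` is a hypothetical stand-in, not the Sage code: it uses `warnings.warn` in place of Sage's `deprecation` helper and `math.nan` in place of the symbolic NaN.

```python
import math
import statistics
import warnings

def legacy_mean(v):
    """Hypothetical stand-in for the old basic_stats.mean (not the Sage code)."""
    if hasattr(v, "mean"):
        # Old behaviour that the new function drops: dispatch to v.mean().
        warnings.warn("dispatching to v.mean() is deprecated",
                      DeprecationWarning, stacklevel=2)
        return v.mean()
    if len(v) == 0:
        # Old behaviour that the new function drops: NaN for empty input.
        warnings.warn("NaN for empty input is deprecated",
                      DeprecationWarning, stacklevel=2)
        return math.nan
    # Old and new behaviour agree here, so no warning is needed.
    return statistics.mean(v)
```

With this shape, ordinary calls such as `legacy_mean([1, 2, 3])` stay silent and only the two diverging cases warn.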
Description changed:
---
+++
@@ -1 +1,3 @@
We implement a sage compatible version of the [Python statistics module](https://docs.python.org/3/library/statistics.html) in `stats/statistics.py`. In particular, it will provide the `mean` and `median` functions that were recently deprecated for removal in #29662. See also #33432.
+
+See also https://docs.python.org/3.10/whatsnew/3.8.html#statistics
I just hit this deprecation warning again. It would be nice to remove it.
We implement a sage compatible version of the Python statistics module in `stats/statistics.py`. In particular, it will provide the `mean` and `median` functions that were recently deprecated for removal in #29662. See also #33432.

See also https://docs.python.org/3.10/whatsnew/3.8.html#statistics
CC: @mkoeppe
Component: statistics
Author: Vincent Delecroix
Branch/Commit: u/vdelecroix/33453 @ 9e82df1
Reviewer: Matthias Koeppe, David Roe
Issue created by migration from https://trac.sagemath.org/ticket/33453