sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.33k stars 453 forks source link

Restore basic stats commands to the global name space #33432

Open kwankyu opened 2 years ago

kwankyu commented 2 years ago

Mean, median and mode are now deprecated by #29662. E.g.:

>median([1,2,3])
2
:1: DeprecationWarning: sage.stats.basic_stats.median is deprecated; use numpy.median or numpy.nanmedian instead
See https://trac.sagemath.org/29662 for details.

But these basic functions should have some default functionality. It seems strange to not have a top-level "mean" or "median" function, given all of the other esoteric top-level functions.

The idea is to provide mean and median commands with functionality at least of the deprecated commands.

Discussion in sage-support: https://groups.google.com/g/sage-support/c/fglHtSGKFJk

Component: statistics

Issue created by migration from https://trac.sagemath.org/ticket/33432

mkoeppe commented 2 years ago
comment:1

The functionality is still there; it has not been removed.

kwankyu commented 2 years ago
comment:2

Replying to @mkoeppe:

The functionality is still there; it has not been removed.

Then do we have the option of abolishing the deprecation, instead of relying on numpy?

mkoeppe commented 2 years ago
comment:3

Why would we?

mkoeppe commented 2 years ago
comment:4

The deprecation messages are an improvement over the previous status quo. They point users to more suitable facilities.

jhpalmieri commented 2 years ago
comment:5

The deprecation messages may also provide developers with incentive to produce improved versions of those functions. I don't know if this is a problem:

sage: import numpy
sage: type(numpy.mean([1,2,3]))
<class 'numpy.float64'>
mkoeppe commented 2 years ago
comment:6

Yes, numpy's function returns a numpy type. That's why it should be called explicitly as np.mean after import numpy as np; importing these functions into our global namespace would not be a good idea.

kwankyu commented 2 years ago
comment:7

Okay. Then how do we provide mean and median in the global namespace, which is the goal of this ticket?

kwankyu commented 2 years ago

Description changed:

--- 
+++ 
@@ -11,7 +11,5 @@

 The idea is to provide `mean` and `median` commands with functionality at least of the deprecated commands.

-For example in sage.all we could do: "from numpy import mean, median". 
-
 Discussion in sage-support: https://groups.google.com/g/sage-support/c/fglHtSGKFJk
mkoeppe commented 2 years ago
comment:9

They are still in the global namespace.

kwankyu commented 2 years ago
comment:10

Replying to @mkoeppe:

They are still in the global namespace.

I am confused. They are deprecated, and will be removed from the global namespace.

mkoeppe commented 2 years ago
comment:11

Only if we remove them. We don't have to.

kwankyu commented 2 years ago
comment:12

Replying to @mkoeppe:

Only if we remove them. We don't have to.

I see your idea.

But I don't agree with you. Our student and teacher users wouldn't want to have "mean" and "median" commands with deprecation string attached. This is the point of this ticket.

mkoeppe commented 2 years ago
comment:13

Replying to @kwankyu:

Our student and teacher users wouldn't want to have "mean" and "median" commands with deprecation string attached.

The deprecation message provides a necessary commentary/update to their teaching materials.

videlec commented 2 years ago
comment:14

Replying to @mkoeppe:

Replying to @kwankyu:

Our student and teacher users wouldn't want to have "mean" and "median" commands with deprecation string attached.

The deprecation message provides a necessary commentary/update to their teaching materials.

Really? It is already complicated enough to teach sagemath. To my mind having an extra level of noise is not helping. mean and median are ought to be elementary functions, likely to be presented in the first course. If these functions are to be kept the warning is more harmful than useful.

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keep these functions roughly as they are. If the input data is a numpy array the code calls the correct numpy method.

egourgoulhon commented 2 years ago
comment:15

Replying to @videlec:

Really? It is already complicated enough to teach sagemath. To my mind having an extra level of noise is not helping. mean and median are ought to be elementary functions, likely to be presented in the first course. If these functions are to be kept the warning is more harmful than useful.

+1

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keep these functions roughly as they are. If the input data is a numpy array the code calls the correct numpy method.

+1

kcrisman commented 2 years ago
comment:16

Really? It is already complicated enough to teach sagemath. To my mind having an extra level of noise is not helping. mean and median are ought to be elementary functions, likely to be presented in the first course. If these functions are to be kept the warning is more harmful than useful.

+1

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keep these functions roughly as they are. If the input data is a numpy array the code calls the correct numpy method.

+1

Yes.

See also this comment; it's unfortunate that it sounds like it might be too hard to overload those e.g. mean to work with Sage integers if/when Python 3.8+ becomes default.

mkoeppe commented 2 years ago
comment:17

Replying to @videlec:

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keep these functions roughly as they are.

That by itself does not sound like a good plan. There's still the disservice to learners: A Sage-specific dead end with limited functionality and no perspective.

How about this:

But someone would have to work on it.

mkoeppe commented 2 years ago
comment:18

(There is no new difficulty relating to Python >= 3.8.)

videlec commented 2 years ago
comment:19

This is hitting #28234

sage: import statistics
sage: statistics.mean([1,2,3,4])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8b5038d5efc8> in <module>
----> 1 statistics.mean([Integer(1),Integer(2),Integer(3),Integer(4)])

/usr/lib/python3.10/statistics.py in mean(data)
    327     if n < 1:
    328         raise StatisticsError('mean requires at least one data point')
--> 329     T, total, count = _sum(data)
    330     assert count == n
    331     return _convert(total / n, T)

/usr/lib/python3.10/statistics.py in _sum(data)
    196     else:
    197         # Sum all the partial sums using builtin sum.
--> 198         total = sum(Fraction(n, d) for d, n in partials.items())
    199     return (T, total, count)
    200 

/usr/lib/python3.10/statistics.py in <genexpr>(.0)
    196     else:
    197         # Sum all the partial sums using builtin sum.
--> 198         total = sum(Fraction(n, d) for d, n in partials.items())
    199     return (T, total, count)
    200 

/usr/lib/python3.10/fractions.py in __new__(cls, numerator, denominator, _normalize)
    146             isinstance(denominator, numbers.Rational)):
    147             numerator, denominator = (
--> 148                 numerator.numerator * denominator.denominator,
    149                 denominator.numerator * numerator.denominator
    150                 )

TypeError: unsupported operand type(s) for *: 'builtin_function_or_method' and 'builtin_function_or_method'

The dilemma remains

mkoeppe commented 2 years ago
comment:20

Yes, I know, this is something to address for the "long term plan".

But it does not block work for the "short term plan".

mkoeppe commented 2 years ago
comment:21

Replying to @videlec:

  • convince Python dev that numerator()/denominator() should be equally supported on the python side (which has already been tried by Jeroen in the past)

The problem in statistics is more specifically #28234 comment:62

videlec commented 2 years ago
comment:22

I will start working on the (little bit) more ambitious #33453.

videlec commented 2 years ago
comment:23

The functions mean and median will be restored in #33453. The other functions are not compatible with the statistics module and proper deprecation are raised.

kwankyu commented 2 years ago
comment:24

Replying to @videlec:

The functions mean and median will be restored in #33453. The other functions are not compatible with the statistics module and proper deprecation are raised.

So the plan is to also import the other functions from the sage.stats.statisics module into the global namespace after the deprecation period?

videlec commented 2 years ago
comment:25

Replying to @kwankyu:

Replying to @videlec:

The functions mean and median will be restored in #33453. The other functions are not compatible with the statistics module and proper deprecation are raised.

So the plan is to also import the other functions from the sage.stats.statisics module into the global namespace after the deprecation period?

Nothing is fixed yet. We could already pull all the contents of sage.stats.statistics into the global namespace but mode (whose specification conflicts with stats.basic_stats.mode). The deprecations have to stay because of the change of behaviour

To my mind, I think it is better to have them as statistics.mean, statistics.median, etc rather than in the global namespace. But that is a personal taste.

kwankyu commented 2 years ago
comment:26

Replying to @videlec:

To my mind, I think it is better to have them as statistics.mean, statistics.median, etc rather than in the global namespace. But that is a personal taste.

The original idea of this ticket is to have the basic stats command readily available from the global namespace.