sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Clarify and enhance descriptive statistics (and more) #29663

Open kcrisman opened 4 years ago

kcrisman commented 4 years ago

We have some basic statistics functionality in `sage.stats` for descriptive statistics. Unfortunately, it is really basic.

This ticket is for clarifying the relationship of that material to the Sage probability distributions, histogram, SciPy, GSL, and other libraries, perhaps including pandas, though this is not (yet) standard in Sage.

If all of those generate interest, this ticket would be converted to a metaticket to keep track of them.

Depends on #29662

CC: @NathanDunfield

Component: statistics

Issue created by migration from https://trac.sagemath.org/ticket/29663

dimpase commented 4 years ago

Dependencies: #29662

kcrisman commented 4 years ago

Description changed:

--- 
+++ 
@@ -5,5 +5,6 @@
 * Ideally there would be interfaces to the best native Python functionality rather than something specific to Sage (though that may not be possible).
 * There may be a tutorial page in the (reference manual) documentation for demonstrating best practices.
 * There could be a more education-oriented tutorial elsewhere, along the lines of [the PREP Quickstart](http://doc.sagemath.org/html/en/prep/Quickstarts/Statistics-and-Distributions.html) but more comprehensive.
+* As noted at #29662, Python 3 has a [stats module](https://docs.python.org/3/library/statistics.html), though presumably that module can't handle (say) the mean of several `Integer`s or even stranger objects, as-is.

 If all of those generate interest, this ticket would be converted to a metaticket to keep track of them.

NathanDunfield commented 4 years ago
comment:3

I use pandas pretty heavily from within Sage (Python 2.7 version). The only problem I encounter is that pandas does not recognize Sage's `Integer` as an integer. Assuming the standard preparser is on, you have to do things like:

```python
dataframe.loc[int(100)]
dataframe.apply(some_function, axis=int(1))
```

to keep it happy.
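The same conversion can be wrapped in a small helper so that it does not have to be written out at every call site. This is only a sketch: `IntegerLike` is a hypothetical stand-in for a non-builtin integer type (it is not Sage's `Integer`), and `as_builtin_int` is a made-up name. It assumes the type in question implements `__index__`, which `operator.index` relies on.

```python
import operator


class IntegerLike:
    """Hypothetical stand-in for an integer type that is not a builtin int."""

    def __init__(self, value):
        self._value = int(value)

    def __index__(self):
        # Lets operator.index() convert this object to a builtin int.
        return self._value


def as_builtin_int(key):
    """Return a builtin int for integer-like keys; leave other keys unchanged."""
    if isinstance(key, int):
        return key
    try:
        return operator.index(key)
    except TypeError:
        # Not integer-like (for example a string label), so pass it through.
        return key


key = IntegerLike(100)
print(isinstance(key, int))        # False: the stand-in is not a builtin int
print(type(as_builtin_int(key)))   # <class 'int'>
```

With a helper like this, a call would read `dataframe.loc[as_builtin_int(100)]`. Within Sage itself, a raw Python literal such as `100r`, or switching the preparser off with `preparser(False)`, also avoids creating a Sage `Integer` in the first place.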

kcrisman commented 4 years ago
comment:4

> I use pandas pretty heavily from within Sage (Python 2.7 version).

Hmm, yeah, that is exactly the kind of problem I expected (brian had some similar issues, iirc). I assume you `pip install` it, since it is not included in our Python from the get-go, right?

NathanDunfield commented 4 years ago
comment:5

Replying to @kcrisman:

> I assume you pip install it, not included in our Python from the get-go, right?

Yes, I just use `pip install`, which has always worked smoothly (though it takes a bit of time to compile). The main dependency is just a reasonably recent version of numpy, which of course Sage has.

sheerluck commented 2 years ago
comment:6

Replying to @NathanDunfield:

> pandas not recognizing Sage's Integer as an integer.

I added

```python
from sage.rings.integer import Integer
if type(key) is Integer:
    ...
```

to pandas/core/indexes/{base,range}.py
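A type check like this could in principle be written without importing Sage, by testing against the `numbers.Integral` abstract base class instead of a concrete class. The following is a sketch under assumptions, not the patch described above: `RingInteger` and `normalize_key` are hypothetical names, and it only works for types that have been registered with `numbers.Integral`.

```python
import numbers


class RingInteger:
    """Hypothetical stand-in for an integer type defined outside the stdlib."""

    def __init__(self, value):
        self._value = int(value)

    def __int__(self):
        return self._value


# Assumption: the integer type is registered as a virtual subclass of Integral.
numbers.Integral.register(RingInteger)


def normalize_key(key):
    """Convert non-builtin Integral keys to builtin int; pass others through."""
    if isinstance(key, numbers.Integral) and not isinstance(key, int):
        return int(key)
    return key


print(type(normalize_key(RingInteger(7))))  # <class 'int'>
print(normalize_key(3.5))                   # 3.5 (not Integral, left unchanged)
```

The `not isinstance(key, int)` guard leaves builtin ints and bools untouched, so only foreign integer types are converted.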