Closed by mdhaber 1 year ago
@mdhaber you can check off the box for the 'check for NaN in spearmanrho' now
Yup, thanks!
@mdhaber you can check off the box for the multivariate t distribution now too.
https://github.com/scipy/scipy/pull/11119 was merged, so you can check off the new cramervonmises test.
PR for the relative risk: https://github.com/scipy/scipy/pull/13048
@WarrenWeckesser Two more weeks left in the quarter...
@mdhaber you can check off the multivariate hypergeometric box:
multivariate hypergeometric distribution - scipy#12585, scipy#12839 (@mdhaber)
I apologize if this is not the appropriate channel to open this discussion.
Following the issues covered in #11477, I would like to share my findings related to dcdflib, which is used to evaluate the cumulative distribution function (CDF).
dcdflib was added to SciPy's Subversion repository on February 23, 2002 (4423ed55), possibly from SciPy's CVS repository, which I could not find online.
What can be done?
dcdflib with the Boost Math Toolkit. In the last two cases, a follow-up of the code's modifications has to be done.
Why?
I would be happy to help in any direction you decide.
Hi @caos21, thanks for mentioning this. @mckib2 is actually working on replacing parts of scipy.stats with the Boost versions in #48. Would you be interested in taking a look at that? We're not going to change everything at once; this first PR will only actually replace SciPy's beta, binom, and nbinom distributions. The idea is to get all the machinery in place so that it will be easy to take things from Boost as needed in future PRs. Would this make it easy to replace cdflib with Boost's tools?
Hi and thank you @mdhaber, I think it is reasonable. But first, I would like to inspect how deeply cdflib is involved across SciPy.
In the meantime, I can update cdflib to V1.1 and apply all the patches and modifications made in the past. That way, I hope nothing breaks.
Should we move this discussion to #48?
In that case, it would probably be better to open an issue or PR on the main repo, or maybe email the mailing list to get wider attention. Only a few of us are working here now.
Perfect, I will do that, and afterwards I will jump into Boost #48 to see how I can be of use.
@mdhaber Don't know if we're interested in still keeping this list up to date:
Thanks @mckib2. We've been working from Monday.com recently, but it is still good to check these off.
Functions/distributions we might want to borrow from Boost:
- ncf - gh-2877
- ncx2 - gh-11777
- logser - gh-3890
- nct - gh-7104

@tupui At a glance, these are the issues and PRs we have open for multivariate distributions. Multivariate distributions represent ~1/5 of the open issues and PRs with the scipy.stats label.
Multivariate Distributions - 30 of the 187 issues with the scipy.stats label, 9 of the 52 PRs with the scipy.stats label, as of 3/14/2022.
Multivariate Normal - (Fewer than 150 lines of real code have all these issues and PRs.)
PRs
Issues
New Distribution
PRs
Issues
Other
PRs
Issues
IIRC, several bugs involving constant input (i.e. all elements of a slice equal) have been reported. I'll collect them here as I run across them.
gh-13254 tried to address this for some functions, but I suspect it is a widespread problem.
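To make the constant-input problem concrete, here is a minimal plain-Python sketch of a Pearson correlation with an explicit guard. It is an illustration only, not SciPy's implementation: the names `is_constant` and `pearson_r` are hypothetical, and returning NaN for a constant slice is an assumed convention. The guard compares elements directly, because a variance computed in floating point may not come out as exactly zero even when every element is equal.

```python
import math


def is_constant(values):
    """Return True if every element equals the first one (assumes non-empty input)."""
    first = values[0]
    return all(v == first for v in values)


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns NaN when either input is constant, because the statistic is
    0/0 in that case (assumed convention for this sketch).
    """
    if is_constant(x) or is_constant(y):
        return math.nan
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(d * d for d in dx)
    ss_y = sum(d * d for d in dy)
    return cov / math.sqrt(ss_x * ss_y)


constant = [0.1] * 10                      # all elements of the slice are equal
varying = [float(i) for i in range(10)]
r_constant = pearson_r(constant, varying)  # guarded: NaN instead of roundoff noise
r_linear = pearson_r(varying, [2.0 * v + 1.0 for v in varying])
```

Checking the elements themselves, rather than testing a computed variance against zero, keeps the guard independent of how the mean happens to round.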
Overview of "A Solid Foundation for Statistics in Python with SciPy".
Expand tools for the analysis of variance
New Statistical Tests
Improve Existing Tests
[x] Confidence intervals - scipy/scipy#13371 (@mdhaber)
  - Binomial Test - scipy/scipy#12603 (@WarrenWeckesser)
  - pearsonr - scipy/scipy#12609
[x] Options for one-sided p-values - scipy/scipy#12506 (@DominicChm)
  - ttests - scipy/scipy#12597
  - skewtest / kurtosistest / ranksums - scipy/scipy#13549
  - spearmanr/linregress - scipy/scipy#12801
  - mood - scipy/scipy#13008
  - ansari - scipy/scipy#13650
  - pearsonr - scipy/scipy#12609
[x] Enhanced results for 2 x 2 contingency tables
  - Conditional maximum likelihood odds ratio - scipy/scipy#13340 (@WarrenWeckesser)
  - Relative risk - scipy/scipy#13048 (@WarrenWeckesser)
Fitting Probability Distributions to Data
fit methods where possible - scipy/scipy#11782 (@swallan)
  - laplace - scipy/scipy#11988
  - pareto - scipy/scipy#12457
  - rayleigh - scipy/scipy#12097
  - invgauss - scipy/scipy#12514
  - logistic - scipy/scipy#12738
  - gumbel - scipy/scipy#12737
New Probability Distributions
Improve underlying code for PDF and CDF calculations
Decrease Open Statistics Issues
By the end of the project, we want the number of open stats issues to be below 282 (number of open stats issues on 3/18/2020), and preferably under 261 (number of open stats issues on 3/18/2020 created before project start date 2/1/2020). This is @mdhaber's list of issues to watch/fix; none need to be closed to finish the project, but it would be great to make a dent.
- differential_entropy - scipy/scipy#13631
- mannwhitneyu - scipy/scipy#12837, scipy/scipy#11113
- moment method for input arrays - scipy/scipy#12197
- stats.binom.cdf issue - scipy/scipy#13079 (watching)
- anderson_ksamp - scipy/scipy#11140
- scipy.stats.skew roundoff error - scipy/scipy#11086 - @WarrenWeckesser will re-review/merge
- stats.zscore roundoff error - scipy/scipy#12815
- gausshyper distribution accepts invalid parameters - scipy/scipy#10134
- rayleigh.fit issue - scipy/scipy#13071
- binned_statistic_dd - scipy/scipy#12898
- stats.lognorm - scipy/scipy#12844
- pearsonr - scipy/scipy#9307
- scipy.special.bdtrik incorrect - scipy/scipy#11134
- scipy.special.btdtri inaccurate - scipy/scipy#12794, scipy/scipy#12635
- weightedtau documentation mistake - scipy/scipy#12778
- rv_histogram PR - scipy/scipy#12759 - @WarrenWeckesser will close
- rv_continuous assumes scalar parameters - scipy/scipy#10661
- circstd - scipy/scipy#10096
- interval method description - scipy/scipy#9706
- fitting several distributions - scipy/scipy#1884
- expect termination condition - scipy/scipy#2983
- test_continuous_basic - scipy/scipy#2071
Outreach Event
Other
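For reference, the 2 x 2 contingency table quantities named in the overview can be sketched in plain Python. This is a hedged illustration, not the code from the linked PRs: the function names and the example counts are hypothetical, and the odds ratio shown is the simple sample (cross-product) odds ratio, not the conditional maximum likelihood estimate listed in the overview.

```python
def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """Ratio of the risk in the exposed group to the risk in the control group."""
    risk_exposed = exposed_cases / exposed_total
    risk_control = control_cases / control_total
    return risk_exposed / risk_control


def sample_odds_ratio(table):
    """Cross-product odds ratio (a*d)/(b*c) of a 2 x 2 table [[a, b], [c, d]].

    Note: this is the unconditional sample estimate, which generally differs
    from the conditional maximum likelihood estimate.
    """
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts: 20 of 100 exposed subjects and 10 of 100 controls are cases.
rr = relative_risk(20, 100, 10, 100)
odds = sample_odds_ratio([[20, 80], [10, 90]])
```

Both functions raise `ZeroDivisionError` when a denominator count is zero; a fuller treatment would need to decide how to report those degenerate tables.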