neurodata / hyppo

Python package for multivariate hypothesis testing
https://hyppo.neurodata.io/
Other
219 stars 87 forks source link

Merge Dcorr/Energy/MMD/Hsic into scipy #98

Open sampan501 opened 4 years ago

bstraus1 commented 3 years ago

Can I be assigned to this issue?

bstraus1 commented 3 years ago

I've added the distance_correlation function (and its helpers) to my fork of SciPy in the ben-master branch.

The distance_correlation function is modeled off of the multiscale_graphcorr function which is already merged into SciPy and the helpers are from the hyppo repo (primarily the dcorr function).

I've also added a simple test (more to see that the code runs and returns an expected result rather than for accuracy). The function passes the test. This link goes directly to the distance_correlation function which relies on multiple other functions in the _stats.pyx file.

See attached screenshot of the test passed. Tests are in the same branch here. @sampan501

Screen Shot 2021-05-04 at 5 33 52 PM

Suggested future steps include:

sampan501 commented 3 years ago

None of the functions that you added have been Cythonized. All the functions added use def which is calling pure Python

bstraus1 commented 3 years ago

Then I don’t really understand why cythonizing is necessary if pure python works. @sampan501

sampan501 commented 3 years ago

Then I don’t really understand why cythonizing is necessary if pure python works.

Pure python is slow. With large datasets, it will long time to run.

bstraus1 commented 3 years ago

Oh I see. I wasn't aware of that being an essential part for any reason beyond making the code run-able in SciPy.

bstraus1 commented 3 years ago

Updates:

  1. _dcorr has been cythonized
  2. _center_distance_matrix has been changed to work for biased and unbiased data and will now work for mgc (multi-scale graph correlation) and dcorr (distance correlation).
  3. Bug Fix: The distance_correlationfunction now uses _dcorr in the pvalue computation function _perm_test to compute the test statistic rather than _mgc_stat
  4. Documentation: Added more comments throughout the distance_correlation function call tree

@sampan501

bstraus1 commented 3 years ago

Documentation for the distance_correlation function still needs to be changed (it is currently just copied from multiscale_graphcorr.

kfangster commented 3 years ago

Can I be assigned this issue?

sampan501 commented 2 years ago

https://github.com/kfangster/scipy