mycarta opened this issue 3 years ago
No, not as of today. The code would need a separate branch to handle that case, but it should be relatively easy to implement (adding a new function in `_hypothesis` to perform a permutation test using the original array instead of the distance matrix, and using that function when the method is not `NAIVE`). If you want to try a PR, I could review it.
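For illustration, a permutation test that works on the original arrays (rather than on precomputed distance matrices) could look roughly like the sketch below. Note that `permutation_pvalue` and its parameters are hypothetical names for this sketch, not dcor's actual API:

```python
import numpy as np

def permutation_pvalue(x, y, statistic, num_resamples=200, rng=None):
    """Hypothetical sketch: one-sided permutation p-value computed by
    re-evaluating `statistic` on the raw arrays, not on distance matrices."""
    rng = np.random.default_rng(rng)
    observed = statistic(x, y)
    hits = 0
    for _ in range(num_resamples):
        # Shuffle y to break any dependence with x, then recompute.
        if statistic(x, rng.permutation(y)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (num_resamples + 1)
```

Any dependence statistic can be plugged in as `statistic`; for real use one would pass dcor's distance correlation, which is what the proposed `_hypothesis` branch would do internally.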
BTW, if you have additional CPUs you can use the `'AVL'` method in `distance_correlation`, and the `rowwise` function, for an extra boost.
I am at capacity until the fall. After the summer, if, as I hope, I have more time, I can give it a try.

For the purposes of my current projects, for the time being I am going to decimate my array heavily:

```python
decimated_df = data.copy().sample(frac=0.05, random_state=1)
```
With reference to the example in this notebook, this weekend I compared the performance of the `MERGESORT` method vs. the `NAIVE` method with a toy dataset of 8 columns x 21 rows:

[timing comparison screenshots]
Since I sometimes work with many thousands of rows, and possibly more columns, I wonder if there is a way to similarly improve the speed of the pairwise p-value calculation.
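In the meantime, one possible workaround is to build the pairwise p-value matrix directly with a plain permutation loop over column pairs; the sketch below is illustrative (the function name and signature are made up, not dcor's API), and in practice one would pass dcor's distance correlation as `statistic`:

```python
from itertools import combinations

import numpy as np

def pairwise_perm_pvalues(data, statistic, num_resamples=200, seed=0):
    """Illustrative sketch: symmetric matrix of permutation p-values for
    every pair of columns, computed on the raw column arrays."""
    rng = np.random.default_rng(seed)
    n_cols = data.shape[1]
    pvals = np.ones((n_cols, n_cols))  # diagonal stays at 1
    for i, j in combinations(range(n_cols), 2):
        x, y = data[:, i], data[:, j]
        observed = statistic(x, y)
        hits = sum(
            statistic(x, rng.permutation(y)) >= observed
            for _ in range(num_resamples)
        )
        p = (hits + 1) / (num_resamples + 1)  # add-one correction
        pvals[i, j] = pvals[j, i] = p
    return pvals
```

Most of any further speedup would come from generating the permutation indices once and reusing them across all pairs, and from parallelizing the outer loop over pairs across the available CPUs.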