Closed eberhard-leander closed 9 months ago
It seems that an overflow occurs in the last case, as I obtain the following warning:
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda311/lib/python3.11/site-packages/dcor/_dcor_internals.py:188: RuntimeWarning: overflow encountered in scalar multiply
third_term = a_total_sum * b_total_sum / n_samples
I will check if it is possible to avoid it when I have a moment.
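To make the warning concrete, here is a rough sketch of why the product in `third_term` can leave the 64-bit integer range as the sample size grows. It does not use dcor or NumPy: `wrap_int64`, `pairwise_sum_estimate` and the two `third_term_*` functions are hypothetical helpers written for this illustration. It assumes that each total sum grows roughly like `n_samples ** 2` times a typical distance (300 here, an arbitrary value).

```python
INT64_MAX = 2**63 - 1


def wrap_int64(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 64-bit integer."""
    return (value + 2**63) % 2**64 - 2**63


def pairwise_sum_estimate(n_samples: int, typical_distance: int) -> int:
    # Assumption: the total sum scales like n^2 times a typical distance.
    return n_samples * n_samples * typical_distance


def third_term_int64(a_total_sum: int, b_total_sum: int, n_samples: int) -> float:
    # The product is truncated to 64 bits before the division, as it would
    # be with fixed-width integer scalars.
    return wrap_int64(a_total_sum * b_total_sum) / n_samples


def third_term_float(a_total_sum: int, b_total_sum: int, n_samples: int) -> float:
    # Converting to floating point first trades exactness for range.
    return float(a_total_sum) * float(b_total_sum) / n_samples


for n in (1_000, 100_000):
    total = pairwise_sum_estimate(n, 300)
    overflows = total * total > INT64_MAX
    print(n, overflows, third_term_int64(total, total, n),
          third_term_float(total, total, n))
```

Under these assumptions the two versions agree for 1,000 samples and disagree for 100,000, which matches the pattern of results diverging only past a certain size.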
I added a possible fix in #60. Note that converting to floating point arrays is still preferable, because the AVL implementation is compiled in that case.
I will merge #60. Note that this can still overflow, especially on Windows, where the default integer type (used for integer reductions when the original type is smaller) has only 32 bits. As mentioned before, converting to floating point is preferred.
While experimenting with this package, I encountered a strange issue and thought it would be useful to post about it here. In short, it appears that the distance_correlation computation for `int` dtypes is incorrect when the size of the data is sufficiently large.
Here is a minimal example that can be used to replicate the issue:
Now when we run this code for small samples, the correlations for all dtypes agree, and do not substantially change with the sample size.
However, past a certain point, the computations diverge:
I've started casting everything to `float` before computing the correlations to avoid this issue.
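The cast described above can be sketched as follows. This is only an illustration: `as_float64` is a hypothetical helper rather than part of dcor, and the array sizes, value range and seed are arbitrary.

```python
import numpy as np


def as_float64(values):
    """Return the input as a float64 array (hypothetical helper, not part of dcor)."""
    return np.asarray(values, dtype=np.float64)


rng = np.random.default_rng(0)

# Integer inputs with a narrow dtype, as in the failing case.
x_int = rng.integers(0, 1000, size=10_000, dtype=np.int32)
y_int = rng.integers(0, 1000, size=10_000, dtype=np.int32)

# Cast once, up front, so every later reduction runs in floating point.
x = as_float64(x_int)
y = as_float64(y_int)

print(x.dtype, y.dtype)
```

With dcor installed, the converted arrays can then be passed to `dcor.distance_correlation(x, y)` in place of the integer ones. Integers of this size are represented exactly in float64, so the cast itself does not change the data.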