Hi, thank you for your phenomenal work writing and documenting this library.
As I'm sure you're aware, some literature suggests that an energy statistic that is more robust to outliers can be obtained by taking the median rather than the mean when averaging the distances between samples. See: James, N. A., Kejariwal, A., & Matteson, D. S. (2016). Leveraging cloud data to mitigate user experience from ‘breaking bad.’ 2016 IEEE International Conference on Big Data (Big Data), 3499–3508. https://doi.org/10.1109/BigData.2016.7841013. See specifically section 3a of that article, "Robustness against Anomalies".
From looking at this library, it seems to me that this change would be as simple as allowing a configurable "average" function which would replace the use of `mean` in this code: https://github.com/vnmabus/dcor/blob/e7351553fb277f271ede1bf3e7148b408185707a/dcor/_energy.py#L24-L28
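To make the idea concrete, here is a minimal sketch of what I have in mind. It is not dcor's code: `pairwise_distances`, `energy_distance_sketch`, and the `average` parameter are hypothetical names I made up for illustration, and it assumes plain Euclidean distances computed with NumPy on arrays of shape `(n_samples, n_features)`.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of ``a`` and every row of ``b``."""
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def energy_distance_sketch(x, y, average=np.mean):
    """Energy-distance-style statistic with a pluggable averaging function.

    Hypothetical illustration only. ``average`` is any callable that reduces
    a distance matrix to a scalar, e.g. ``np.mean`` (the usual statistic)
    or ``np.median`` (the more outlier-robust variant).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    d_xx = pairwise_distances(x, x)  # within-sample distances of x
    d_yy = pairwise_distances(y, y)  # within-sample distances of y
    d_xy = pairwise_distances(x, y)  # between-sample distances

    return 2 * average(d_xy) - average(d_xx) - average(d_yy)


x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [1.0], [1000.0]])  # last point is an outlier

mean_stat = energy_distance_sketch(x, y)                       # default: mean
median_stat = energy_distance_sketch(x, y, average=np.median)  # robust variant
```

In this toy example the single outlier dominates the mean-based value, while the median-based value is much less affected. One thing I am unsure about is whether the zero diagonal of the within-sample matrices should be excluded before taking the median; the sketch keeps it for simplicity.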
Would you be interested in such an implementation?