Closed alitinet closed 3 years ago
I get the same problem of negative values.
I love how you're fixing sklearn
code @alitinet! Do you want to submit a PR here or to sklearn? It seems like this may be better fixed upstream, no?
In general the ARI can be negative (not something we initially considered in the rescaling though). This happens if the clustering is worse than via random permutations. Typically that negative number should be very near 0 though, as we're not specifically looking for alternative clusterings here... unless the integration method produces an output that has switched batch and cell type or sth like that.
Negative values are possible with ARI per definition. However, together with the runtime warning, the resulting ARI is likely wrong. I was getting ARI larger than 1 (which should never happen).
Someone has already reported this issue to scikit-learn a couple of months ago and it seems to have been fixed but it's not in any available package version from what I could see.
I reimplemented the metric (didn't see your fix when I did) and get proper values now. Still need to decide how to integrate it though, as I'm in the middle of refactoring the metrics module.
This issue should be fixed with #263
Hi all,
I encountered the issue of integer overflow several times now when calculating ARI for big datasets (>150k cells). The error message is
and then the resulting score is a negative number, sometimes a pretty big negative number.
Casting the output of the confusion matrix to floats solves the issue, i.e. instead of returning https://github.com/theislab/scib/blob/94b3103d445e7f7937d99c4ff2b448a9e47f94ff/scIB/metrics.py#L309 one might do the following instead (adjusted from ARI source code) :