Open giovp opened 3 years ago
I think that the two metrics that @vitkl suggested are nice to include:
We also discussed including the JSD (Jensen-Shannon divergence), which treats the cell type proportions as a distribution across the different spots. It is preferable to the KLD because it is bounded between zero and one (when using base 2 in the logarithm), which is not true for the KLD. However, thinking more about this, I think the JSD could cause issues if a cell type has both zero estimated and zero true probability of being in a spot (a naive computation would divide zero by zero), see link. Would propose to use either of.
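On the zero-probability concern above, here is a minimal sketch (not the implementation in this repo) of a base-2 JSD that uses the usual convention 0 * log(0 / x) = 0. The function names are hypothetical, and the inputs are assumed to be already-normalised proportion vectors of equal length. Under that convention, entries that are zero in both vectors simply contribute nothing, so no division by zero occurs.

```python
import math


def kl_divergence(p, q, base=2.0):
    """KL(p || q), skipping terms where p_i == 0 (convention: 0 * log 0 = 0)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / qi, base)
    return total


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence; lies in [0, 1] when base == 2."""
    # The mixture m_i is zero only when both p_i and q_i are zero,
    # and those terms are skipped inside kl_divergence.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)
```

For example, `js_divergence([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])` evaluates without error even though the third cell type is absent in both vectors.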
For these two metrics there would potentially be two alternative approaches:
I think it is important for metrics to be easily interpretable by the users.
PR macro-average across cell types represents the accuracy/sensitivity at detecting cell abundance > 0, with PR curves averaged across cell types. It is not great for cell types that are expected to be absent in all locations because the PR curve cannot be computed.
R^2 - represents the consistency between estimated and ground-truth cell proportions; a plus is that many people are used to looking at scatterplots and R^2.
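To make the two metrics above concrete, here is a rough sketch under several assumptions of mine: proportions are given as lists of rows (one row per spot, one column per cell type), the PR curve is summarised by average precision rather than averaging whole curves, and R^2 is taken as the coefficient of determination (the squared Pearson correlation would be another reading). All function names are hypothetical. Cell types that are absent from every spot are skipped in the macro-average, which mirrors the limitation mentioned above.

```python
def average_precision(is_present, scores):
    """Average precision for one cell type; None if it is absent everywhere."""
    n_pos = sum(is_present)
    if n_pos == 0:
        return None  # PR curve is undefined without any positive spot
    # Rank spots by predicted abundance, highest first (ties keep input order).
    ranked = sorted(zip(scores, is_present), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, present) in enumerate(ranked, start=1):
        if present:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def macro_average_precision(true_props, pred_props):
    """Mean average precision over cell types with at least one positive spot."""
    n_types = len(true_props[0])
    aps = []
    for k in range(n_types):
        present = [row[k] > 0 for row in true_props]
        scores = [row[k] for row in pred_props]
        ap = average_precision(present, scores)
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps) if aps else None


def r_squared(true_vals, pred_vals):
    """Coefficient of determination; None if the true values are constant."""
    mean_true = sum(true_vals) / len(true_vals)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))
    ss_tot = sum((t - mean_true) ** 2 for t in true_vals)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else None
```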
@almaan can you explain the metrics you proposed in 'plain English'?
Fully agree,
Bhattacharyya coefficient - measures the amount of overlap or similarity between two distributions, by integration or summation over the probability space. Here the distributions we are looking at would either be how a cell type is distributed across all spots (comparing true vs predicted), or how the cell types are distributed within each spot, then computing the average of these coefficients as per (2). The interpretation of the latter would be the average similarity between true and predicted cell type proportions in each spot.
Hellinger distance - very similar to the Bhattacharyya coefficient but forms a proper distance metric; it also measures the distance between distributions.
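As a sketch of the two measures above for discrete distributions (function names are hypothetical, and the inputs are assumed to be normalised proportion vectors), the Bhattacharyya coefficient is the sum of sqrt(p_i * q_i) and the Hellinger distance is sqrt(1 - BC). The last helper averages a metric over spots, i.e. the per-spot variant described above.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete distributions; 1 means identical, 0 disjoint."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def hellinger_distance(p, q):
    """Hellinger distance in [0, 1], derived from the Bhattacharyya coefficient."""
    # Clamp to 1.0 so rounding error cannot make the square root argument negative.
    bc = min(1.0, bhattacharyya_coefficient(p, q))
    return math.sqrt(1.0 - bc)


def mean_over_spots(metric, true_props, pred_props):
    """Average a per-spot metric; each row holds one spot's cell type proportions."""
    values = [metric(t, p) for t, p in zip(true_props, pred_props)]
    return sum(values) / len(values)
```

Neither measure involves a division or a logarithm, so zero proportions need no special handling.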
Still, to me R^2 is a given, which is why I implemented it in the latest PR.
@hiraksarkar @almaan @vitkl here to discuss which metrics to include and the aggregation strategy