theislab / scib

Benchmarking analysis of data integration tools
MIT License

Metrics fail when there are unused labels #379

Open lazappi opened 1 year ago

lazappi commented 1 year ago

Some of the metrics fail if adata.obs[label_key] is categorical and contains categories that aren't used. This can probably be fixed fairly easily by adding adata.obs[label_key] = adata.obs[label_key].cat.remove_unused_categories(), by changing how the present labels are calculated, or by doing something else to avoid this situation.
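For illustration, here is a minimal sketch of the situation and the proposed workaround. It uses a plain pandas DataFrame as a stand-in for adata.obs, and the column name and label values are made up for the example. It only shows how declared categories can differ from the labels actually present; it does not reproduce the scib failure itself.

```python
import pandas as pd

# Stand-in for adata.obs (assumption: a real AnnData object would expose
# the same pandas categorical column under adata.obs[label_key]).
label_key = "cell_type"  # hypothetical column name
obs = pd.DataFrame(
    {label_key: pd.Categorical(["B", "T", "T"], categories=["B", "T", "NK"])}
)

# Declared categories still include "NK", although no cell carries that label.
declared = list(obs[label_key].cat.categories)

# Labels that actually occur in the data.
present = list(obs[label_key].unique())

# Proposed workaround: drop categories that no observation uses.
obs[label_key] = obs[label_key].cat.remove_unused_categories()
cleaned = list(obs[label_key].cat.categories)

print(declared, present, cleaned)
```

Code that iterates over the declared categories would see "NK" before the cleanup and not after it, which is presumably where the mismatch comes from.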

florianingelfinger commented 6 months ago

The same also happens if one of the labels is not present in one of the batches. It would be helpful to raise a user-facing error with a description of the problem, or to subset as proposed by @lazappi.
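A rough sketch of what such a check could look like, again on a plain pandas DataFrame standing in for adata.obs. The helper names labels_missing_per_batch and check_labels are hypothetical and not part of scib, and the error message wording is only a suggestion.

```python
import pandas as pd


def labels_missing_per_batch(obs, batch_key, label_key):
    """Hypothetical helper: map each batch to the labels that never occur in it."""
    # Labels observed anywhere in the data (unused categories are ignored).
    all_labels = set(obs[label_key].astype(str).unique())
    missing = {}
    for batch, group in obs.groupby(batch_key, observed=True):
        absent = all_labels - set(group[label_key].astype(str).unique())
        if absent:
            missing[str(batch)] = sorted(absent)
    return missing


def check_labels(obs, batch_key, label_key):
    """Hypothetical check: raise a descriptive error instead of failing later."""
    missing = labels_missing_per_batch(obs, batch_key, label_key)
    if missing:
        raise ValueError(
            f"Some labels in '{label_key}' are absent from some batches in "
            f"'{batch_key}': {missing}"
        )


# Example data (made up): label "B" never occurs in batch "b".
obs = pd.DataFrame(
    {
        "batch": ["a", "a", "b"],
        "cell_type": pd.Categorical(["B", "T", "T"]),
    }
)
missing = labels_missing_per_batch(obs, "batch", "cell_type")
print(missing)
```

Whether a metric should raise, warn, or silently subset in this case is a design decision for the maintainers; the sketch only shows that the condition is cheap to detect up front.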