scverse / pertpy

Perturbation Analysis in the scverse ecosystem.
https://pertpy.readthedocs.io/en/latest/
MIT License

Augur results have different dictionary dimensions #534

Closed: namsaraeva closed this issue 3 months ago

namsaraeva commented 4 months ago

Report

While running the Augur example code,

import pertpy as pt
adata = pt.dt.bhattacherjee()
ag_rfc = pt.tl.Augur("random_forest_classifier")

data_15 = ag_rfc.load(adata, condition_label="Maintenance_Cocaine", treatment_label="withdraw_15d_Cocaine")
adata_15, results_15 = ag_rfc.predict(data_15, random_state=None, n_threads=4)
adata_15_permute, results_15_permute = ag_rfc.predict(data_15, augur_mode="permute", n_subsamples=100, random_state=None, n_threads=4)

data_48 = ag_rfc.load(adata, condition_label="Maintenance_Cocaine", treatment_label="withdraw_48h_Cocaine")
adata_48, results_48 = ag_rfc.predict(data_48, random_state=None, n_threads=4)
adata_48_permute, results_48_permute = ag_rfc.predict(data_48, augur_mode="permute", n_subsamples=100, random_state=None, n_threads=4)
pvals = ag_rfc.predict_differential_prioritization(augur_results1=results_15, augur_results2=results_48, permuted_results1=results_15_permute, permuted_results2=results_48_permute)

I found that the results have different dimensions:

[screenshot]

[screenshot]

For some reason, the permuted 48h results are missing the NF Oligo dictionary.
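To pin down exactly which entries differ between two results dictionaries, diffing their keys is quicker than comparing screenshots. This is a minimal sketch, assuming each Augur results object is a plain Python dict with one key per cell type; diff_result_keys is a hypothetical helper, not part of pertpy, and the toy dicts below only stand in for results_48 and results_48_permute.

```python
def diff_result_keys(results_a, results_b):
    """Hypothetical helper: report keys present in one results dict but not the other."""
    keys_a, keys_b = set(results_a), set(results_b)
    return {"only_in_a": keys_a - keys_b, "only_in_b": keys_b - keys_a}


# Toy stand-ins (values elided); in practice pass the real results dicts.
results_48 = {"Astro": {}, "NF Oligo": {}}
results_48_permute = {"Astro": {}}

print(diff_result_keys(results_48, results_48_permute))
# {'only_in_a': {'NF Oligo'}, 'only_in_b': set()}
```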

This needs further investigation.

Version information

No response

namsaraeva commented 3 months ago

I ran the code again, and it turns out there is a reason for this behaviour. When running adata_48_permute, results_48_permute = ag_rfc.predict(data_48, augur_mode="permute", n_subsamples=100, random_state=None, n_threads=4), the log message says:

Skipping NF Oligo cell type - 79 samples is less than min_cells 100.
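The skip rule in that log line can be illustrated with a small sketch: count the cells of each cell type and drop any type whose count falls below min_cells. This is not pertpy's implementation; filter_cell_types is a hypothetical helper, and the labels are toy data in which NF Oligo is given 79 cells to mirror the log.

```python
from collections import Counter


def filter_cell_types(cell_type_labels, min_cells):
    """Hypothetical helper: split cell types into kept and skipped by per-type cell count."""
    counts = Counter(cell_type_labels)
    kept, skipped = [], {}
    for cell_type, n in sorted(counts.items()):
        if n < min_cells:
            # Corresponds to a "Skipping <cell type> ..." log line
            skipped[cell_type] = n
        else:
            kept.append(cell_type)
    return kept, skipped


# Toy labels, not the bhattacherjee data
labels = ["Astro"] * 150 + ["NF Oligo"] * 79 + ["Microglia"] * 120
kept, skipped = filter_cell_types(labels, min_cells=100)
print(kept)     # ['Astro', 'Microglia']
print(skipped)  # {'NF Oligo': 79}
```

In this sketch, calling it with min_cells=50 on the same labels keeps NF Oligo as well, so whether a cell type survives depends only on the threshold in effect for that run.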

@Zethson, should we keep it as it is?

Zethson commented 3 months ago

Then yes.