theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

Is de_genes keyed on condition only or on cell_drug_dose_comb? #144

Open bhomass opened 9 months ago

bhomass commented 9 months ago

In train.py, de_genes is indexed by cell_drug_dose_comb:

        # genes for every cell_drug_dose combination.
        bool_de = dataset.var_names.isin(
            np.array(dataset.de_genes[cell_drug_dose_comb])
        )

But in lincs.py, de_genes is keyed on eval_category:

        adata.uns['rank_genes_groups_cov'] = {
            cat: de_genes_quick[extract_drug(cat)]
            for cat in adata.obs.eval_category.unique()
            if extract_drug(cat) != 'DMSO'
        }

where eval_category is set as

        adata.obs["eval_category"] = adata.obs["cov_drug_name"]

so no dosage is included.

Checking lincs_full_smiles_sciplex_genes.h5ad, though, it is indeed keyed on cell-line_drug.

Which convention did you ultimately choose? Without modifying train.py, the code crashes.
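To make the mismatch concrete, here is a toy reproduction using plain dictionaries. The key strings (A549_drugX, A549_drugX_0.1) and gene names are made up for illustration and are not taken from the actual .h5ad files; the sketch only assumes that the stored keys follow the cell-line_drug pattern while the lookup uses a cell-line_drug_dose string.

```python
# Toy stand-in for dataset.de_genes, keyed on cell-line_drug (no dose).
# All names here are hypothetical, for illustration only.
de_genes_by_cell_drug = {"A549_drugX": ["GENE1", "GENE2"]}

# The kind of key the train.py snippet is described as using (includes dose).
cell_drug_dose_comb = "A549_drugX_0.1"

try:
    de_genes_by_cell_drug[cell_drug_dose_comb]
    lookup_failed = False
except KeyError:
    # The dose-level key does not exist in a dictionary keyed without dose.
    lookup_failed = True

print(lookup_failed)  # True
```
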

bhomass commented 9 months ago

When it comes to sciplex_complete_middle_subset_lincs_genes.h5ad, we are back to cell-line_drug_dose again.

There has to be a single convention, or else the code in train.py cannot work for all datasets.
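Until a convention is settled, one possible workaround is a lookup that tries the dose-level key first and then falls back to the key without the dose. This is only a sketch, not the maintainers' method: lookup_de_genes is a hypothetical helper, and it assumes keys are joined with underscores and that the dose is the last underscore-separated field.

```python
from typing import Mapping, Sequence


def lookup_de_genes(
    de_genes: Mapping[str, Sequence[str]], cell_drug_dose_comb: str
) -> Sequence[str]:
    """Hypothetical helper: fetch DE genes under either key convention.

    Assumes keys look like 'cell_drug_dose' or 'cell_drug', with the dose
    as the final underscore-separated field.
    """
    # Convention 1: dictionary is keyed on cell-line_drug_dose.
    if cell_drug_dose_comb in de_genes:
        return de_genes[cell_drug_dose_comb]

    # Convention 2: dictionary is keyed on cell-line_drug, so drop the
    # trailing dose field and retry.
    cell_drug, sep, _dose = cell_drug_dose_comb.rpartition("_")
    if sep and cell_drug in de_genes:
        return de_genes[cell_drug]

    raise KeyError(
        f"No DE genes found for {cell_drug_dose_comb!r} or {cell_drug!r}"
    )


# Toy dictionaries with made-up keys, one per convention.
keyed_without_dose = {"A549_drugX": ["GENE1", "GENE2"]}
keyed_with_dose = {"A549_drugX_0.1": ["GENE3"]}

print(lookup_de_genes(keyed_without_dose, "A549_drugX_0.1"))
print(lookup_de_genes(keyed_with_dose, "A549_drugX_0.1"))
```

The fallback hides the inconsistency rather than fixing it, so agreeing on one key format at preprocessing time would still be the cleaner solution.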