Open AxKo opened 1 year ago
Thank you for your interest in CPA. About the first question, the notebook will be updated soon to contain more meaningful and curated splits for combinatorial perturbations. I'll update as soon as possible here.
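Until the curated splits are available, a held-out split for combinations can be sketched by hand. The snippet below is only an illustration and not the split used in the tutorial: the helper name assign_splits, its heldout and valid_frac parameters, and the toy condition list are all assumptions. It marks cells with a combined perturbation as 'test' and randomly divides the remaining cells into 'train' and 'valid'.

```python
import random


def assign_splits(conditions, heldout=None, valid_frac=0.1, seed=0):
    """Return one split label per cell (hypothetical helper, not part of CPA).

    Cells whose condition string contains '+' are treated as combinations.
    Combinations listed in `heldout` (or all combinations if `heldout` is
    None) are labelled 'test'; every other cell is labelled 'train' or
    'valid' at random.
    """
    rng = random.Random(seed)
    splits = []
    for cond in conditions:
        is_combo = "+" in cond
        if is_combo and (heldout is None or cond in heldout):
            splits.append("test")
        else:
            splits.append("valid" if rng.random() < valid_frac else "train")
    return splits


# Toy stand-in for the values of the perturbation column.
conditions = ["ctrl", "SGK1", "FOXL2", "HOXB9", "FOXL2+HOXB9", "ctrl"]
splits = assign_splits(conditions)
```

The resulting list could then be stored in adata.obs['split'], so that the model only sees X and Y individually during training and is evaluated on X+Y.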
As for your second question, the model.predict() method takes the perturbations and dosages from the perturbation_key and dosage_key columns of your input adata and applies those perturbations to the basal latent obtained from each cell.
So in the tutorial example you mentioned, CPA reads the perturbations from the cond_harm column of the data and adds those perturbations to the output in the predict method:
cpa.CPA.setup_anndata(
    adata,
    perturbation_key='cond_harm',
    control_group='ctrl',
    dosage_key='dose_value',
    categorical_covariate_keys=['cell_type'],
    is_count_data=True,
    deg_uns_key='rank_genes_groups_cov',
    deg_uns_cat_key='cov_cond',
    max_comb_len=2,
)
So if you'd like to predict a specific perturbation for a given cell, you can change the perturbation or dosage in the mentioned columns of your adata.
Feel free to reply if there are further issues.
Ah, okay, thanks for that information.
But the cond_harm column takes a single value and not a list, which means that I can only apply a single perturbation to the basal latent representation. Is that correct?
And the contents of the dosage_key column are strings like '1.0+1.0' (and not float values). How, then, can I specify a new value (e.g. 1.5) in a way that CPA understands?
Thanks
You can apply combinations of perturbations. CPA uses strings with the following format for specifying perturbations and dosage values in the adata:
Perturbation column (cond_harm):
- "PERT1" --> A single perturbation (e.g. "SGK1")
- "PERT1+PERT2" --> Combination of perturbations PERT1 and PERT2 (e.g. "FOXL2+HOXB9")
Use the + character as the split between different perturbations and CPA will understand them.
Dosage column: the dosages are given to the model as strings of the following format:
- "1.0" --> Dosage 1.0 when we have one perturbation (e.g. "1.5" or any other number)
- "1.0+1.5" --> Dosages 1.0 and 1.5 for PERT1 and PERT2 respectively.
CPA splits the string at the + character and converts the string numbers to floats ("1.0+1.5" --> [1.0, 1.5]).
This is actually done in the setup_anndata method of the model:
https://github.com/theislab/cpa/blob/c63d5cf5cfc70c410ca9d95fb3b92fc71018c6f1/cpa/_model.py#L294-L317
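As a rough illustration of the string format described above, the following sketch builds and parses such strings. The helpers format_condition and parse_condition are hypothetical names and are not part of CPA; the linked setup_anndata code is the authoritative reference for what the model actually does.

```python
def format_condition(perts, doses):
    """Join perturbation names and doses into '+'-separated strings.

    Hypothetical helper for illustration; not a CPA function.
    """
    if len(perts) != len(doses):
        raise ValueError("need exactly one dose per perturbation")
    pert_str = "+".join(perts)
    dose_str = "+".join(str(float(d)) for d in doses)
    return pert_str, dose_str


def parse_condition(pert_str, dose_str):
    """Split the strings on '+' and convert the doses to floats."""
    perts = pert_str.split("+")
    doses = [float(d) for d in dose_str.split("+")]
    if len(perts) != len(doses):
        raise ValueError("number of doses does not match number of perturbations")
    return perts, doses


pert_str, dose_str = format_condition(["FOXL2", "HOXB9"], [1.0, 1.5])
print(pert_str, dose_str)  # FOXL2+HOXB9 1.0+1.5
print(parse_condition(pert_str, dose_str))  # (['FOXL2', 'HOXB9'], [1.0, 1.5])
```

Strings built this way would go into the cond_harm and dose_value columns of adata.obs, presumably before setup_anndata is run on that adata, since that is where the strings are parsed. If those columns are stored as categoricals, a new value may first have to be added as a category.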
As you can see in the code, setup_anndata creates lists of perturbation IDs and respective dosages from the given strings in the perturbation and dosage columns of adata.obs, saves them in adata.obsm, and uses them as the input data to the model. For example, if you print adata after running setup_anndata, you will see the following obsm values:
obsm: 'X_pca', 'X_umap', 'perts', 'perts_doses', 'deg_mask', 'deg_mask_r2'
Here perts is the list of perturbation IDs, which is used to retrieve perturbation embeddings from the PerturbationNetwork, and perts_doses holds the respective dosages.
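To make the idea of per-cell ID and dosage lists more concrete, here is a minimal sketch of one way such arrays could be built. It is not CPA's implementation: the function encode_perturbations, the ID assignment by first appearance, and the padding with ID 0 and dose 0.0 up to max_comb_len are all assumptions made for illustration.

```python
PAD_ID = 0  # assumed padding ID for unused slots; real IDs start at 1 here


def encode_perturbations(pert_strs, dose_strs, max_comb_len=2):
    """Turn '+'-separated strings into fixed-length ID and dose lists.

    Illustrative only; the actual encoding in CPA may differ.
    """
    vocab = {}  # perturbation name -> integer ID, in order of first appearance
    perts, perts_doses = [], []
    for p_str, d_str in zip(pert_strs, dose_strs):
        names = p_str.split("+")
        doses = [float(d) for d in d_str.split("+")]
        if len(names) != len(doses):
            raise ValueError(f"mismatched entry: {p_str!r} vs {d_str!r}")
        if len(names) > max_comb_len:
            raise ValueError(f"{p_str!r} has more than {max_comb_len} perturbations")
        ids = [vocab.setdefault(name, len(vocab) + 1) for name in names]
        pad = max_comb_len - len(ids)
        perts.append(ids + [PAD_ID] * pad)
        perts_doses.append(doses + [0.0] * pad)
    return vocab, perts, perts_doses


vocab, perts, perts_doses = encode_perturbations(
    ["SGK1", "FOXL2+HOXB9", "FOXL2"],
    ["1.0", "1.0+1.5", "1.0"],
)
print(perts)        # [[1, 0], [2, 3], [2, 0]]
print(perts_doses)  # [[1.0, 0.0], [1.0, 1.5], [1.0, 0.0]]
```

Each row corresponds to one cell, so a model can look up one embedding per ID and scale it by the matching dose.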
I hope this helps and again, feel free to reply if there are further issues.
Very good, that's what I was looking for!
Actually, I have only now looked at your "Batch Correction in Expression Space" tutorial with the description of custom_predict() and how to use it. That is obviously the function I need!
Many thanks
I am sorry, but I have to reopen this :-(
Looking at custom_predict, I see that it allows me to select the individual categorical covariates that I want to add, but for perturbations it only allows me to add all of them or none. So if I want to add individual perturbations, I have to follow your advice from above!?
I think I'm also confused about the difference between perturbations and categorical covariates. I thought perturbations would be continuous variables, but in many of the tutorials the perturbation comes in the form of discrete values (IFN stimulation or not, gene knockout or not, etc.). Does that mean these tutorials could have been written differently, by declaring those 'perturbations' as categorical covariates?
Thanks
I installed CPA 0.8.2 and followed the tutorial "Predicting single-cell response to unseen combinatorial CRISPR perturbations". The goal is to predict the gene expression response to the combined perturbation X+Y when you have seen single cells perturbed with X and with Y. I can reproduce all the results from the tutorial, but I have difficulty understanding some points :-(
1) adata.obs['split'] is filled randomly with 'train', 'valid' and 'test' values and then used for training the CPA model. But if the goal is to predict the effect of perturbations X+Y when I have only seen perturbations X and Y separately, then I should not provide X+Y in the training data!? So, is the construction of adata.obs['split'] correct?
2) I thought the whole point of CPA is to disentangle the effects of different perturbations in such a way that I can later apply such perturbations in different combinations. However, the model.predict() method that is used in the tutorials does not take any parameters to indicate which perturbations should be predicted. How does CPA know which perturbations to apply? And how can I specify that?
It seems I'm missing something important here, and I'm grateful for any help!