theislab / cpa

The Compositional Perturbation Autoencoder (CPA) is a deep generative framework to learn effects of perturbations at the single-cell level. CPA performs OOD predictions of unseen combinations of drugs, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.
BSD 3-Clause "New" or "Revised" License

Predicting using trained model. #27

Open faith-8 opened 9 months ago

faith-8 commented 9 months ago

Hi, I've successfully trained a model from scratch by following this tutorial: https://cpa-tools.readthedocs.io/en/latest/tutorials/combosciplex_Rdkit_embeddings.html

However, I'm lost on how to use the trained model to predict on an unseen dataset. I tried creating a new AnnData with an unseen perturbation, but the following error occurred.

INFO     Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup                             
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[48], line 1
----> 1 model.predict(ood_adata, batch_size=1024)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\torch\autograd\grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\cpa\_model.py:679, in CPA.predict(self, adata, indices, batch_size, n_samples, return_mean)
    676 assert self.module.recon_loss in ["gauss", "nb", "zinb"]
    677 self.module.eval()
--> 679 adata = self._validate_anndata(adata)
    680 if indices is None:
    681     indices = np.arange(adata.n_obs)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\scvi\model\base\_base_model.py:415, in BaseModelClass._validate_anndata(self, adata, copy_if_view)
    409 if adata_manager is None:
    410     logger.info(
    411         "Input AnnData not setup with scvi-tools. "
    412         + "attempting to transfer AnnData setup"
    413     )
    414     self._register_manager_for_instance(
...
    230     self.attr_key,
    231     categorical_dtype=cat_dtype,
    232 )

ValueError: Category CHEMBL1213492+CHEMBL491473 not found in source registry. Cannot transfer setup without `extend_categories = True`.

Any help would be appreciated.

HelloWorldLTY commented 7 months ago

Hi, same question here. As the default tutorial shows, the authors seem to treat data with a known drug combination but a different dosage as OOD data. That case should work, since dosage is encoded by an independent encoder. However, as users, we take OOD to mean samples whose drug perturbation, cell type, or dosage is unknown, and the authors have another tutorial to handle this case.
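
To see which of these cases a new dataset falls into, it can help to compare its perturbation labels against the training labels before calling `model.predict`. The ValueError in the traceback above names a combined label (`CHEMBL1213492+CHEMBL491473`) that is missing from the training categories, which does not by itself say whether the individual drugs were seen. Below is a minimal sketch in plain pandas. It is not part of the CPA API: the column name `condition_ID`, the `+` separator, the helper names, and the toy rows (including the placeholder `DRUG_C`) are assumptions for illustration only.

```python
import pandas as pd


def unseen_categories(train_obs, new_obs, key):
    """Return labels in new_obs[key] that never occur in train_obs[key]."""
    seen = set(train_obs[key].astype(str).unique())
    return [c for c in pd.unique(new_obs[key].astype(str)) if c not in seen]


def unseen_single_drugs(train_obs, new_obs, key, sep="+"):
    """Split combined labels on `sep` and return drugs absent from training."""
    seen = {d for c in train_obs[key].astype(str) for d in c.split(sep)}
    new = {d for c in new_obs[key].astype(str) for d in c.split(sep)}
    return sorted(new - seen)


# Toy stand-ins for adata.obs of the training and the new AnnData
# (hypothetical rows; "DRUG_C" is a made-up placeholder, not a real ID).
train_obs = pd.DataFrame(
    {"condition_ID": ["CHEMBL1213492", "CHEMBL491473", "CHEMBL1213492+DRUG_C"]}
)
ood_obs = pd.DataFrame({"condition_ID": ["CHEMBL1213492+CHEMBL491473"]})

# Combined labels the training registry has never seen.
missing_combos = unseen_categories(train_obs, ood_obs, "condition_ID")
# Individual drugs the training data has never seen.
missing_drugs = unseen_single_drugs(train_obs, ood_obs, "condition_ID")

print("unseen combinations:", missing_combos)
print("unseen single drugs:", missing_drugs)
```

In this toy case the combination is new while both of its drugs appear in training, so the list of unseen single drugs is empty. If a drug itself were missing, the label check alone could not help, and the model would need some representation of that drug, such as the embeddings discussed later in this thread.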

HelloWorldLTY commented 7 months ago

Just noticed that they have a version that uses a drug embeddings database, which would at least allow us to predict the contributions of drugs in this database: https://colab.research.google.com/github/theislab/cpa/blob/master/docs/tutorials/combosciplex_Rdkit_embeddings.ipynb#scrollTo=79062e65-3de9-4916-8999-449ef2df3edf

M0hammadL commented 7 months ago

Hi, you can use these embeddings as an example, or any other gene or drug embeddings, to generalize to perturbations not seen during training.

M0hammadL commented 7 months ago

> Hi, same question here. I think the definition of OOD between the authors and users might be different here. The authors seem to believe that data with known combination but different dosage are OOD data. This should work since dosage is encoded by an independent encoder. However, as users, we believe OOD should mean samples we do not know drug perturbation/cell type/dosage. Therefore, I think CPA does not have the function precisely matched our definition.

I suggest reading the tutorials. They cover all sorts of scenarios: dosage, cell types, unseen drugs and combinations, genes, etc.

HelloWorldLTY commented 7 months ago

Thanks for your notes. I've just clarified my wording.