theislab / cpa

The Compositional Perturbation Autoencoder (CPA) is a deep generative framework for learning the effects of perturbations at the single-cell level. CPA makes out-of-distribution (OOD) predictions for unseen combinations of drugs, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.
BSD 3-Clause "New" or "Revised" License

Generalization to unseen categories #42

Open rvinas opened 4 months ago

rvinas commented 4 months ago

In the context transfer tutorial (predicting perturbation responses for unseen cell types), the train split also contains cells of the OOD cell type (B cells):

OOD split

(adata[adata.obs['split_B'] == 'ood'].obs['cell_type'].values == 'B').sum()
# Prints 774

Train split

(adata[adata.obs['split_B'] == 'train'].obs['cell_type'].values == 'B').sum()
# Prints 543
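
The two checks above can be folded into one table. This is a minimal sketch on a toy pandas DataFrame standing in for `adata.obs`: the column names `split_B` and `cell_type` come from the snippets above, while the rows are invented for illustration and are not the tutorial data.

```python
import pandas as pd

# Toy stand-in for adata.obs; these rows are made up for illustration only.
obs = pd.DataFrame({
    "split_B": ["train", "train", "train", "valid", "ood", "ood"],
    "cell_type": ["B", "CD4T", "NK", "CD4T", "B", "B"],
})

# Rows: cell types, columns: splits, values: number of cells.
counts = pd.crosstab(obs["cell_type"], obs["split_B"])
print(counts)

# Cell types that appear in both the train split and the OOD split.
leaked = counts.index[(counts["train"] > 0) & (counts["ood"] > 0)].tolist()
print(leaked)
```

With a real dataset, `obs` would presumably be replaced by `adata.obs`; a non-empty `leaked` list means the held-out cell type is not fully held out.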

I am interested in the scenario where certain conditions are entirely absent at training time. When I run inference on such unseen conditions with a trained CPA model, I get the following error:

ValueError: Category CATEGORY_NAME not found in source registry. Cannot transfer setup without `extend_categories = True`.
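
To see which categories would trigger this error before running inference, the training and inference metadata can be compared directly. The sketch below is only an illustration: `unseen_categories` is a hypothetical helper written for this issue, not part of CPA or scvi-tools, and the two DataFrames are toy stand-ins for the `.obs` tables of the training and inference AnnData objects.

```python
import pandas as pd


def unseen_categories(train_obs, query_obs, keys):
    """Hypothetical helper: for each column in `keys`, list the values that
    occur in `query_obs` but never in `train_obs`."""
    missing = {}
    for key in keys:
        seen = set(train_obs[key].unique())
        new = sorted(set(query_obs[key].unique()) - seen)
        if new:
            missing[key] = new
    return missing


# Toy metadata; values are invented for illustration only.
train_obs = pd.DataFrame({
    "cell_type": ["CD4T", "NK"],
    "condition": ["ctrl", "stimulated"],
})
query_obs = pd.DataFrame({
    "cell_type": ["B", "NK"],
    "condition": ["stimulated", "stimulated"],
})

result = unseen_categories(train_obs, query_obs, ["cell_type", "condition"])
print(result)
```

Any key that shows up in `result` holds at least one category that the trained model's registry has never seen.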

How can I set up CPA to generalize to unseen categories?