scverse / pertpy

Perturbation Analysis in the scverse ecosystem.
https://pertpy.readthedocs.io/en/latest/
MIT License

Logistic regression support for the Discriminator Classifier #560

Closed Lilly-May closed 3 months ago

Lilly-May commented 3 months ago

PR Checklist

Description of changes

Technical details

I tested the regression classifier implementation on the Norman dataset with the following code:

```python
import pertpy as pt
import scanpy as sc

# adata: the Norman dataset, loaded beforehand
sc.pp.pca(adata, n_comps=30)

# Build the perturbation space with the new regression classifier
ps = pt.tl.DiscriminatorClassifierSpace("regression")
classifier_ps = ps.load(adata, embedding_key="X_pca", target_col="perturbation_name")
classifier_ps.train()
pert_embeddings = classifier_ps.get_embeddings()

# Visualize the resulting perturbation embeddings
sc.pp.neighbors(pert_embeddings, use_rep="X")
sc.tl.umap(pert_embeddings)
sc.pl.umap(pert_embeddings, color=["gene_programme"])
```

This results in the following UMAP: [umap image]

Lilly-May commented 3 months ago

I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 83.33333%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 63.52%. Comparing base (916c837) to head (7fd7bbe). Report is 2 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #560      +/-   ##
==========================================
+ Coverage   63.40%   63.52%   +0.12%
==========================================
  Files          43       43
  Lines        5052     5091      +39
==========================================
+ Hits         3203     3234      +31
- Misses       1849     1857       +8
```

| [Files](https://app.codecov.io/gh/theislab/pertpy/pull/560?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab) | Coverage Δ |
|---|---|
| [pertpy/tools/\_\_init\_\_.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` |
| [pertpy/tools/\_distances/\_distances.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19kaXN0YW5jZXMvX2Rpc3RhbmNlcy5weQ==) | `88.60% <ø> (ø)` |
| [pertpy/tools/\_perturbation\_space/\_simple.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19wZXJ0dXJiYXRpb25fc3BhY2UvX3NpbXBsZS5weQ==) | `75.89% <100.00%> (ø)` |
| [.../\_perturbation\_space/\_discriminator\_classifiers.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19wZXJ0dXJiYXRpb25fc3BhY2UvX2Rpc2NyaW1pbmF0b3JfY2xhc3NpZmllcnMucHk=) | `90.64% <82.60%> (ø)` |

... and [1 file with indirect coverage changes](https://app.codecov.io/gh/theislab/pertpy/pull/560/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab)
Zethson commented 3 months ago

> I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?

Yes, please! Also thought about that while reviewing.

Lilly-May commented 3 months ago

What was your impression concerning usage and parameter documentation? Was it too annoying to always be like: "This only applies to the MLP" or the other way around? I'm trying to assess whether we should split them into two functions or roll with what you nicely implemented.

If we want to stick with the DiscriminatorClassifierSpace class, I would keep the implementation as it is in this PR, with regression as a parameter choice. The downside is that the implementations are quite different (in each method, I have an if-else statement that separates the MLP and regression paths), but that isn't really of interest to the user. So, the fact that several parameters are not applicable to the respective model might be the bigger problem.
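The single-class design described above can be sketched roughly as follows. Class and method names mirror the PR, but this is an illustrative skeleton, not the actual pertpy implementation:

```python
# Hypothetical sketch of the single-class design: one class whose methods
# branch on the model type, which is the internal if/else complexity
# mentioned in the comment.
class DiscriminatorClassifierSpace:
    def __init__(self, model: str = "mlp"):
        if model not in ("mlp", "regression"):
            raise ValueError(f"Unknown model type: {model}")
        self.model = model

    def train(self, **kwargs):
        # Every public method needs this dispatch, even though the
        # distinction is irrelevant to the user.
        if self.model == "mlp":
            self._train_mlp(**kwargs)
        else:
            self._train_regression(**kwargs)

    def _train_mlp(self, **kwargs):
        ...  # MLP-specific training (stub)

    def _train_regression(self, **kwargs):
        ...  # regression-specific training (stub)
```
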

I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.
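A minimal sketch of what the proposed RegressionClassifierSpace's single compute method could look like, assuming a scikit-learn logistic regression under the hood (the actual model and return type in the PR may differ):

```python
# Hypothetical sketch: a regression-based perturbation space exposing only
# compute(), analogous to the other perturbation spaces.
from sklearn.linear_model import LogisticRegression


class RegressionClassifierSpace:
    def compute(self, X, labels):
        """Fit a logistic regression discriminating perturbation labels
        and return the per-class coefficient vectors as embeddings.

        Assumes more than two classes, so coef_ has one row per class.
        """
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        return dict(zip(clf.classes_, clf.coef_))
```
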

Zethson commented 3 months ago

> I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.

Concerning backwards compatibility: We could alias the classes. A simple DiscriminatorClassifierSpace = MLPClassifierSpace in an __init__.py would probably do the trick.
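The bare alias works as stated; a slightly more user-friendly variant (my own suggestion, not from the PR) wraps it in a subclass that emits a FutureWarning so users know to migrate:

```python
# Sketch of backward-compatible aliasing with a deprecation warning.
# MLPClassifierSpace stands in for the proposed class; the real one
# would live in pertpy.
import warnings


class MLPClassifierSpace:
    ...  # stand-in for the proposed MLP-based perturbation space


def _deprecated_alias(new_cls, old_name):
    """Return a subclass of new_cls that warns under the old name."""

    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_cls.__name__} instead.",
                FutureWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    _Alias.__name__ = old_name
    return _Alias


# The simple alias mentioned above, with a migration hint attached:
DiscriminatorClassifierSpace = _deprecated_alias(
    MLPClassifierSpace, "DiscriminatorClassifierSpace"
)
```
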

I think that, if we really wanted to, we could probably provide somewhat sane load and get_embeddings methods for the RegressionClassifierSpace too, but they'd be super simple, right? I'll leave this up to you to judge, and we'll roll with whatever you think is best.

But yeah, I think splitting this into two classes is the better approach.