scverse / pertpy

Perturbation Analysis in the scverse ecosystem.
https://pertpy.readthedocs.io/en/latest/
MIT License

Logistic regression support for the Discriminator Classifier #560

Closed Lilly-May closed 3 months ago

Lilly-May commented 3 months ago

PR Checklist

Description of changes

Technical details

I tested the regression classifier implementation on the Norman dataset with the following code:

```python
import pertpy as pt
import scanpy as sc

# adata: the Norman dataset, loaded beforehand
sc.pp.pca(adata, n_comps=30)

# Build the perturbation space with the new regression classifier
ps = pt.tl.DiscriminatorClassifierSpace("regression")
classifier_ps = ps.load(adata, embedding_key="X_pca", target_col="perturbation_name")
classifier_ps.train()
pert_embeddings = classifier_ps.get_embeddings()

# Visualize the resulting perturbation embeddings
sc.pp.neighbors(pert_embeddings, use_rep="X")
sc.tl.umap(pert_embeddings)
sc.pl.umap(pert_embeddings, color=["gene_programme"])
```

This results in the following UMAP: [umap image]

Lilly-May commented 3 months ago

I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 83.33333%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 63.52%. Comparing base (916c837) to head (7fd7bbe). Report is 2 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #560      +/-   ##
==========================================
+ Coverage   63.40%   63.52%   +0.12%
==========================================
  Files          43       43
  Lines        5052     5091      +39
==========================================
+ Hits         3203     3234      +31
- Misses       1849     1857       +8
```

| [Files](https://app.codecov.io/gh/theislab/pertpy/pull/560?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab) | Coverage Δ |
|---|---|
| [pertpy/tools/\_\_init\_\_.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` |
| [pertpy/tools/\_distances/\_distances.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19kaXN0YW5jZXMvX2Rpc3RhbmNlcy5weQ==) | `88.60% <ø> (ø)` |
| [pertpy/tools/\_perturbation\_space/\_simple.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19wZXJ0dXJiYXRpb25fc3BhY2UvX3NpbXBsZS5weQ==) | `75.89% <100.00%> (ø)` |
| [.../\_perturbation\_space/\_discriminator\_classifiers.py](https://app.codecov.io/gh/theislab/pertpy/pull/560?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab#diff-cGVydHB5L3Rvb2xzL19wZXJ0dXJiYXRpb25fc3BhY2UvX2Rpc2NyaW1pbmF0b3JfY2xhc3NpZmllcnMucHk=) | `90.64% <82.60%> (ø)` |

... and [1 file with indirect coverage changes](https://app.codecov.io/gh/theislab/pertpy/pull/560/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=theislab)
Zethson commented 3 months ago

> I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?

Yes, please! Also thought about that while reviewing.

Lilly-May commented 3 months ago

What was your impression concerning usage and parameter documentation? Was it too annoying to always be like: "This only applies to the MLP" or the other way around? I'm trying to assess whether we should split them into two functions or roll with what you nicely implemented.

If we want to stick with the DiscriminatorClassifierSpace class, I would keep the implementation as it is in this PR, with regression as a parameter choice. The downside is that the implementations are quite different (in each method, I have an if-else statement that separates the MLP and regression paths), but that isn't really of interest to the user. So, the fact that several parameters are not applicable to the respective model might be the bigger problem.
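The single-class design described above can be sketched roughly as follows. Class and method names mirror the PR, but this is an illustrative skeleton, not the actual pertpy implementation:

```python
# Hypothetical sketch of the single-class design: one class whose methods
# branch on the model type, which is the internal if/else complexity
# mentioned in the comment.
class DiscriminatorClassifierSpace:
    def __init__(self, model: str = "mlp"):
        if model not in ("mlp", "regression"):
            raise ValueError(f"Unknown model type: {model}")
        self.model = model

    def train(self, **kwargs):
        # Every public method needs this dispatch, even though the
        # distinction is irrelevant to the user.
        if self.model == "mlp":
            self._train_mlp(**kwargs)
        else:
            self._train_regression(**kwargs)

    def _train_mlp(self, **kwargs):
        ...  # MLP-specific training (stub)

    def _train_regression(self, **kwargs):
        ...  # regression-specific training (stub)
```
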

I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.
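A minimal sketch of what the proposed RegressionClassifierSpace's single compute method could look like, assuming a scikit-learn logistic regression under the hood (the actual model and return type in the PR may differ):

```python
# Hypothetical sketch: a regression-based perturbation space exposing only
# compute(), analogous to the other perturbation spaces.
from sklearn.linear_model import LogisticRegression


class RegressionClassifierSpace:
    def compute(self, X, labels):
        """Fit a logistic regression discriminating perturbation labels
        and return the per-class coefficient vectors as embeddings.

        Assumes more than two classes, so coef_ has one row per class.
        """
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        return dict(zip(clf.classes_, clf.coef_))
```
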

Zethson commented 3 months ago

> I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.

Concerning backwards compatibility: We could alias the classes. A simple DiscriminatorClassifierSpace = MLPClassifierSpace in an __init__.py would probably do the trick.
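The bare alias works as stated; a slightly more user-friendly variant (my own suggestion, not from the PR) wraps it in a subclass that emits a FutureWarning so users know to migrate:

```python
# Sketch of backward-compatible aliasing with a deprecation warning.
# MLPClassifierSpace stands in for the proposed class; the real one
# would live in pertpy.
import warnings


class MLPClassifierSpace:
    ...  # stand-in for the proposed MLP-based perturbation space


def _deprecated_alias(new_cls, old_name):
    """Return a subclass of new_cls that warns under the old name."""

    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_cls.__name__} instead.",
                FutureWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    _Alias.__name__ = old_name
    return _Alias


# The simple alias mentioned above, with a migration hint attached:
DiscriminatorClassifierSpace = _deprecated_alias(
    MLPClassifierSpace, "DiscriminatorClassifierSpace"
)
```
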

I think that, if we really wanted to, we could probably provide somewhat sane load and get_embeddings methods for the RegressionClassifierSpace too, but they'd be super simple, right? I'll leave this up to you to judge, and we'll roll with whatever you think is best.

But yeah, I think splitting this into two classes is the better approach.