scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.88k stars 594 forks source link

Set n_pcs when use_rep is specified in sc.pp.neighbors #1281

Open rbpatt2019 opened 4 years ago

rbpatt2019 commented 4 years ago

Sorry to file two issues in a row! I've been using Scanpy + scVI for my analysis recently, and am really enjoying it!

Currently, if use_rep is set in sc.pp.neighbors, then n_pcs is ignored. These seems like sane default behaviour to me - if we aren't using 'X_pca', then n_pcs doesn't really make sense.

This can actually be limiting, as I recently discovered. If you calculate a latent representation with scVI with n_latent = 50, you can't then do...

sc.pp.neighbors(adata, use_rep='X_scvi', n_pcs = 25)

...as the neighborhood graph is calculated on all 50 latent variables. If I want to use only 25 - say, as a point of comparison to see which best represents my data - I have to recalculate the latent representation with scVI with n_latent = 25. Basically, if you aren't using PCA, you have to calculate a full reduction for every number of dimensions you are interested in.

I don't think the fix would be that major. The source seems to be in the _choose_representation function, with the directly relevant snippet below:

https://github.com/theislab/scanpy/blob/5bc37a2b10f40463f1d90ea1d61dc599bbea2cd0/scanpy/tools/_utils.py#L48-L58

I think the only change needed would be to have the n_pcs is not None catch in all cases, not just when use_rep = 'X_pca'. Might also generalise the variable name to make it more clear it refers to any reduction, not just PCA. Admittedly, I only just started exploring the code base, so I'm not sure where else _choose_representation is called, or what other impacts this could have.

I have some blue sky time tomorrow, so I'll fork it then and see if I can whip up a pull request. Thanks for the excellent product!

rbpatt2019 commented 4 years ago

And I just saw #1267, which mentioned the subsetting option and totally solves the problem. If you all are interested in having this functionality as an option, leave this open, let me know, and I'll work on a pull request. Otherwise, feel free to close it. Sorry for the (sort of) repeat.