Open rbpatt2019 opened 4 years ago
And I just saw #1267, which mentioned the subsetting option and totally solves the problem. If you all are interested in having this functionality as an option, leave this open, let me know, and I'll work on a pull request. Otherwise, feel free to close it. Sorry for the (sort of) repeat.
Sorry to file two issues in a row! I've been using Scanpy + scVI for my analysis recently, and am really enjoying it!
Currently, if use_rep is set in
sc.pp.neighbors
, then n_pcs is ignored. These seems like sane default behaviour to me - if we aren't using 'X_pca', then n_pcs doesn't really make sense.This can actually be limiting, as I recently discovered. If you calculate a latent representation with scVI with
n_latent = 50
, you can't then do......as the neighborhood graph is calculated on all 50 latent variables. If I want to use only 25 - say, as a point of comparison to see which best represents my data - I have to recalculate the latent representation with scVI with
n_latent = 25
. Basically, if you aren't using PCA, you have to calculate a full reduction for every number of dimensions you are interested in.I don't think the fix would be that major. The source seems to be in the
_choose_representation
function, with the directly relevant snippet below:https://github.com/theislab/scanpy/blob/5bc37a2b10f40463f1d90ea1d61dc599bbea2cd0/scanpy/tools/_utils.py#L48-L58
I think the only change needed would be to have the
n_pcs is not None
catch in all cases, not just whenuse_rep = 'X_pca'
. Might also generalise the variable name to make it more clear it refers to any reduction, not just PCA. Admittedly, I only just started exploring the code base, so I'm not sure where else_choose_representation
is called, or what other impacts this could have.I have some blue sky time tomorrow, so I'll fork it then and see if I can whip up a pull request. Thanks for the excellent product!