Closed asistradition closed 2 months ago
The main advantage of using `.obsm` is that column-slicing a row-major array requires a copy, so calling `.layers[key][:, filtered_genes]` repeatedly adds considerable overhead. As those keys are only used in `cooks_distances`, moving them to `.obsm` removes those copies, which is a considerable optimization for large data (e.g. 50k x 30k).
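To illustrate the copy behavior described above, here is a minimal NumPy-only sketch. The array sizes, the `filtered_genes` index, and the `cache` dictionary are made up for illustration; the dictionary merely stands in for an `.obsm`-style entry, on the assumption that such an entry can hold an array already restricted to the needed columns. It is not the PR's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in for a (samples x genes) layer; NumPy arrays are row-major (C order) by default.
layer = rng.random((500, 300))
filtered_genes = np.arange(0, 300, 3)  # hypothetical subset of gene indices

# Selecting columns with an integer array is advanced indexing, which always returns a copy.
subset = layer[:, filtered_genes]
copies_on_slice = not np.shares_memory(layer, subset)

# Repeating the slice allocates a fresh array every time.
repeated = [layer[:, filtered_genes] for _ in range(3)]
fresh_each_time = not np.shares_memory(repeated[0], repeated[1])

# Keeping the already-filtered array under its own key means later reads reuse the same buffer.
cache = {"filtered": subset}  # hypothetical stand-in for an .obsm-style entry
reused = cache["filtered"]
same_buffer = np.shares_memory(reused, subset)
```

On a 50k x 30k matrix each such slice copies a large block of memory, so avoiding it inside a loop matters far more than this toy example suggests.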
Also includes an optional `control_genes` argument to `fit_size_factors`, which has the same behavior as the `controlGenes` argument to `estimateSizeFactors`, and the associated unit test.
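As a rough illustration of what restricting size factor estimation to control genes means, the sketch below implements a plain median-of-ratios estimate over a chosen subset of genes. The function name `median_of_ratios_size_factors` and the toy counts are hypothetical, and this is not the implementation in this PR or in DESeq2; it only assumes that the control genes determine which columns enter the median.

```python
import numpy as np

def median_of_ratios_size_factors(counts, control_genes=None):
    """Hypothetical helper: counts is (samples x genes); control_genes is an index array or boolean mask."""
    counts = np.asarray(counts, dtype=float)
    if control_genes is not None:
        # Only the control genes contribute to the estimate.
        counts = counts[:, control_genes]
    # Keep genes with strictly positive counts in every sample so the logs are finite.
    expressed = (counts > 0).all(axis=0)
    if not expressed.any():
        raise ValueError("No gene has positive counts in every sample.")
    log_counts = np.log(counts[:, expressed])
    # Per-gene log geometric mean across samples.
    log_geo_means = log_counts.mean(axis=0)
    # Per-sample median of log ratios, mapped back to the count scale.
    return np.exp(np.median(log_counts - log_geo_means, axis=1))

# Toy data: the first two genes differ by exactly 2x between samples, the last two do not.
counts = np.array([[10, 20, 400, 900],
                   [20, 40, 100, 100]])
sf_all = median_of_ratios_size_factors(counts)
sf_control = median_of_ratios_size_factors(counts, control_genes=[0, 1])
```

In this toy data, restricting to the first two genes recovers the 2x depth difference between the samples, while using all genes lets the two strongly changing genes pull the estimate away from it.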
Thanks @asistradition for this PR!
I agree with @umarteauowkin that on principle storing `_mu_LFC` and `_hat_diagonals` in the `obsm` and not in the `layers` is a bit awkward, but if this leads to memory gains I'm fine with it, given that (as you pointed out) they are only used in `cooks_distances`.
What does your PR implement? Be specific.
`calculate_cooks` casts `normed_counts` into a pandas DataFrame for `robust_method_of_moments_disp`. This is memory inefficient for large data. `robust_method_of_moments_disp` has been refactored to accept an ndarray directly and the DataFrame has been removed. There is no numerical change as a result.