scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Multibatchnorm / between library batch normalization #3309

Open Marwansha opened 1 month ago

Marwansha commented 1 month ago

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

Hi,

Is there an equivalent function to multiBatchNorm in Python, or another method that can perform per-batch normalization?

My goal is to compute pseudobulk profiles per individual. Each individual has replicates that are processed across different libraries.

a- Simply summing the raw counts across replicates would likely introduce bias due to library-specific batch effects.

b- Taking the mean of normalized counts across replicates (scranPY normalized counts) doesn't account for differences in size factors across the libraries, making normalization inconsistent between batches.

Important note: replicates are distributed across different libraries.

Individual x might have replicate 1 in library 1 and replicate 2 in library 3, while individual y might have replicate 1 in library 1 but replicate 2 in library 4. That's why directly summing raw or normalized counts seems inaccurate.

I’d greatly appreciate any advice.

In R, I've previously used multiBatchNorm from the scran (batchelor) package, which rescales the size factors across batches to handle such batch effects. However, given the size of my current dataset, using R is not feasible.
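For reference, the core idea of multiBatchNorm can be sketched in plain numpy: center size factors within each batch, then rescale each batch so its normalized values are on the scale of the lowest-coverage batch. This is only an illustrative sketch, not scanpy API: the function name `multi_batch_norm` is made up here, and simple library-size factors stand in for scran's deconvolution-based size factors.

```python
import numpy as np

def multi_batch_norm(counts, batches):
    """Illustrative per-batch normalization in the spirit of multiBatchNorm.

    counts  : (cells, genes) raw count matrix
    batches : per-cell batch/library labels

    Assumption: simple library-size factors are used instead of scran's
    deconvolution size factors.
    """
    counts = np.asarray(counts, dtype=float)
    batches = np.asarray(batches)

    # Per-cell library sizes as crude size factors
    sf = counts.sum(axis=1)

    # Center size factors to mean 1 within each batch
    for b in np.unique(batches):
        mask = batches == b
        sf[mask] /= sf[mask].mean()

    # Rescale each batch relative to the lowest-coverage batch, so that
    # normalized values are comparable across batches (higher-coverage
    # batches are downscaled rather than low-coverage ones inflated)
    mean_depth = {b: counts[batches == b].sum(axis=1).mean()
                  for b in np.unique(batches)}
    ref = min(mean_depth.values())
    scale = np.array([mean_depth[b] / ref for b in batches])
    sf = sf * scale

    return counts / sf[:, None]
```

With batch-consistent normalized values like these, averaging across replicates of one individual (even when they sit in different libraries) should no longer be dominated by library depth.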