Description
This is a question about the output of CLR normalization of protein data. When we use the muon.prot.pp.clr() function to apply a CLR transformation to our protein counts, the plotted values look odd: in the low range the output forms discrete bands instead of a continuous distribution. See the image below for the raw data and for the data normalized with muon's CLR function.
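For reference, our reading of the linked source is that, with the default axis=0, clr() works per protein j across the n cells using log1p rather than a plain log. This is an assumption that has not been checked against the muon documentation. Under that assumption the transform is:

$$
g_j = \exp\left(\frac{1}{n}\sum_{i=1}^{n}\log\left(1 + x_{ij}\right)\right),
\qquad
\mathrm{clr}(x_{ij}) = \log\left(1 + \frac{x_{ij}}{g_j}\right)
$$

If this form is right, then g_j is at least 1 even for a protein with all-zero counts, a zero count maps to exactly 0, and, within one protein, each integer count k maps to the single value log(1 + k/g_j).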
To Reproduce
Because the original dataset is large, the analysis was run on a subset of it. The data come from a 10x CITE-seq experiment with 137 proteins.
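import numpy as np
from muon import prot as pt

# mdata: a MuData object with a 'prot' modality (loaded beforehand, not shown)
# Check the total number of observations
n_obs = mdata['prot'].n_obs
# Determine the size of the subsample (sampling without replacement cannot exceed n_obs)
subsample_size = min(100000, n_obs)
# Randomly select the observations
np.random.seed(123)
sample_indices = np.random.choice(n_obs, subsample_size, replace=False)
# Create the subsample
subsample = mdata['prot'][sample_indices, :].copy()
# With inplace=False, clr() returns a new, normalized AnnData and leaves subsample untouched
normalized_counts = pt.pp.clr(subsample, inplace=False)
# Store the CLR values as a separate layer next to the raw counts
subsample.layers['clr_dev'] = normalized_counts.X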
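To check whether the bands are simply one-to-one with the integer input counts, here is a self-contained NumPy sketch. It is not muon's code. clr_dense is a hypothetical helper that we assume mirrors the dense branch of the linked clr() (log1p of each count divided by the per-protein geometric mean of 1 + x). The Poisson counts are simulated placeholders, not our data.

```python
import numpy as np


def clr_dense(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Hypothetical dense CLR sketch: log1p(x / g), where g is the geometric
    mean of (1 + x) along `axis` (assumed to match muon's dense branch)."""
    x = np.asarray(x, dtype=float)
    # Geometric mean of 1 + x per protein (axis=0 -> across cells)
    g = np.exp(np.log1p(x).mean(axis=axis, keepdims=True))
    return np.log1p(x / g)


rng = np.random.default_rng(123)
# Simulated integer counts: 1000 cells x 5 proteins with increasing mean expression
counts = rng.poisson(lam=[0.5, 2, 5, 20, 100], size=(1000, 5))
clr = clr_dense(counts, axis=0)

# Compare distinct raw counts with distinct CLR values, protein by protein
for j in range(counts.shape[1]):
    n_raw = np.unique(counts[:, j]).size
    n_clr = np.unique(clr[:, j]).size
    print(f"protein {j}: {n_raw} distinct raw counts -> {n_clr} distinct CLR values")
```

On the real subsample, the same np.unique comparison could be run column by column on subsample.X and subsample.layers['clr_dev'], converting them with .toarray() first if they are sparse.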
Expected behaviour
After a log transformation we would normally expect continuous values, not the discrete values we see here in the lower range. Could this be caused by zero values not being handled correctly?
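To probe the zero-handling question directly, here is a minimal sketch. It assumes the CLR takes the form log1p(k / g) for an integer count k and a per-protein factor g. That form is our reading of the linked source and has not been confirmed, and the value of g below is hypothetical. The sketch checks what a zero maps to and compares the spacing between neighbouring counts at the low end with the spacing at the high end.

```python
import numpy as np

# Hypothetical per-protein scaling factor g (geometric mean of 1 + x);
# the value 3.0 is chosen only for illustration.
g = 3.0


def clr_value(k: int, g: float) -> float:
    """CLR value of integer count k under the assumed form log1p(k / g)."""
    return float(np.log1p(k / g))


# A zero count maps to log1p(0) = 0 exactly, whatever g is.
zero_value = clr_value(0, g)

# Spacing between neighbouring integer counts after the transform.
low_gaps = [clr_value(k + 1, g) - clr_value(k, g) for k in range(0, 3)]
high_gaps = [clr_value(k + 1, g) - clr_value(k, g) for k in range(200, 203)]

print("count 0 ->", zero_value)
print("gaps between counts 0..3:", [round(d, 3) for d in low_gaps])
print("gaps between counts 200..203:", [round(d, 4) for d in high_gaps])
```

Under this form, the gap between counts k and k + 1 is log((g + k + 1) / (g + k)), which is roughly 1/(g + k). On a plot, neighbouring high counts therefore blur into a continuum, while the first few counts stay visibly separated.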
System
Additional context
CLR implementation in muon (muon/_prot/preproc.py, lines 201-240): https://github.com/scverse/muon/blob/94917d23291f329a19b3c282276c960d414319ad/muon/_prot/preproc.py#L201-L240