RPM normalization may penalize low-abundance small RNAs

Hi Reza,

I've noticed that in this version of miRador, RPM normalization is applied by default. While this normalization is useful, it may inadvertently remove lowly expressed fragments that carry meaningful biological information. For instance, consider an input library of approximately 20 million reads being scaled down to 1 million during normalization.

In that case, a read with a raw count of 5 would become 0.25 after normalization. I'm concerned that the Python implementation may round this value to 0, losing that information. This could be a problem for users who want to follow Axtell and Meyers' criteria for miRNA annotation, since the miRNA* is typically lowly expressed and some analysis packages require its presence.
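To make the arithmetic concrete, here is a minimal sketch of the scenario above. The function name `rpm` and its parameters are hypothetical and are not taken from miRador's code; the sketch only illustrates how a raw count of 5 in a 20-million-read library could end up as 0 if the normalized value is rounded or truncated to an integer.

```python
def rpm(raw_count, library_size, scale=1_000_000):
    """Scale a raw read count to reads per `scale` (RPM by default).

    Hypothetical helper for illustration, not miRador's implementation.
    """
    return raw_count * scale / library_size


# A read seen 5 times in a library of 20 million reads.
value = rpm(5, 20_000_000)

print(value)         # 0.25
print(round(value))  # 0, the read disappears if the value is rounded
print(int(value))    # 0, and also if it is truncated
```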

I have two suggestions to address this issue:

(i) Consider rounding any non-zero value below 1 (e.g., 0.25) up to 1 to prevent data loss.

(ii) Allow users to define a normalization factor, such as RP30M, while keeping it within a reasonable range of the actual library sizes so that it cannot be inflated or deflated excessively. miRador could also select the most appropriate normalization factor automatically.
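Both suggestions could look roughly like the sketch below. Every name here (`normalize`, `clamp_scale`, `auto_scale`, `max_fold`) is hypothetical, and the choices of a 2-fold bound and the median library size as the automatic factor are assumptions made for illustration, not existing miRador behavior.

```python
import statistics


def normalize(raw_count, library_size, scale=1_000_000, floor_nonzero=True):
    """Normalize a raw count; optionally keep observed reads at >= 1.

    Suggestion (i): a read that was observed at least once never
    becomes 0 after rounding.
    """
    rounded = round(raw_count * scale / library_size)
    if floor_nonzero and raw_count > 0 and rounded == 0:
        return 1
    return rounded


def clamp_scale(requested_scale, library_sizes, max_fold=2.0):
    """Suggestion (ii): accept a user-defined factor, but keep it within
    `max_fold` of the smallest and largest library sizes (assumed bound).
    """
    lower = min(library_sizes) / max_fold
    upper = max(library_sizes) * max_fold
    return int(min(max(requested_scale, lower), upper))


def auto_scale(library_sizes):
    """One possible automatic choice: the median library size."""
    return int(statistics.median(library_sizes))


libraries = [18_000_000, 20_000_000, 25_000_000]

print(normalize(5, 20_000_000))                     # 1 instead of 0
print(normalize(4, 20_000_000, scale=30_000_000))   # 6 under RP30M
print(clamp_scale(1_000_000_000, libraries))        # 50000000
print(auto_scale(libraries))                        # 20000000
```

With a factor close to the real library sizes, most raw counts stay near their original values, so the floor in `normalize` would rarely need to apply.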

Best regards, Thales
