marcelTBI / GenomeScreen

Scripts and data needed to run GenomeScreen

Normalization #2

Closed: liza-alpinia closed this issue 1 year ago

liza-alpinia commented 2 years ago

Good day!

I read your article and have another question about data preprocessing. I did not understand the described normalization stage: 'the bin counts corresponding to autosomal chromosomes for each sample were normalized to the identical number of reads (i.e., each bin was divided so the sum of all bins on autosomal chromosomes would be the same for each sample)'. If we split the reference into bins, every sample will have the same number of bins. So what does the described stage actually do?

marcelTBI commented 2 years ago

Hi,

Yeah, the wording is a bit "rough". A better version would be something along these lines: 'the bin counts corresponding to autosomal chromosomes for each sample were normalized to the identical number of reads (i.e., each bin count was divided so the sum of all bin counts on autosomal chromosomes would be the same for each sample)'. The number of bins is indeed the same for every sample; what differs is the total number of reads, and that is what gets equalized. For example, if a sample had 2,000,000 reads and some bin had 7 reads in it, then after normalizing this sample to 1,000,000 reads, this bin would have 3.5 reads. (Note that the counts can go from integers to floats here, but that is also the case for other types of normalization.)
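The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration and not the code used in GenomeScreen: the function name `normalize_bin_counts`, the `target_total` parameter and the toy samples are all made up for this sketch, and the bin counts are assumed to be a plain list of per-bin read counts on autosomal chromosomes.

```python
def normalize_bin_counts(bin_counts, target_total=1_000_000):
    """Scale per-bin read counts so that they sum to target_total.

    Hypothetical helper for illustration only; bin_counts is assumed to
    hold the read counts of the autosomal bins of one sample.
    """
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("sample has no reads in its autosomal bins")
    scale = target_total / total
    # Counts usually become floats after scaling.
    return [count * scale for count in bin_counts]


# Toy sample A: 2,000,000 reads in total, one bin holds 7 of them.
sample_a = [7, 1_999_993]
# Toy sample B: same number of bins, but far fewer reads.
sample_b = [30, 70]

normalized_a = normalize_bin_counts(sample_a)
normalized_b = normalize_bin_counts(sample_b)

print(normalized_a[0])      # 3.5
print(sum(normalized_a))    # 1000000.0
print(sum(normalized_b))    # 1000000.0
```

Both toy samples have the same number of bins before and after; only the bin counts are rescaled, so their sums match and the samples become comparable regardless of sequencing depth.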

Is this clear now?

liza-alpinia commented 2 years ago

Hi! Thank you for your quick reply. It is clear now.