nerettilab / RepEnrich2

RepEnrich2 is an updated method to estimate repetitive element enrichment using high-throughput sequencing data.

Question about normalisation #11

Closed: re2srm closed 5 years ago

re2srm commented 5 years ago

Hello,

Just needed a small clarification regarding normalising the counts generated by RepEnrich2. The tutorial mentions that library size should be calculated as reads processed minus reads that failed to align, using the bowtie log. I am a little confused about how to calculate this. Here is the bowtie result for one of my samples:

26562723 reads; of these:
  26562723 (100.00%) were paired; of these:
    7376294 (27.77%) aligned concordantly 0 times
    7786417 (29.31%) aligned concordantly exactly 1 time
    11400012 (42.92%) aligned concordantly >1 times

7376294 pairs aligned concordantly 0 times; of these:
  1005276 (13.63%) aligned discordantly 1 time
----
6371018 pairs aligned 0 times concordantly or discordantly; of these:
  12742036 mates make up the pairs; of these:
    7152611 (56.13%) aligned 0 times
    2669705 (20.95%) aligned exactly 1 time
    2919720 (22.91%) aligned >1 times

86.54% overall alignment rate

Since the overall alignment rate is 86.54%, I am assuming I would use 0.8654 * 26562723 = 22987380 as the library size. Am I correct?

Thanks
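For readers following along, the arithmetic in the post above can be sketched in Python. This is only an illustration, not part of RepEnrich2: the function names `library_size_from_rate` and `library_size_from_mates` are made up for this example, and the log text is the one quoted above. The first function reproduces the calculation as posted (total reads times the overall alignment rate). The second is an assumption on my part: it subtracts the mates that aligned 0 times, counted as half a pair each, which avoids the rounding in the printed percentage and gives a slightly different figure.

```python
import re

# Alignment summary as quoted in the post above.
BOWTIE_LOG = """\
26562723 reads; of these:
  26562723 (100.00%) were paired; of these:
    7376294 (27.77%) aligned concordantly 0 times
    7786417 (29.31%) aligned concordantly exactly 1 time
    11400012 (42.92%) aligned concordantly >1 times

7376294 pairs aligned concordantly 0 times; of these:
  1005276 (13.63%) aligned discordantly 1 time
----
6371018 pairs aligned 0 times concordantly or discordantly; of these:
  12742036 mates make up the pairs; of these:
    7152611 (56.13%) aligned 0 times
    2669705 (20.95%) aligned exactly 1 time
    2919720 (22.91%) aligned >1 times

86.54% overall alignment rate
"""


def total_reads(log_text: str) -> int:
    """Number of reads (pairs, for paired-end data) processed."""
    match = re.search(r"^\s*(\d+) reads; of these:", log_text, re.M)
    return int(match.group(1))


def library_size_from_rate(log_text: str) -> int:
    """Total reads x overall alignment rate, as calculated in the post."""
    rate = float(re.search(r"([\d.]+)% overall alignment rate", log_text).group(1))
    return int(total_reads(log_text) * rate / 100)


def library_size_from_mates(log_text: str) -> float:
    """Assumption: total pairs minus unaligned mates, each mate counted as half a pair."""
    match = re.search(r"^\s*(\d+) \([\d.]+%\) aligned 0 times\s*$", log_text, re.M)
    unaligned_mates = int(match.group(1))
    return total_reads(log_text) - unaligned_mates / 2


print(library_size_from_rate(BOWTIE_LOG))   # 22987380
print(library_size_from_mates(BOWTIE_LOG))  # 22986417.5
```

The two figures differ by fewer than a thousand reads here, so either should give very similar normalised values.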

nskvir commented 5 years ago

Hi there,

Thanks for your interest in our software! Yes, that should be correct for the library size. I also wanted to comment on your previous post before you closed it: you should probably run each file with a separate output destination if you are running jobs in parallel because (as you said) the outputs might conflict in some way within the pair_1 and pair_2 folders.

Best, Nick
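One way to follow the advice about separate output destinations is to create a folder per sample before launching the parallel jobs. The sketch below only prepares the directories; it does not call RepEnrich2. The helper name `make_output_dirs` and the sample names are hypothetical, and a temporary directory stands in for a real results folder.

```python
import tempfile
from pathlib import Path


def make_output_dirs(base_dir, sample_names):
    """Create one output folder per sample and return {sample: path}."""
    base = Path(base_dir)
    dirs = {}
    for name in sample_names:
        out = base / name
        # exist_ok=True lets the helper be re-run without failing.
        out.mkdir(parents=True, exist_ok=True)
        dirs[name] = out
    return dirs


# Usage: hypothetical sample names, temporary base folder for the demo.
with tempfile.TemporaryDirectory() as tmp:
    output_dirs = make_output_dirs(tmp, ["sampleA", "sampleB"])
    for sample, path in output_dirs.items():
        print(sample, path.name, path.is_dir())
```

Each job would then be given its own folder as the output destination, so the pair_1 and pair_2 folders of different samples never share a parent directory.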

re2srm commented 5 years ago

Great. Thanks a lot.