usadellab / Trimmomatic

Negative Value of "Input Read Pairs" and "Both Surviving" in log file #44

Open lxwgcool opened 1 year ago

lxwgcool commented 1 year ago

Hi,

We use Trimmomatic as a benchmark tool in our pipeline for trimming Illumina reads.

We recently found that the "Input Read Pairs" and "Both Surviving" metrics reported in the Trimmomatic log file are negative for one of our flowcells (the others are fine).

Please check line 9 in the screenshot below:

[screenshot of the log file]

The reads are human whole-genome sequencing reads, and each read file is around 160 GB.

Based on our previous work, these two metrics should always be positive. I have several questions:
(1) Why do we get negative values for these two metrics?
(2) If it is not a bug, how should we interpret these negative values?
(3) How can we get the real numbers of "Input Read Pairs" and "Both Surviving"? Should we simply flip the sign of the negative values?

Thanks so much for your help.
Best regards,
Xin

TonyBolger commented 1 year ago

This looks like a 32-bit integer wrap (a signed 32-bit int can hold values only up to 2,147,483,647). I guess you have over 2bn read pairs? The real number is 2^32 added to the numbers shown there.

Input Read Pairs: 2^32 + (-1953136673) = 2341830623
Both Surviving: 2^32 + (-2013735491) = 2281231805
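
Tony's arithmetic can be checked with a short Java sketch (Java because Trimmomatic itself is written in Java). It assumes, as his diagnosis implies, that the counts are held in a signed 32-bit `int`; the class and method names are hypothetical and not taken from the Trimmomatic source. The narrowing cast reproduces what an overflowing `int` counter ends up holding, and the standard JDK method `Integer.toUnsignedLong` (Java 8+) reinterprets those 32 bits as unsigned, which for a negative value is the same as adding 2^32.

```java
// Hypothetical demo (not Trimmomatic code): how a count above 2^31 - 1 turns
// negative in a 32-bit int, and how the true value can be recovered.
class ReadCountWrap {

    // A narrowing cast keeps only the low 32 bits, which is exactly what an
    // int counter holds after it overflows (Java int arithmetic wraps mod 2^32).
    static int wrapTo32Bits(long trueCount) {
        return (int) trueCount;
    }

    // Reinterprets the 32 bits as unsigned; for a negative input this equals
    // reported + 2^32. Only valid while the true count is below 2^32.
    static long recover(int reported) {
        return Integer.toUnsignedLong(reported);
    }

    public static void main(String[] args) {
        long[] trueCounts = {2341830623L, 2281231805L};
        for (long trueCount : trueCounts) {
            int reported = wrapTo32Bits(trueCount);
            System.out.println(reported + " -> " + recover(reported));
        }
        // A 64-bit long counter simply continues past 2^31 - 1 without wrapping.
        long counter = Integer.MAX_VALUE;
        counter += 1;
        System.out.println("long counter past 2^31 - 1: " + counter);
    }
}
```

Running `main` prints `-1953136673 -> 2341830623` and `-2013735491 -> 2281231805`, matching the numbers above, and the last line (`2147483648`) shows why a 64-bit counter avoids the problem.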

I hadn't really considered the possibility of >2bn reads in a dataset 10 years ago :)

lxwgcool commented 1 year ago

Thank you so much for your prompt reply, Tony. I think it does make sense.

We are currently using the Illumina HiSeq platform and have generated a lot of sequencing data for different WGS projects. With the development of technology and strong financial support, I believe we may generate even more sequencing data, with more than 2 billion read pairs for a single subject.

Thanks for your solution. I will keep using the rule (2^32 + the negative number) to convert the negative values to the real number of read pairs in our pipeline. However, as more of these large datasets are generated, it would be more convenient for us if you could update the Trimmomatic source code and release a new version that uses 64-bit integers to solve this issue.
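
On the pipeline side, the 2^32 rule can be applied automatically when parsing the log. The sketch below is a hypothetical Java helper: the `LogCountFixer` class and `correctedCount` method are illustrative names, and it assumes the summary line contains `Label: number` pairs such as `Input Read Pairs: -1953136673`. The sample line is a simplified stand-in, not verbatim Trimmomatic output, so check the regex against your actual log before relying on it.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pipeline helper (not part of Trimmomatic): pulls a count out of a
// summary line and undoes a 32-bit wraparound using the 2^32 rule from this thread.
class LogCountFixer {

    private static final long TWO_POW_32 = 1L << 32;

    // Looks for "<label>: <integer>" (assumed layout) and returns the corrected count,
    // or an empty OptionalLong if the label does not appear in the line.
    static OptionalLong correctedCount(String logLine, String label) {
        Pattern pattern = Pattern.compile(Pattern.quote(label) + ":\\s*(-?\\d+)");
        Matcher matcher = pattern.matcher(logLine);
        if (!matcher.find()) {
            return OptionalLong.empty();
        }
        long value = Long.parseLong(matcher.group(1));
        // A negative value means the 32-bit counter wrapped once; add 2^32 back.
        // This is only correct while the true count is below 2^32 (about 4.29 billion).
        return OptionalLong.of(value < 0 ? value + TWO_POW_32 : value);
    }

    public static void main(String[] args) {
        // Simplified stand-in for the log line, not verbatim Trimmomatic output.
        String line = "Input Read Pairs: -1953136673 Both Surviving: -2013735491";
        System.out.println(correctedCount(line, "Input Read Pairs").getAsLong()); // 2341830623
        System.out.println(correctedCount(line, "Both Surviving").getAsLong());   // 2281231805
    }
}
```

A count that wrapped more than once (a true count of 2^32 or more) can appear as a positive number, so this correction cannot detect it; only 64-bit counters in Trimmomatic itself would remove that limit.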

Thank you so much for your help.
Best,
Xin