rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

Number of reads increased after Porechop #97

Open sumitra20 opened 8 months ago

sumitra20 commented 8 months ago

Hi,

I have been using Porechop to trim my raw Nanopore reads, and I noticed that after trimming the total number of bases goes down but the total number of reads goes up compared to the raw data. The Porechop log does indicate that adapter trimming was done. This seems a little strange to me: shouldn't the number of reads go down after Porechop? Any advice would be appreciated. Thank you.

porechop -i nanopore_strato_barcode2_TEST.fastq.gz -o ./porechop_strato_barcode2_TEST.fastq.gz

OUTPUT:

Looking for known adapter sets
10,000 / 10,000 (100.0%)

                         Best read    Best read
Set                      start %ID    end %ID
SQK-NSK007                   100.0         79.2
Rapid                         68.4          0.0
RBK004_upstream               80.0          0.0
SQK-MAP006                    77.4         82.6
SQK-MAP006 short              80.0         76.0
PCR adapters 1                79.2         79.2
PCR adapters 2                82.6         82.6
PCR adapters 3                78.3         80.0
1D^2 part 1                   72.4         74.1
1D^2 part 2                   84.8         74.2
cDNA SSP                      70.0         73.2
Barcode 1 (reverse)          100.0         80.0
Barcode 2 (reverse)          100.0        100.0
Barcode 3 (reverse)           75.0         77.8
Barcode 4 (reverse)           83.3         80.8
Barcode 5 (reverse)           80.8         80.8
Barcode 6 (reverse)           77.8         84.0
Barcode 7 (reverse)           76.9         76.0
Barcode 8 (reverse)           81.5         76.9
..

Trimming adapters from read ends
SQK-NSK007_Y_Top: AATGTACTTCGTTCAGTTACGTATTGCT
SQK-NSK007_Y_Bottom: GCAATACGTAACTGAACGAAGT
BC01_rev: CACAAAGACACCGACAACTTTCTT
BC01: AAGAAAGTTGTCGGTGTCTTTGTG
BC02_rev: ACAGACGACTACAAACGGAATCGA
BC02: TCGATTCCGTTTGTAGTCGTCTGT
NB01_start: AATGTACTTCGTTCAGTTACGTATTGCTAAGGTTAACACAAAGACACCGACAACTTTCTTCAGCACCT
NB01_end: AGGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCTTAGCAATACGTAACTGAACGAAGT
NB02_start: AATGTACTTCGTTCAGTTACGTATTGCTAAGGTTAAACAGACGACTACAAACGGAATCGACAGCACCT
NB02_end: AGGTGCTGTCGATTCCGTTTGTAGTCGTCTGTTTAACCTTAGCAATACGTAACTGAACGAAGT

277,493 / 277,493 (100.0%)

270,816 / 277,493 reads had adapters trimmed from their start (20,279,939 bp removed)
234,960 / 277,493 reads had adapters trimmed from their end (12,093,349 bp removed)

Splitting reads containing middle adapters
277,493 / 277,493 (100.0%)

655 / 277,493 reads were split based on middle adapters

Saving trimmed reads to file
pigz found - using it to compress instead of gzip

RAW DATA: General summary:
Mean read length: 5,901.0
Mean read quality: 10.1
Median read length: 4,650.0
Median read quality: 10.8
Number of reads: 277,493.0
Read length N50: 7,197.0
STDEV read length: 4,579.3
Total bases: 1,637,482,075.0

TRIMMED DATA: General summary:
Mean read length: 5,774.1
Mean read quality: 10.2
Median read length: 4,529.0
Median read quality: 10.9
Number of reads: 277,956.0
Read length N50: 7,128.0
STDEV read length: 4,555.5
Total bases: 1,604,957,251.0
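
To double-check the read counts independently of the summary tool, the FASTQ records can be counted directly. The following is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name `count_fastq_reads` is made up for this illustration and is not part of Porechop.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Use gzip for .gz files and plain open() otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


# Example usage with the file names from the command above:
# raw = count_fastq_reads("nanopore_strato_barcode2_TEST.fastq.gz")
# trimmed = count_fastq_reads("porechop_strato_barcode2_TEST.fastq.gz")
# print(raw, trimmed, trimmed - raw)
```

If the two counts match the "Number of reads" values in the summaries, the increase is real and not an artifact of the statistics tool.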

ombystoma-young commented 5 months ago

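Hello @sumitra20, I think this is caused by Porechop splitting reads that contain middle adapters: each split read is written out as two or more shorter reads, so the read count can go up even though the total number of bases goes down. In your case: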

655 / 277,493 reads were split based on middle adapters

See the "Split reads with internal adapters" and "Discard reads with internal adapters" sections of the Porechop README for a more detailed description.
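
As a rough sanity check, the numbers posted in this thread can be reconciled with a few lines of arithmetic. The sketch below only uses figures copied from the log and the two summaries; the variable names are chosen for this illustration.

```python
# Figures copied from the Porechop log and the read summaries in this thread.
raw_reads, trimmed_reads = 277_493, 277_956
raw_bases, trimmed_bases = 1_637_482_075, 1_604_957_251
start_bp_removed, end_bp_removed = 20_279_939, 12_093_349
split_reads = 655

# Net change in read count between raw and trimmed data.
net_read_gain = trimmed_reads - raw_reads

# Total bases lost, and how much of that is explained by end trimming.
total_bp_removed = raw_bases - trimmed_bases
end_trim_bp = start_bp_removed + end_bp_removed
other_bp_removed = total_bp_removed - end_trim_bp

print(f"net read gain:        {net_read_gain:,}")
print(f"reads split:          {split_reads:,}")
print(f"total bp removed:     {total_bp_removed:,}")
print(f"bp removed from ends: {end_trim_bp:,}")
print(f"bp removed otherwise: {other_bp_removed:,}")
```

The net gain is 463 reads, which is in line with 655 reads being split. It is smaller than 655, so some reads or read pieces were presumably not written to the output. One possible cause, which the log does not confirm, is Porechop's minimum size for post-split pieces (`--min_split_read_size`). The 151,536 bp not accounted for by end trimming would then be the middle adapters and any discarded pieces.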