kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

More info from fastp stderr #60

Closed by kfuku52 1 year ago

kfuku52 commented 3 years ago

Q20 bases, Q30 bases, duplication rate, and insert size peak from the fastp stderr could be added to the updated metadata, so that they can be correlated with SVs in curate.

fastp stdout:

fastp stderr:
Read1 before filtering:
total reads: 11235575
total bases: 1082843111
Q20 bases: 1059537782(97.8478%)
Q30 bases: 979042729(90.4141%)

Read2 before filtering:
total reads: 11235575
total bases: 1075843409
Q20 bases: 1047833817(97.3965%)
Q30 bases: 962247102(89.4412%)

Read1 after filtering:
total reads: 11235573
total bases: 1079334090
Q20 bases: 1056101795(97.8475%)
Q30 bases: 976065479(90.4322%)

Read2 aftering filtering:
total reads: 11235573
total bases: 1071402731
Q20 bases: 1043529853(97.3985%)
Q30 bases: 958691984(89.4801%)

Filtering result:
reads passed filter: 22471146
reads failed due to low quality: 4
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 818312
bases trimmed due to adapters: 7949513

Duplication rate: 2.09054%

Insert size peak (evaluated by paired-end reads): 142
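
For illustration, the fields mentioned above could be pulled out of a log like this with a few regular expressions. This is only a minimal sketch, not amalgkit's code: the function name `parse_fastp_stderr` and the output key names are hypothetical, and the patterns assume the stderr layout shown in this issue. The log above spells the last section header "Read2 aftering filtering:", so the pattern accepts both "after" and "aftering".

```python
import re

# Hypothetical helper (not part of amalgkit): parse the fastp stderr text
# shown in this issue into a flat dict of metadata-friendly values.
SECTION_RE = re.compile(r"^(Read[12]) (before|after(?:ing)?) filtering:$")
COUNT_RE = re.compile(r"^(total reads|total bases|Q20 bases|Q30 bases): (\d+)")
DUP_RE = re.compile(r"^Duplication rate: ([\d.]+)%")
INSERT_RE = re.compile(r"^Insert size peak .*: (\d+)$")


def parse_fastp_stderr(text):
    stats = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            section = None  # a blank line ends the current section
            continue
        m = SECTION_RE.match(line)
        if m:
            when = "before" if m.group(2) == "before" else "after"
            section = f"{m.group(1).lower()}_{when}"
            continue
        m = COUNT_RE.match(line)
        if m and section:
            key = m.group(1).lower().replace(" ", "_")
            stats[f"{section}_{key}"] = int(m.group(2))
            continue
        m = DUP_RE.match(line)
        if m:
            stats["duplication_rate_percent"] = float(m.group(1))
            continue
        m = INSERT_RE.match(line)
        if m:
            stats["insert_size_peak"] = int(m.group(1))
    return stats


# Abbreviated copy of the log above, used as sample input.
sample = """Read1 before filtering:
total reads: 11235575
total bases: 1082843111
Q20 bases: 1059537782(97.8478%)
Q30 bases: 979042729(90.4141%)

Read2 aftering filtering:
total reads: 11235573
total bases: 1071402731
Q20 bases: 1043529853(97.3985%)
Q30 bases: 958691984(89.4801%)

Filtering result:
reads passed filter: 22471146

Duplication rate: 2.09054%

Insert size peak (evaluated by paired-end reads): 142
"""

parsed = parse_fastp_stderr(sample)
```

Percentages are not stored separately here because they can be recomputed from the Q20/Q30 and total base counts.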

kfuku52 commented 1 year ago

This is a bit complicated to implement given getfastq's current two-round fastq extraction. It is not a high-priority enhancement, so I am closing the issue for now.
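
One way to see part of the complication: if each extraction round produces its own fastp report, the per-round values have to be merged into a single metadata row. The sketch below is a rough illustration under the assumption that the rounds process disjoint sets of reads, which this issue does not confirm. The function name `combine_rounds`, the key names, and the numbers are all hypothetical. Read and base counts are additive, so Q20/Q30 percentages can be recomputed from the sums. Duplication rate and insert size peak cannot be merged this way, so they are left out.

```python
# Hypothetical sketch (not amalgkit code): merge per-round fastp counts.
ADDITIVE_KEYS = ("total_reads", "total_bases", "q20_bases", "q30_bases")


def combine_rounds(rounds):
    """Sum additive counts across rounds and recompute Q20/Q30 percentages."""
    combined = {key: sum(r[key] for r in rounds) for key in ADDITIVE_KEYS}
    total = combined["total_bases"]
    for q in ("q20", "q30"):
        # Guard against an empty extraction (zero bases in every round).
        combined[f"{q}_percent"] = 100.0 * combined[f"{q}_bases"] / total if total else 0.0
    return combined


# Made-up numbers for two rounds, for illustration only.
round1 = {"total_reads": 10, "total_bases": 1000, "q20_bases": 900, "q30_bases": 800}
round2 = {"total_reads": 10, "total_bases": 1000, "q20_bases": 1000, "q30_bases": 600}

merged = combine_rounds([round1, round2])
```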