Open J-81 opened 1 year ago
Potential route: use Trim Galore's built-in adapter auto-detection: https://github.com/FelixKrueger/TrimGalore/blob/0.6.7/Docs/Trim_Galore_User_Guide.md#adapter-auto-detection
I'll try auto-detection by omitting the adapter flag, and will of course validate that the auto-detected adapter gives results consistent with supplying the parameter directly.
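For anyone following along, here is a rough sketch of what the auto-detection amounts to, as I understand the user guide: count how many of the leading reads contain each standard adapter prefix and pick the most frequent one. The Illumina and Nextera sequences are the ones shown in the reports in this thread; the small RNA sequence is the one listed in the Trim Galore user guide. The function names and the `None` fallback are my own assumptions for illustration, and Trim Galore's real tie-breaking and default handling are not reproduced here.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, Optional

# Adapter prefixes: Illumina and Nextera as printed in the Trim Galore reports
# in this thread; small RNA as listed in the Trim Galore user guide.
ADAPTERS = {
    "Illumina": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
    "smallRNA": "TGGAATTCTCGG",
}


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield sequence lines from a gzipped FASTQ file (assumes 4-line records)."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def count_adapter_hits(sequences: Iterable[str], limit: int = 1_000_000) -> Counter:
    """Count reads containing each adapter prefix among the first `limit` reads."""
    hits = Counter({name: 0 for name in ADAPTERS})
    for index, sequence in enumerate(sequences):
        if index >= limit:
            break
        for name, adapter in ADAPTERS.items():
            if adapter in sequence:
                hits[name] += 1
    return hits


def best_adapter(hits: Counter) -> Optional[str]:
    """Return the adapter name with the most hits, or None if nothing matched.

    Returning None is a choice made for this sketch; the caller decides
    what to do when no adapter is found.
    """
    name, count = hits.most_common(1)[0]
    return name if count > 0 else None
```

Something like `best_adapter(count_adapter_hits(read_fastq_sequences("EU236_R2_raw.fastq.gz")))` would be the intended usage. Since this is a simplification, its counts are not guaranteed to match the counts Trim Galore prints in its report.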
Test results using GLDS-426_Truncated (known to have Nextera adapters):
Input filename: EU236_R2_raw.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.7
Cutadapt version: 3.7
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed
This is cutadapt 3.7 with Python 3.9.6
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC EU236_R2_raw.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 0.09 s (301 µs/read; 0.20 M reads/minute).
=== Summary ===
Total reads processed: 300
Reads with adapters: 118 (39.3%)
Reads written (passing filters): 300 (100.0%)
Total basepairs processed: 45,000 bp
Quality-trimmed: 2,140 bp (4.8%)
Total written (filtered): 42,715 bp (94.9%)
Input filename: EU236_R2_raw.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.7
Cutadapt version: 3.7
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'CTGTCTCTTATA' (Nextera Transposase sequence; user defined)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed
This is cutadapt 3.7 with Python 3.9.6
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a CTGTCTCTTATA EU236_R2_raw.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 0.09 s (297 µs/read; 0.20 M reads/minute).
=== Summary ===
Total reads processed: 300
Reads with adapters: 233 (77.7%)
Reads written (passing filters): 300 (100.0%)
Total basepairs processed: 45,000 bp
Quality-trimmed: 2,140 bp (4.8%)
Total written (filtered): 33,046 bp (73.4%)
Input filename: EU236_R2_raw.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.7
Cutadapt version: 3.7
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Using Nextera adapter for trimming (count: 113). Second best hit was smallRNA (count: 16)
Adapter sequence: 'CTGTCTCTTATA' (Nextera Transposase sequence; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed
This is cutadapt 3.7 with Python 3.9.6
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a CTGTCTCTTATA EU236_R2_raw.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 0.09 s (310 µs/read; 0.19 M reads/minute).
=== Summary ===
Total reads processed: 300
Reads with adapters: 233 (77.7%)
Reads written (passing filters): 300 (100.0%)
Total basepairs processed: 45,000 bp
Quality-trimmed: 2,140 bp (4.8%)
Total written (filtered): 33,046 bp (73.4%)
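To make the consistency check less manual, the comparison above could be scripted. The following is a minimal sketch that pulls the adapter sequence and a few summary fields out of two report texts and checks that they agree. The field names in `SUMMARY_PATTERNS` and the helper names are hypothetical; the regular expressions only assume the line formats visible in the reports pasted above.

```python
import re
from typing import Dict, Union

# Patterns for report lines as they appear in the reports in this thread.
SUMMARY_PATTERNS = {
    "adapter": re.compile(r"^Adapter sequence: '([ACGTN]+)'", re.MULTILINE),
    "total_reads": re.compile(r"^Total reads processed:\s+([\d,]+)", re.MULTILINE),
    "reads_with_adapters": re.compile(r"^Reads with adapters:\s+([\d,]+)", re.MULTILINE),
    "bp_written": re.compile(r"^Total written \(filtered\):\s+([\d,]+) bp", re.MULTILINE),
}


def parse_report(text: str) -> Dict[str, Union[str, int]]:
    """Extract the adapter sequence and summary counts from one report."""
    result: Dict[str, Union[str, int]] = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            raise ValueError(f"report is missing field: {key}")
        value = match.group(1)
        # Counts are printed with thousands separators, e.g. "33,046".
        result[key] = value if key == "adapter" else int(value.replace(",", ""))
    return result


def reports_agree(report_a: str, report_b: str) -> bool:
    """True when both reports used the same adapter and produced the same counts."""
    return parse_report(report_a) == parse_report(report_b)


# Lines copied from the user-defined and auto-detected Nextera runs above.
USER_DEFINED = """\
Adapter sequence: 'CTGTCTCTTATA' (Nextera Transposase sequence; user defined)
Total reads processed: 300
Reads with adapters: 233 (77.7%)
Total written (filtered): 33,046 bp (73.4%)
"""

AUTO_DETECTED = """\
Adapter sequence: 'CTGTCTCTTATA' (Nextera Transposase sequence; auto-detected)
Total reads processed: 300
Reads with adapters: 233 (77.7%)
Total written (filtered): 33,046 bp (73.4%)
"""

if __name__ == "__main__":
    print(reports_agree(USER_DEFINED, AUTO_DETECTED))
```

The comparison deliberately ignores the "user defined" / "auto-detected" label and the timing line, since those are expected to differ between runs.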
Currently, the workflow user is expected to replace this value manually in the workflow module file. Instead, the adapter should be determined automatically, perhaps from the raw FastQC/MultiQC reports, and supplied to the trimming process.
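As one possible shape for the FastQC-based route, here is a hedged sketch that reads the "Adapter Content" module from a `fastqc_data.txt` file and maps the dominant adapter to a Trim Galore adapter flag. It assumes the module is laid out as a `>>Adapter Content` header, a tab-separated `#Position` column line, data rows, and `>>END_MODULE`. The column-name-to-flag mapping, the helper names, and the `min_percent` threshold are illustrative assumptions, not part of the workflow.

```python
from typing import Dict, Optional

# Assumed mapping from FastQC adapter column names to Trim Galore flags.
FASTQC_TO_TRIM_GALORE = {
    "Illumina Universal Adapter": "--illumina",
    "Nextera Transposase Sequence": "--nextera",
    "Illumina Small RNA 3' Adapter": "--small_rna",
}


def adapter_content_maxima(fastqc_data: str) -> Dict[str, float]:
    """Return the maximum percentage seen per adapter column in the module."""
    maxima: Dict[str, float] = {}
    columns = []
    in_module = False
    for line in fastqc_data.splitlines():
        if line.startswith(">>Adapter Content"):
            in_module = True
        elif in_module and line.startswith(">>END_MODULE"):
            break
        elif in_module and line.startswith("#"):
            # Column line: "#Position<TAB>adapter name<TAB>adapter name..."
            columns = line.lstrip("#").split("\t")[1:]
            maxima = {name: 0.0 for name in columns}
        elif in_module and line.strip():
            values = line.split("\t")[1:]
            for name, value in zip(columns, values):
                maxima[name] = max(maxima[name], float(value))
    return maxima


def choose_trim_galore_flag(fastqc_data: str, min_percent: float = 1.0) -> Optional[str]:
    """Pick a Trim Galore adapter flag, or None to leave auto-detection on.

    `min_percent` is a hypothetical threshold: adapters never reaching it
    are ignored.
    """
    maxima = adapter_content_maxima(fastqc_data)
    candidates = {
        name: percent
        for name, percent in maxima.items()
        if name in FASTQC_TO_TRIM_GALORE and percent >= min_percent
    }
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return FASTQC_TO_TRIM_GALORE[best]
```

When the function returns `None`, the trimming step would omit the adapter flag and fall back on Trim Galore's own auto-detection, which the tests above suggest is already consistent with the manual setting for this dataset.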
DPPD Reference
https://github.com/nasa/GeneLab_Data_Processing/blob/0fe1dfd46ee662a333ac49e6013dbd82f86cb987/RNAseq/Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md?plain=1#L207
Workflow Reference
https://github.com/nasa/GeneLab_Data_Processing/blob/0fe1dfd46ee662a333ac49e6013dbd82f86cb987/RNAseq/Workflow_Documentation/NF_RCP-F/workflow_code/modules/quality.nf#L73-L76