nf-core / nascent

Nascent Transcription Processing Pipeline
https://nf-co.re/nascent
MIT License

Investigate adapter trimming #71

Closed edmundmiller closed 2 years ago

edmundmiller commented 2 years ago

Description of the bug

I haven't seen adapter trimming done, so I'm not sure whether it would be beneficial.

edmundmiller commented 2 years ago

PINTS trims adapters in its case study:

# Discard reads shorter than 14 nt after trimming (-l 14); use 8 threads (-w 8).
# This library was polyadenylated, so the last 20 nt of each read are
# trimmed with --trim_tail1. For more recent single-end PRO/GRO-cap
# libraries, this may not be necessary.
fastp -i ENCFF028THC.fastq.gz \
    --adapter_sequence=TGGAATTCTCGGGTGCCAAGG \
    -o ENCFF028THC.trimmed.fastq.gz \
    -l 14 \
    --trim_tail1 20 \
    --low_complexity_filter \
    -w 8

edmundmiller commented 2 years ago

Trimming definitely!

We didn't deduplicate our GRO-seq data: the library was made 10 years ago and has no UMIs, so there was no way to tell whether identical reads were PCR duplicates. I'll add this in with the option to skip.
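
For illustration, an "option to skip" could be sketched in shell roughly as below, shown here for the trimming step. This is only a dry-run sketch under stated assumptions: `trim_reads`, `SKIP_TRIMMING`, and `TRIM_TAIL` are hypothetical names, not options of nf-core/nascent or fastp. The fastp flags are the ones from the PINTS command quoted in this thread. The function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch only: trim_reads, SKIP_TRIMMING and TRIM_TAIL are hypothetical
# names for illustration, not part of nf-core/nascent or fastp.
set -euo pipefail

# Print the command that would process one single-end FASTQ file.
# Nothing is executed here (dry run).
trim_reads() {
    local input="$1"
    local output="$2"

    # Skip option: pass the reads through untouched.
    if [ "${SKIP_TRIMMING:-false}" = "true" ]; then
        echo "cp ${input} ${output}"
        return 0
    fi

    local cmd="fastp -i ${input} -o ${output}"
    cmd="${cmd} --adapter_sequence=TGGAATTCTCGGGTGCCAAGG"
    cmd="${cmd} -l 14 --low_complexity_filter -w 8"

    # Tail trimming is library-specific (e.g. polyadenylated libraries),
    # so it is only added when TRIM_TAIL is set to a positive number.
    if [ "${TRIM_TAIL:-0}" -gt 0 ]; then
        cmd="${cmd} --trim_tail1 ${TRIM_TAIL}"
    fi

    echo "${cmd}"
}

trim_reads ENCFF028THC.fastq.gz ENCFF028THC.trimmed.fastq.gz
TRIM_TAIL=20 trim_reads ENCFF028THC.fastq.gz ENCFF028THC.trimmed.fastq.gz
SKIP_TRIMMING=true trim_reads ENCFF028THC.fastq.gz ENCFF028THC.trimmed.fastq.gz
```

Keeping tail trimming separate from the skip switch reflects the note in the case study that `--trim_tail1` may not be needed for more recent libraries.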