s4hts / HTStream

A high throughput sequence read toolset using a streaming approach facilitated by Linux pipes
https://s4hts.github.io/HTStream/
Apache License 2.0

Remaining adapter sequence #252

Open cdermita opened 1 year ago

cdermita commented 1 year ago

Describe the bug
hts_AdapterTrimmer and hts_PolyATTrim did not remove the adapter from some reads whose 3' end carries an incomplete, degenerate, or extended copy of the adapter sequence.

To Reproduce
Example sequence reads that will help reproduce the bug:

@A01488:145:HHFCKDSX5:4:1101:19895:1000_TATA_NTGGCT 1:N:0:TAACCAGCACTT+NATGTCGTTGGA
GAGTTTGTGATTTAAACATTTTGTTGTTAATAATATTGATATTGTATTTTCTTGAATGTGGAACTTTCTTTTTTATGCTTACGTACCAAAAAAAAAAAAAAAAAAAACGGAATAGCAAACGTCTTAAAACCAGTCAAAAA

@A01488:145:HHFCKDSX5:4:1101:25247:1000_TATA_NTGGGG 1:N:0:TAACCAGCACTT+NATGTCGTTGGA
TTGAGATGGGTGTTCCAAGAGTCGAATAGCTTGGGAATGCTGTTCTAAATGGGTGGTAAATTTCATCTAAAGCTAAATATCGACGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAAAAAAAAAAAAAAAAAGATCG

@A01488:145:HHFCKDSX5:4:1101:7139:1016_TATA_TGAGTT 1:N:0:TAACCAGCACTT+AATGTCGTTGGA
TTGCTTTCATCATCCCTTTTACAGGGTGAAATTAATTGTTACTTTCAACAGATGCTTCTGATTAAAAAAAAAAAAAAAAAAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAACCAGCACTTATCTCGTTTGCCGGG

Commands to reproduce the behavior:

module load miniconda
source activate process_reads

for x in ${umi_folder[@]}; do
    y=${x##/}
    echo Working on $y
    hts_Stats -F -L ${log_folder}${y}.log -U ${x} | #read stats
    hts_SeqScreener -AL ${log_folder}${y}.log | #remove contaminants
    hts_AdapterTrimmer -AL ${log_folder}${y}.log | #trim adaptors
    hts_PolyATTrim -AL ${log_folder}${y}.log |
    hts_QWindowTrim -AL ${log_folder}${y}.log | #remove low-quality
    hts_NTrimmer -AL ${log_folder}${y}.log | #remove Ns
    hts_Stats -AL ${log_folder}${y}.log -f ${clean_folder}${y} #read stats and save cleaned
done

conda deactivate
module unload miniconda

Expected behavior
The Lexogen 3' end consists of 5' – A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC – 3' (the adapter sequence is the same as the default TruSeq adapter), so we expect both the poly(A) tail and the adapter sequence to be trimmed.
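
To make it easier to see what actually follows the poly(A) run in a given read, here is a minimal bash sketch. It is not part of HTStream: the helper name `tail_after_polya` and its default minimum run length of 18 (taken from the A{18} in the structure above) are assumptions made for illustration. It prints whatever follows the first run of at least 18 consecutive A's, so that the tail can be compared by eye with the expected adapter.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an HTStream tool): print the part of a read that
# follows the first run of at least MIN_A consecutive A's.
# Returns non-zero if the read contains no such run.
tail_after_polya() {
    local seq=$1
    local min_a=${2:-18}   # assumed default, from the A{18} in the issue
    # A run of >= min_a A's, then either end of read or a tail starting
    # with a non-A base (this forces the run to absorb every adjacent A).
    local re="A{${min_a},}([^A].*)?\$"
    if [[ $seq =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]-}"
    else
        return 1
    fi
}

# Example usage with a sequence passed on the command line:
#   ./tail_after_polya.sh <READ_SEQUENCE>
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
    tail_after_polya "$1"
fi
```

Because the adapter itself starts with an A, that base is absorbed into the poly(A) run, so the printed tail should be compared against the adapter without its leading A (GATCGGAAGAGC...).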

Screenshots

Screen Shot 2023-03-27 at 10 27 52 AM