junchaoshi / sports1.1

Small non-coding RNA annotation Pipeline Optimized for rRNA- and tRNA-Derived Small RNAs
GNU General Public License v3.0

Processing report #12

Closed xiaoyunguo closed 3 years ago

xiaoyunguo commented 3 years ago

Hi, I am trying to interpret the processing report generated by Sports1.1. After cutadapt, the first step is to match all reads from cutadapt to the genome, right? So is the number shown here the number of reads left after cutadapt? In the example below, is it right that after adapter trimming, 32,432 reads are left from the 1,224,009 starting reads in the fastq file? After that, the reads are split up and mapped to the different libraries accordingly. Is that right?


remove 5' adapter
This is cutadapt 2.3 with Python 3.6.9 ......

=== Summary ===

Total reads processed: 1,224,009
Reads with adapters: 1,022,193 (83.5%)
Reads with too many N: 0 (0.0%)
Reads written (passing filters): 1,224,009 (100.0%)

Total basepairs processed: 83,232,612 bp
Total written (filtered): 31,674,723 bp (38.1%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC; Type: regular 5'; Length: 34; Trimmed: 1022193 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3

Overview of removed sequences
length count expect max.err error counts
3 2258 19125.1 0 2258
4 205 4781.3 0 205
.....
68 33556 0.0 3 6860 10078 9085 7533

match to genome
reads processed: 32432
reads with at least one reported alignment: 332 (1.02%)
reads that failed to align: 32100 (98.98%)
Reported 332 alignments


Thank you so much for your help in advance

junchaoshi commented 3 years ago

Hi,

Typically, untrimmed sequencing reads contain the 3' end adapter far more often than the 5' end adapter. In your case, I guess the 3' end adapter also needs to be trimmed, using the parameter -y <3' end adapter seq>.
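
To put the numbers from the report quoted above in proportion, here is a minimal Python sketch that recomputes the ratios. The figures are copied by hand from the log in this thread; the helper `pct` and the variable names are made up for illustration and are not part of Sports1.1, cutadapt, or the aligner.

```python
# Figures copied from the report quoted in this thread (not parsed from a file).
total_reads = 1_224_009      # "Total reads processed" (cutadapt)
genome_input = 32_432        # "reads processed" in the "match to genome" step
genome_aligned = 332         # "reads with at least one reported alignment"
bp_processed = 83_232_612    # "Total basepairs processed"
bp_written = 31_674_723      # "Total written (filtered)"


def pct(part, whole):
    """Hypothetical helper: percentage of `whole` made up by `part`, 2 decimals."""
    return round(100.0 * part / whole, 2)


# Share of the starting reads that reached the genome-matching step.
print("reads reaching genome step: %.2f%%" % pct(genome_input, total_reads))
# Share of those reads with at least one alignment (the log reports 1.02%).
print("aligned to genome:          %.2f%%" % pct(genome_aligned, genome_input))
# Share of base pairs kept after trimming (the log reports 38.1%).
print("base pairs kept:            %.2f%%" % pct(bp_written, bp_processed))
```

These ratios only restate what the log already says. They do not show which step between cutadapt and the genome match dropped the remaining reads.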