jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

discrepancy in reports between cutadapt and atropos on the "expect" column #42

Closed by cokelaer 7 years ago

cokelaer commented 7 years ago

Here is a sample of a cutadapt report for a given adapter:

Overview of removed sequences (5')
length  count  expect  max.err  error counts
6       7      24.4    0        7
7       3      6.1     0        3
10      1      0.1     1        0 1
11      1      0.0     1        0 1
12      1      0.0     1        1

and atropos 1.1.14 reports:

Overview of removed sequences (5'):
length count expect max.err error counts    
                                    0 1         
------ ----- ------ ------- ------------
     6     7  146.3       0   7           
     7     3   36.6       0   3           
    10     1    4.6       1   0 1         
    11     1    2.3       1   0 1         
    12     1    1.1       1   1           

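Everything is identical except the third column (expect), which in atropos is about 6 times the value reported by cutadapt in the rows where the cutadapt value is large enough to compare (lengths 6 and 7). I am not sure what the reason for this difference is, or whether it is a bug or intended behaviour.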
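For anyone who wants to check the factor, here is a minimal Python sketch that divides the atropos "expect" values by the cutadapt ones, row by row. The numbers are copied from the two reports above. The dictionary and function names are made up for this illustration and are not part of either tool.

```python
# "expect" values copied from the two reports above, keyed by removed length.
# The variable and function names are hypothetical, chosen for this example only.
cutadapt_expect = {6: 24.4, 7: 6.1, 10: 0.1, 11: 0.0, 12: 0.0}
atropos_expect = {6: 146.3, 7: 36.6, 10: 4.6, 11: 2.3, 12: 1.1}


def expect_ratios(atropos, cutadapt):
    """Return the atropos/cutadapt ratio per length.

    The ratio is None where the cutadapt value was printed as 0.0,
    because a value rounded to zero cannot be used as a divisor.
    """
    ratios = {}
    for length in sorted(atropos):
        denominator = cutadapt[length]
        ratios[length] = atropos[length] / denominator if denominator > 0 else None
    return ratios


ratios = expect_ratios(atropos_expect, cutadapt_expect)
for length, ratio in ratios.items():
    print(length, "n/a" if ratio is None else round(ratio, 1))
```

The ratio comes out at about 6.0 for lengths 6 and 7. For the longer matches the cutadapt value is printed as 0.1 or 0.0, so rounding to one decimal makes the ratio unreliable or undefined there.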

jdidion commented 7 years ago

I think I've figured this out, but I'd like to test it on some real-world data to make sure. Can you provide a sample FASTQ file and the atropos command you used? A subset of ~10000 reads should be enough. Thanks.