smithlabcode / falco

A C++ drop-in replacement of FastQC to assess the quality of sequence read data
https://falco.readthedocs.io
GNU General Public License v3.0

Difference between falco and fastqc in reported polyA/polyG content #69

Open wm75 opened 3 weeks ago

wm75 commented 3 weeks ago

The output of falco (1.2.3) and fastqc (0.12.1) differs markedly for the polyA stats (and I assume it would for polyG too, but I haven't tested that). I haven't looked into the code that produces them, but the falco values increase much faster than the fastqc ones. Here is the corresponding section of each report:

First fastqc:

#Position   Illumina Universal Adapter  Illumina Small RNA 3' Adapter   Illumina Small RNA 5' Adapter   Nextera Transposase Sequence    PolyA   PolyG
1   0.0 0.0 0.0 0.0 0.020387359836901122    0.0
2   0.0 0.0 0.0 0.0 0.06116207951070336 0.0
3   0.0 0.0 0.0 0.0 0.06116207951070336 0.0
4   0.0 0.0 0.0 0.0 0.06116207951070336 0.0
5   0.0 0.0 0.0 0.0 0.12232415902140673 0.0
6   0.0 0.0 0.0 0.0 0.12232415902140673 0.0
7   0.0 0.0 0.0 0.0 0.12232415902140673 0.0
8   0.0 0.0 0.0 0.0 0.14271151885830785 0.0
9   0.0 0.0 0.0 0.0 0.14271151885830785 0.0
10-11   0.0 0.0 0.0 0.0 0.14271151885830785 0.0
12-13   0.0 0.0 0.0 0.0 0.1529051987767584  0.0
14-15   0.0 0.0 0.0 0.0 0.17329255861365955 0.0
16-17   0.0 0.0 0.0 0.0 0.21406727828746175 0.0
18-19   0.0 0.0 0.0 0.0 0.22426095820591233 0.0
20-21   0.12232415902140673 0.0 0.0 0.0 0.24464831804281345 0.0
22-23   0.12232415902140673 0.0 0.0 0.0 0.24464831804281345 0.0
24-25   0.12232415902140673 0.0 0.0 0.0 0.24464831804281345 0.0
26-27   0.1325178389398573  0.0 0.0 0.0 0.24464831804281345 0.0
28-29   0.2038735983690112  0.0 0.0 0.0 0.24464831804281345 0.0
30-31   0.22426095820591233 0.0 0.0 0.0 0.24464831804281345 0.0
32-33   0.22426095820591233 0.0 0.0 0.0 0.2650356778797146  0.0
34-35   0.24464831804281345 0.0 0.0 0.0 0.29561671763506625 0.0
36-37   0.3058103975535168  0.0 0.0 0.0 0.31600407747196735 0.0
38-39   0.4383282364933741  0.0 0.0 0.0 0.32619775739041795 0.0
40-41   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
42-43   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
44-45   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
46-47   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
48-49   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
50-51   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
52-53   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
54-55   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
56-57   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
58-59   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
60-61   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
62-63   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
64-65   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
66-67   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
68-69   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
70-71   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
72-73   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
74-75   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
76-77   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
78-79   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
80-81   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
82-83   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
84-85   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
86-87   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
88-89   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
90-91   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
92-93   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
94-95   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0
96-97   0.4689092762487258  0.0 0.0 0.0 0.32619775739041795 0.0

Next, falco:

#Position   Illumina Universal Adapter  Illumina Small RNA 3' Adapter   Illumina Small RNA 5' Adapter   Nextera Transposase Sequence    PolyA   PolyG
1   0   0   0   0   0.0203874   0
2   0   0   0   0   0.0815494   0
3   0   0   0   0   0.142712    0
4   0   0   0   0   0.183486    0
5   0   0   0   0   0.285423    0
6   0   0   0   0   0.38736 0
7   0   0   0   0   0.489297    0
8   0   0   0   0   0.591233    0
9   0   0   0   0   0.672783    0
10  0   0   0   0   0.754332    0
11  0   0   0   0   0.835882    0
12  0   0   0   0   0.917431    0
13  0   0   0   0   1.01937 0
14  0   0   0   0   1.1213  0
15  0   0   0   0   1.24363 0
16  0   0   0   0   1.34557 0
17  0   0   0   0   1.46789 0
18  0   0   0   0   1.59021 0
19  0   0   0   0   1.67176 0
20  0.122324    0   0   0   1.75331 0
21  0.122324    0   0   0   1.83486 0
22  0.122324    0   0   0   1.89602 0
23  0.122324    0   0   0   1.95719 0
24  0.122324    0   0   0   2.01835 0
25  0.122324    0   0   0   2.07951 0
26  0.122324    0   0   0   2.14067 0
27  0.142712    0   0   0   2.20183 0
28  0.183486    0   0   0   2.263   0
29  0.224261    0   0   0   2.32416 0
30  0.224261    0   0   0   2.38532 0
31  0.224261    0   0   0   2.44648 0
32  0.224261    0   0   0   2.50765 0
33  0.224261    0   0   0   2.60958 0
34  0.224261    0   0   0   2.71152 0
35  0.265036    0   0   0   2.81346 0
36  0.285423    0   0   0   2.89501 0
37  0.326198    0   0   0   2.99694 0
38  0.407747    0   0   0   3.09888 0
39  0.468909    0   0   0   3.20082 0
40  0.468909    0   0   0   3.30275 0
41  0.468909    0   0   0   3.40469 0
42  0.468909    0   0   0   3.48624 0
43  0.468909    0   0   0   3.56779 0
44  0.468909    0   0   0   3.60856 0
45  0.468909    0   0   0   3.62895 0
46  0.468909    0   0   0   3.64934 0
47  0.468909    0   0   0   3.66972 0
48  0.468909    0   0   0   3.69011 0
49  0.468909    0   0   0   3.7105  0
50  0.468909    0   0   0   3.7105  0
51  0.468909    0   0   0   3.7105  0
52  0.468909    0   0   0   3.7105  0
53  0.468909    0   0   0   3.7105  0
54  0.468909    0   0   0   3.7105  0
55  0.468909    0   0   0   3.7105  0
56  0.468909    0   0   0   3.7105  0
57  0.468909    0   0   0   3.7105  0
58  0.468909    0   0   0   3.7105  0
59  0.468909    0   0   0   3.73089 0
60  0.468909    0   0   0   3.75127 0
61  0.468909    0   0   0   3.77166 0
62  0.468909    0   0   0   3.81244 0
63  0.468909    0   0   0   3.85321 0
64  0.468909    0   0   0   3.89399 0
65  0.468909    0   0   0   3.93476 0
66  0.468909    0   0   0   3.97554 0
67  0.468909    0   0   0   4.01631 0
68  0.468909    0   0   0   4.05708 0
69  0.468909    0   0   0   4.09786 0
70  0.468909    0   0   0   4.13863 0
71  0.468909    0   0   0   4.17941 0
72  0.468909    0   0   0   4.22018 0
73  0.468909    0   0   0   4.26096 0
74  0.489297    0   0   0   4.32212 0
75  0.489297    0   0   0   4.38328 0
76  0.489297    0   0   0   4.42406 0
77  0.489297    0   0   0   4.46483 0
78  0.489297    0   0   0   4.50561 0
79  0.489297    0   0   0   4.54638 0
80  0.489297    0   0   0   4.58716 0
81  0.489297    0   0   0   4.62793 0
82  0.489297    0   0   0   4.66871 0
83  0.509684    0   0   0   4.70948 0
84  0.509684    0   0   0   4.75025 0
85  0.509684    0   0   0   4.79103 0
86  0.509684    0   0   0   4.8318  0
87  0.509684    0   0   0   4.91335 0
88  0.509684    0   0   0   4.9949  0
89  0.509684    0   0   0   5.05607 0
90  0.509684    0   0   0   5.09684 0
91  0.509684    0   0   0   5.158   0
92  0.570846    0   0   0   5.21916 0
93  0.632008    0   0   0   5.28033 0
94  0.632008    0   0   0   5.34149 0
95  0.632008    0   0   0   5.40265 0
96  0.632008    0   0   0   5.46381 0
97  0.632008    0   0   0   5.52497 0
98  0.632008    0   0   0   5.52497 0
99  0.632008    0   0   0   5.52497 0
100 0.632008    0   0   0   5.52497 0
101 0.632008    0   0   0   5.52497 0
102 0.632008    0   0   0   5.52497 0
103 0.632008    0   0   0   5.52497 0
104 0.632008    0   0   0   5.52497 0
105 0.632008    0   0   0   5.52497 0
106 0.632008    0   0   0   5.52497 0
107 0.632008    0   0   0   5.52497 0
108 0.632008    0   0   0   5.52497 0

My first guess would be that falco counts longer polyA runs multiple times?
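To make that guess concrete, here is a minimal Python sketch of the two counting strategies. It is not taken from either code base: the function names, the toy reads and the 4-mer are made up for illustration, and the real tools may match adapters differently.

```python
def first_hit_counts(reads, kmer):
    """Count each read at most once, at the first position where kmer starts."""
    counts = [0] * max(len(r) for r in reads)
    for read in reads:
        pos = read.find(kmer)
        if pos != -1:
            counts[pos] += 1
    return counts


def every_hit_counts(reads, kmer):
    """Count every (overlapping) start position of kmer, so one long
    homopolymer run contributes several hits from the same read."""
    counts = [0] * max(len(r) for r in reads)
    for read in reads:
        pos = read.find(kmer)
        while pos != -1:
            counts[pos] += 1
            pos = read.find(kmer, pos + 1)
    return counts


def cumulative_percent(counts, n_reads):
    """Turn per-position hit counts into a cumulative percentage of reads."""
    total = 0
    percents = []
    for c in counts:
        total += c
        percents.append(100.0 * total / n_reads)
    return percents


if __name__ == "__main__":
    # Hypothetical toy input: one read starting with a run of 8 A's, one without.
    reads = ["AAAAAAAACGT", "CGTACGTACGT"]
    print(cumulative_percent(first_hit_counts(reads, "AAAA"), len(reads)))
    print(cumulative_percent(every_hit_counts(reads, "AAAA"), len(reads)))
```

With these toy reads, the first variant sits flat at 50% from position 1 onwards, while the second keeps climbing along the run and ends at 250%. That steady climb is the same kind of pattern as in the falco PolyA column above, but whether it is the actual cause is still only a guess.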

The input data was https://raw.githubusercontent.com/galaxyproject/tools-iuc/6b50408a1ff7902575be37b2fa21aa80fe684e5c/tools/falco/test-data/1000trimmed.fastq and both tools were run with default settings.
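For anyone who wants to compare the two reports side by side, a small helper along these lines could pull one column out of the Adapter Content module. It is only a sketch: the function name is hypothetical, and it assumes both tools write a tab-separated module that starts with `>>Adapter Content` and ends with `>>END_MODULE`.

```python
def adapter_content_column(report_text, column):
    """Return {position label: value} for one column of the Adapter Content
    module in a FastQC-style text report (assumed to be tab-separated)."""
    values = {}
    header = None
    in_module = False
    for line in report_text.splitlines():
        if line.startswith(">>Adapter Content"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            in_module = False
        elif in_module and line.startswith("#"):
            # Header row, e.g. "#Position<TAB>...<TAB>PolyA<TAB>PolyG"
            header = line.lstrip("#").split("\t")
        elif in_module and header and line.strip():
            fields = line.split("\t")
            # Keep the position label as a string, since fastqc groups
            # positions into ranges such as "10-11".
            values[fields[0]] = float(fields[header.index(column)])
    return values
```

Calling it with the contents of each tool's report and `"PolyA"` gives two dictionaries that can be compared position by position. Note that fastqc bins positions from 10 onwards (for example `10-11`), so the keys will not line up one to one with falco's.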

andrewdavidsmith commented 3 weeks ago

You're right, @wm75. I'll see about fixing it this week.