smithlabcode / falco

A C++ drop-in replacement of FastQC to assess the quality of sequence read data
https://falco.readthedocs.io
GNU General Public License v3.0

Too many adapters error while using built-in adapter file #19

Closed. davised closed this issue 2 years ago

davised commented 2 years ago

[Mon Sep 13 13:26:36 2021] Writing text report to falco_err/fastq/F3D2_S190_L001_R1_001.fastq_fastqc_data.txt
[Mon Sep 13 13:26:36 2021] Writing HTML report to falco_err/fastq/F3D2_S190_L001_R1_001.fastq_fastqc_report.html
Elapsed time for file fastq/F3D2_S190_L001_R1_001.fastq: 3s
[limitst]       using file /local/downloads/falco-0.3.0/Configuration/limits.txt
[adapters]      using file /local/downloads/falco-0.3.0/Configuration/adapter_list.txt
You are testing too many adapters. The maximum number is 128!

I get this error when I specify multiple FASTQ files as input, but not when I run them individually. For example, I can run the forward and reverse reads of F3D2_S190 just fine:

$ falco fastq/F3D2_S190_L001_R*.fastq -o F3D2_S190
[Mon Sep 13 13:31:13 2021] creating directory for output: F3D2_S190
[limitst]   using file /local/downloads/falco-0.3.0/Configuration/limits.txt
[adapters]  using file /local/downloads/falco-0.3.0/Configuration/adapter_list.txt
[contaminants]  using file /local/downloads/falco-0.3.0/Configuration/contaminant_list.txt
[Mon Sep 13 13:31:13 2021] Started reading file fastq/F3D2_S190_L001_R1_001.fastq
[Mon Sep 13 13:31:13 2021] reading file as uncompressed fastq format
[Mon Sep 13 13:31:13 2021] Finished reading file
[Mon Sep 13 13:31:13 2021] Writing text report to F3D2_S190/fastq/F3D2_S190_L001_R1_001.fastq_fastqc_data.txt
[Mon Sep 13 13:31:13 2021] Writing HTML report to F3D2_S190/fastq/F3D2_S190_L001_R1_001.fastq_fastqc_report.html
Elapsed time for file fastq/F3D2_S190_L001_R1_001.fastq: 0s
[limitst]   using file /local/downloads/falco-0.3.0/Configuration/limits.txt
[adapters]  using file /local/downloads/falco-0.3.0/Configuration/adapter_list.txt
[contaminants]  using file /local/downloads/falco-0.3.0/Configuration/contaminant_list.txt
[Mon Sep 13 13:31:14 2021] Started reading file fastq/F3D2_S190_L001_R2_001.fastq
[Mon Sep 13 13:31:15 2021] reading file as uncompressed fastq format
[Mon Sep 13 13:31:16 2021] Finished reading file
[Mon Sep 13 13:31:16 2021] Writing text report to F3D2_S190/fastq/F3D2_S190_L001_R2_001.fastq_fastqc_data.txt
[Mon Sep 13 13:31:16 2021] Writing HTML report to F3D2_S190/fastq/F3D2_S190_L001_R2_001.fastq_fastqc_report.html
Elapsed time for file fastq/F3D2_S190_L001_R2_001.fastq: 2s

I'm testing with the test dataset from here: https://mothur.org/wiki/miseq_sop/

Direct link to the FASTQ zip download: https://mothur.s3.us-east-2.amazonaws.com/wiki/miseqsopdata.zip

davised commented 2 years ago

Here is the directory listing and the command that I ran to get the error:

$ /bin/ls fastq
F3D0_S188_L001_R1_001.fastq    F3D147_S213_L001_R1_001.fastq  F3D5_S193_L001_R1_001.fastq
F3D0_S188_L001_R2_001.fastq    F3D147_S213_L001_R2_001.fastq  F3D5_S193_L001_R2_001.fastq
F3D141_S207_L001_R1_001.fastq  F3D148_S214_L001_R1_001.fastq  F3D6_S194_L001_R1_001.fastq
F3D141_S207_L001_R2_001.fastq  F3D148_S214_L001_R2_001.fastq  F3D6_S194_L001_R2_001.fastq
F3D142_S208_L001_R1_001.fastq  F3D149_S215_L001_R1_001.fastq  F3D7_S195_L001_R1_001.fastq
F3D142_S208_L001_R2_001.fastq  F3D149_S215_L001_R2_001.fastq  F3D7_S195_L001_R2_001.fastq
F3D143_S209_L001_R1_001.fastq  F3D150_S216_L001_R1_001.fastq  F3D8_S196_L001_R1_001.fastq
F3D143_S209_L001_R2_001.fastq  F3D150_S216_L001_R2_001.fastq  F3D8_S196_L001_R2_001.fastq
F3D144_S210_L001_R1_001.fastq  F3D1_S189_L001_R1_001.fastq    F3D9_S197_L001_R1_001.fastq
F3D144_S210_L001_R2_001.fastq  F3D1_S189_L001_R2_001.fastq    F3D9_S197_L001_R2_001.fastq
F3D145_S211_L001_R1_001.fastq  F3D2_S190_L001_R1_001.fastq    Mock_S280_L001_R1_001.fastq
F3D145_S211_L001_R2_001.fastq  F3D2_S190_L001_R2_001.fastq    Mock_S280_L001_R2_001.fastq
F3D146_S212_L001_R1_001.fastq  F3D3_S191_L001_R1_001.fastq
F3D146_S212_L001_R2_001.fastq  F3D3_S191_L001_R2_001.fastq

$ cat run_falco_err.sh
#!/bin/bash
falco fastq/*.fastq -o falco_err

guilhermesena1 commented 2 years ago

Hello,

Thank you for providing the data to reproduce the problem! Could you also confirm which version of falco you are using, and whether you got it from Conda, a direct download from GitHub, or a clone of the repository? Thank you!

guilhermesena1 commented 2 years ago

I was able to reproduce the problem. This highlighted a bug where the adapter and contaminant lists were being read for each file but not cleared properly, so we ended up with several copies in the adapter vector. Very nice use case! I pushed a tentative fix at 112b03b. If you cloned the repo, it should work after a git pull. Let me know!
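
For readers following along, here is a minimal sketch of the failure mode described above. It is not falco's actual code: `Config`, `load_adapters_appending`, `load_adapters_clearing`, and `files_completed` are hypothetical names made up for illustration. The only value taken from the issue is the limit of 128 quoted in the error message.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Limit quoted in the error message from the issue.
constexpr std::size_t kMaxAdapters = 128;

// Hypothetical stand-in for a configuration object reused across input files.
struct Config {
  std::vector<std::string> adapters;

  // Buggy variant: appends on every call, so copies pile up file after file.
  void load_adapters_appending(const std::vector<std::string>& from_file) {
    adapters.insert(adapters.end(), from_file.begin(), from_file.end());
    check_limit();
  }

  // Fixed variant: drop the previous file's entries before loading again.
  void load_adapters_clearing(const std::vector<std::string>& from_file) {
    adapters.clear();
    adapters.insert(adapters.end(), from_file.begin(), from_file.end());
    check_limit();
  }

  void check_limit() const {
    if (adapters.size() > kMaxAdapters)
      throw std::runtime_error("too many adapters");
  }
};

// Returns how many files are processed before the limit check throws.
std::size_t files_completed(bool clear_first, std::size_t n_files,
                            std::size_t adapters_per_file) {
  assert(adapters_per_file <= kMaxAdapters);  // a single load must fit
  const std::vector<std::string> list(adapters_per_file, "ADAPTER");
  Config config;
  for (std::size_t i = 0; i < n_files; ++i) {
    try {
      if (clear_first) {
        config.load_adapters_clearing(list);
      } else {
        config.load_adapters_appending(list);
      }
    } catch (const std::runtime_error&) {
      return i;  // files 0..i-1 succeeded, file i tripped the limit
    }
  }
  return n_files;
}
```

Assuming a list of 6 adapters (an illustrative number, not the real size of adapter_list.txt) and the 40 files in the listing above, `files_completed(false, 40, 6)` stops after 21 files, while `files_completed(true, 40, 6)` gets through all 40. That matches the pattern in the report: a couple of files work, a large batch does not.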

davised commented 2 years ago

Oh, awesome.

I built from the 0.3.0 tar.gz originally, but I can build from git and let you know.

davised commented 2 years ago

OK, I rebuilt from git and it's working just fine now. I think I also got a speedup, since the list wasn't getting cleared properly before!

bash ./run_falco_err.sh  4.76s user 13.24s system 84% cpu 21.212 total