smithlabcode / falco

A C++ drop-in replacement of FastQC to assess the quality of sequence read data
https://falco.readthedocs.io
GNU General Public License v3.0

Q: How to make `falco` work with `multiqc`? #13

Closed: sklages closed this issue 3 years ago

sklages commented 3 years ago

As I understand it, the output of the current falco version (0.2.4) should be compatible with current multiqc; I am using 1.9.

No matter which files I create with falco, multiqc always fails to find any analysis results.

sample_S17_L002_R1_001.fastq.gz_fastqc_data.txt
sample_S17_L002_R1_001.fastq.gz_fastqc_report.html
sample_S17_L002_R1_001.fastq.gz_summary.txt

Even when putting these files into sample_S17_L002_R1_001.fastq.gz.zip ... no success.

So I have obviously missed something here, probably something very basic/simple.

Any idea what I am doing wrong?

guilhermesena1 commented 3 years ago

Hello,

I assume you ran falco on several FASTQ files. In that case, falco prefixes each output file name with the FASTQ name followed by an underscore (in this case sample_S17_L002_R1_001.fastq.gz_).

I believe MultiQC takes a directory as input and it looks for files named summary.txt, fastqc_data.txt and fastqc_report.html exactly (without the prefix), so if you move these files to a directory (e.g. named the same as your FASTQ sample) and rename the 3 files to remove the prefix, it should work. Please let me know though if you still have trouble or if that solves the problem.
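The renaming step described above could be scripted roughly as follows. This is only a sketch: the function name `collect_falco_outputs` and its parameters are hypothetical, and the three target file names are taken from this reply rather than from the MultiQC documentation.

```python
import shutil
from pathlib import Path

# File names that, according to the reply above, MultiQC looks for.
EXPECTED = ("fastqc_data.txt", "fastqc_report.html", "summary.txt")


def collect_falco_outputs(out_dir, prefix, sample_dir_name):
    """Move prefixed falco outputs into a per-sample directory, dropping the prefix.

    Hypothetical helper: out_dir holds files such as
    '<prefix>summary.txt'; they end up as '<sample_dir_name>/summary.txt'.
    """
    out_dir = Path(out_dir)
    target = out_dir / sample_dir_name
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in EXPECTED:
        src = out_dir / f"{prefix}{name}"
        if src.is_file():  # silently skip outputs that were not produced
            dest = target / name
            shutil.move(str(src), str(dest))
            moved.append(dest)
    return moved
```

For the files in this issue, the call would look like `collect_falco_outputs(".", "sample_S17_L002_R1_001.fastq.gz_", "sample_S17_L002_R1_001_fastqc")`, after which MultiQC can be pointed at the parent directory.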

sklages commented 3 years ago

Hi,

I am running falco on individual fastq files (for better logging/timestamping) and rename the standard files according to my sample names.

Yes, it seems multiqc takes a directory or zip file and searches it for content.

I have now changed my pipeline so that a correct subdir is created containing the relevant files, like:

|--sample_S17_L002_R1_001_fastqc
|   |-- fastqc_data.txt
|   |-- fastqc_report.html
|   `-- summary.txt
`-- sample_S17_L002_R1_001_fastqc_report.html -> sample_S17_L002_R1_001_fastqc/fastqc_report.html

together with a symlink (properly named for user convenience) pointing to fastqc_report.html.
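A minimal sketch of how such a convenience symlink might be created, assuming the per-sample directory already exists and a POSIX-like filesystem where symlinks are permitted. The helper name `link_report` is hypothetical; the link naming (`<directory name>_report.html`) follows the layout shown above.

```python
from pathlib import Path


def link_report(sample_dir):
    """Create '<sample_dir>_report.html' next to sample_dir, pointing at its report.

    Hypothetical helper; the link target is relative, matching the
    'name_fastqc_report.html -> name_fastqc/fastqc_report.html' layout above.
    """
    sample_dir = Path(sample_dir)
    link = sample_dir.parent / f"{sample_dir.name}_report.html"

    # Replace a stale link from a previous run; other existing files are left alone
    # (symlink_to will then raise FileExistsError).
    if link.is_symlink():
        link.unlink()

    link.symlink_to(Path(sample_dir.name) / "fastqc_report.html")
    return link
```

Because the target is relative, the directory and its link can be moved together without breaking the link.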

Well, it works :-)

Testing on a small 4 GB fastq file shows that falco runs 2-2.5x faster than fastqc (single-threaded). Running fastqc with 4 threads on a single fastq file is not faster than running it single-threaded (as expected). Running fastqc with 4 threads on a pair of fastq files takes the same time as for one file, now using 2 cores.

So I think I will stick with falco for now, although the HTML report is not yet "perfect" (e.g. some tables do not match the page/report design).

Thanks for a nice piece of software!