jhuapl-bio / taxtriage

TaxTriage is a Nextflow workflow designed to agnostically identify and classify microbial organisms within short- or long-read metagenomic NGS data. This flexible tool was developed with various use-cases of mNGS in mind.
MIT License

confidences.merged_mqc.tsv does not incorporate into the MultiQC #22

Closed: gunarsosis closed this issue 10 months ago

gunarsosis commented 11 months ago

Description of the bug

Alignment results are written to the sample-specific .tsv files in the "convert" directory and are subsequently merged into "confidences.merged_mqc.tsv" in the "merge" directory. However, the data is not incorporated into the MultiQC report, which displays only the explanations for the table headers.

The pipeline was run by cloning the main branch of the repository and running the command below.

See the screenshot of the MultiQC report below: image

Command used and terminal output

nextflow run main.nf -profile singularity,sge -work-dir /scicomp/scratch/ukm9/taxtriage -c /scicomp/reference/nextflow/configs/cdc.config --input data/Validation_data/0343_samples.csv --db /scicomp/groups/OID/NCEZID/DSR/BCFB/by-project/ukm9_TaxTriage/kraken2db/kraken2_MinusB --outdir output/Validation_data_0343_MinusB --remove_taxids '"9606"' --demux --skip_assembly

Relevant files

No response

System information

Nextflow - version 22.10.06
Hardware - HPC
Executor - SGE
Container - Singularity
OS - CentOS

Merritt-Brian commented 11 months ago

@gunarsosis Can you check the contents of the merged .tsv file and make sure it isn't empty?

Also check that the Kraken2/Krona outputs and the output of the BAM alignment step(s) are not empty.

gunarsosis commented 11 months ago

@Merritt-Brian confidences.merged_mqc.tsv is not empty and has the data in it. Kraken2 outputs are fine and are integrated in the MultiQC. Alignment did happen correctly.

Additionally, in the output directory, the multiqc/multiqc_data/ directory contains a file named multiqc_confidences-plot.txt, which contains all of the confidences.merged_mqc.tsv data. The data also appears to be correctly integrated into the Confidence and All Metagenomic Hits plots.

However, the table (Statistics Table for Alignment against top Taxids) is not populated.

gunarsosis commented 11 months ago

I think I might have an idea of what happened. The Confidence data is actually integrated in the MultiQC report; however, it is being converted to a beeswarm plot because of the large number of rows. See image below:

image
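
One way to check this diagnosis is to count the data rows in the merged table and compare the count against MultiQC's max_table_rows setting. The sketch below is illustrative only: the function names are hypothetical, the default of 500 and the inclusive comparison are assumptions that should be checked against the installed MultiQC version, and the file is assumed to hold optional "#" comment lines followed by one header line.

```python
from pathlib import Path

# Assumption: MultiQC's default max_table_rows is 500; verify for your version.
ASSUMED_DEFAULT_MAX_TABLE_ROWS = 500


def count_data_rows(tsv_path):
    """Count data rows, skipping blank lines, '#' comment lines, and one header line."""
    lines = [
        line
        for line in Path(tsv_path).read_text().splitlines()
        if line.strip() and not line.startswith("#")
    ]
    # The first remaining line is treated as the column header.
    return max(len(lines) - 1, 0)


def would_switch_to_beeswarm(tsv_path, max_rows=ASSUMED_DEFAULT_MAX_TABLE_ROWS):
    """Return True if the row count reaches the limit (inclusive boundary is an assumption)."""
    return count_data_rows(tsv_path) >= max_rows
```

Calling would_switch_to_beeswarm() on the confidences.merged_mqc.tsv file in the "merge" directory would show whether the table is large enough to be replaced by a beeswarm plot.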

I do see, however, that in multiqc_config.yml the option for beeswarm is set to false. Perhaps there is a conflict of configs somewhere. As a potential workaround, we could set the max_table_rows parameter to something very large, like 10000.

Is there a way to rerun only MultiQC, without running the entire TaxTriage pipeline, to test the max_table_rows workaround?
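
For reference, MultiQC can be run on its own against a directory of existing results, so one possible way to try the workaround is to write a small override config and point multiqc at the pipeline's output directory. This is a sketch under assumptions, not the TaxTriage maintainers' method: the helper names and the override file are hypothetical, and it is not confirmed that the published output directory contains every file the pipeline's MultiQC step normally receives. The --config, --outdir, and --force options are standard MultiQC command-line options.

```python
import shutil
import subprocess
from pathlib import Path


def write_override_config(path, max_table_rows=10000):
    """Write a one-line MultiQC config that raises the table row limit."""
    config_path = Path(path)
    config_path.write_text(f"max_table_rows: {max_table_rows}\n")
    return config_path


def build_multiqc_command(results_dir, config_path, report_dir):
    """Build the multiqc command line without executing it."""
    return [
        "multiqc", str(results_dir),
        "--config", str(config_path),   # extra config loaded for this run
        "--outdir", str(report_dir),    # write the new report to a separate folder
        "--force",                      # overwrite an existing report in report_dir
    ]


def run_multiqc(command):
    """Run the command only if the multiqc executable is available on PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("multiqc executable not found on PATH")
    subprocess.run(command, check=True)
```

With the output directory from the command above, this could be used as run_multiqc(build_multiqc_command("output/Validation_data_0343_MinusB", write_override_config("multiqc_override.yml"), "multiqc_rerun")), where the override file name and the report directory are placeholders.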

Merritt-Brian commented 10 months ago

Gotcha. Yes, that can be done with ease; I will close this when the request is complete.