nf-core / bamtofastq

Converts bam or cram files to fastq format and does quality control.
https://nf-co.re/bamtofastq
MIT License

MultiQC error with `test` profile about missing `multiqc_plots` #41

Closed BrunoGrandePhD closed 2 years ago

BrunoGrandePhD commented 2 years ago

I was running version 1.2 of bamtofastq, and I ran into the following error with the MultiQC job at the end.

ls: cannot access 'multiqc_plots': No such file or directory

I didn't use to run into this issue, but when I tested with a slightly older version of bamtofastq that I had previously run without problems, I got the same error. I can also reproduce the error using the test profile.

I wonder if this has to do with a change to Nextflow. For reference, I'm using Nextflow v22.04.0 on AWS Batch, whereas I previously used v21.10.5. I'll try re-running with the older version of Nextflow, but I wanted to put this on our radar.

Full Error Message

```
nxf-scratch-dir ip-172-22-3-104.ec2.internal:/tmp/nxf.ftJDswNCI7
[WARNING] multiqc : MultiQC Version v1.12 now available!
[INFO ] multiqc : Not cleaning sample names
[INFO ] multiqc : This is MultiQC v1.9
[INFO ] multiqc : Template : default
[INFO ] multiqc : Report title: test-bamtofastq
[INFO ] multiqc : Searching : /tmp/nxf.ftJDswNCI7/multiqc_config.yaml
[INFO ] multiqc : Searching : /tmp/nxf.ftJDswNCI7
Searching 34 files..
[INFO ] fastqc : Found 3 reports
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
[INFO ] samtools : Found 3 stats reports
[INFO ] samtools : Found 3 flagstat reports
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
[INFO ] samtools : Found 3 idxstats reports
[INFO ] fastqc : Found 5 reports
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
[INFO ] custom_content : qbic-pipelines-bamtofastq-summary: Found 1 sample (html)
[INFO ] custom_content : software_versions: Found 1 sample (html)
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : test_bamtofastq_multiqc_report.html
[INFO ] multiqc : Data : test_bamtofastq_multiqc_report_data
[INFO ] multiqc : Plots : test_bamtofastq_multiqc_report_plots
[INFO ] multiqc : MultiQC complete
```

BrunoGrandePhD commented 2 years ago

Sadly, I get the same error with Nextflow v21.10.5.

Missing output file(s) `multiqc_plots` expected by process `multiqc`
FriederikeHanssen commented 2 years ago

I have to say that I haven't run the pipeline since adding the cram conversion. Looks like MultiQC is definitely doing something though, like finding files. Have you tried running MultiQC by hand? What is the content of the .command.err of the MultiQC process?

BrunoGrandePhD commented 2 years ago

I agree that MultiQC looks like it runs fine.

.command.err

```
❯ aws s3 cp s3://example-project-tower-scratch/work/e8/441b5cc24f08ada38e3f18cf117676/.command.err -
[WARNING] multiqc : MultiQC Version v1.12 now available!
[INFO ] multiqc : Not cleaning sample names
[INFO ] multiqc : This is MultiQC v1.9
[INFO ] multiqc : Template : default
[INFO ] multiqc : Report title: test-bamtofastq
[INFO ] multiqc : Searching : /tmp/nxf.ftJDswNCI7/multiqc_config.yaml
[INFO ] multiqc : Searching : /tmp/nxf.ftJDswNCI7
[INFO ] fastqc : Found 3 reports
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
[INFO ] samtools : Found 3 stats reports
[INFO ] samtools : Found 3 flagstat reports
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
[INFO ] samtools : Found 3 idxstats reports
[INFO ] fastqc : Found 5 reports
/opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/lib/python3.9/site-packages/multiqc/plots/bargraph.py:451: UserWarning: FixedFormatter should only be used together with FixedLocator
  axes.set_xticklabels(['{:.0f}%'.format(x) for x in vals])
[INFO ] custom_content : qbic-pipelines-bamtofastq-summary: Found 1 sample (html)
[INFO ] custom_content : software_versions: Found 1 sample (html)
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : test_bamtofastq_multiqc_report.html
[INFO ] multiqc : Data : test_bamtofastq_multiqc_report_data
[INFO ] multiqc : Plots : test_bamtofastq_multiqc_report_plots
[INFO ] multiqc : MultiQC complete
```

The HTML report is generated as expected.

                           PRE test_bamtofastq_multiqc_report_data/
2022-05-11 17:41:11          0
2022-05-11 17:41:21          6 .command.begin
2022-05-11 17:42:11       2116 .command.err
2022-05-11 17:42:08       2263 .command.log
2022-05-11 17:42:11         21 .command.out
2022-05-11 17:41:12      19222 .command.run
2022-05-11 17:41:12        130 .command.sh
2022-05-11 17:42:06        229 .command.trace
2022-05-11 17:42:07          1 .exitcode
2022-05-11 17:42:07    1319124 test_bamtofastq_multiqc_report.html

The error seems to be caused by Nextflow not finding the multiqc_plots output folder. I'm guessing it's related to this line. Indeed, if I look at a past run, that folder does exist.

                           PRE multiqc_data/
                           PRE multiqc_plots/
2021-12-17 21:09:19          0
2021-12-17 21:09:28          6 .command.begin
2021-12-17 21:14:08       2002 .command.err
2021-12-17 21:14:11       2068 .command.log
2021-12-17 21:14:07          0 .command.out
2021-12-17 21:09:19     137882 .command.run
2021-12-17 21:09:19         64 .command.sh
2021-12-17 21:14:09        235 .command.trace
2021-12-17 21:14:11          1 .exitcode
2021-12-17 21:14:09    5078625 multiqc_report.html

Can you think of anything that would explain why the multiqc_plots folder would no longer be generated?

FriederikeHanssen commented 2 years ago

No, not really. Is the HTML report fine? I will try to find some time to replicate it on my side, but probably not for a couple of days. Deep in the trenches of finishing the last things for the sarek release.

BrunoGrandePhD commented 2 years ago

No worries! I'm not blocked by this since the FASTQ files were generated regardless. I'm also excited about the new Sarek release! 😉

The HTML report looks fine to me. I've attached the entire work directory for debugging purposes.

multiqc.zip

Given how the data directories look different in my previous comment, I'm going to try running the workflow without a run name.

multiqc_data/

vs

test_bamtofastq_multiqc_report_data/
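
The difference between the two directory names above can also be checked mechanically. Below is a minimal Python sketch, assuming the task's work directory has been copied to a local path. `find_multiqc_dirs` and the `EXPECTED` tuple are hypothetical names made up for illustration, not part of bamtofastq or MultiQC; the expected names are simply the ones seen in the working runs in this thread.

```python
from pathlib import Path
import tempfile

# Directory names seen in the working runs in this thread (assumption:
# these are what the pipeline's `multiqc` process looks for).
EXPECTED = ("multiqc_data", "multiqc_plots")


def find_multiqc_dirs(work_dir):
    """Return (present, missing).

    present: sorted names of sub-directories ending in `_data` or `_plots`.
    missing: expected names that are not among them.
    """
    root = Path(work_dir)
    present = sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and p.name.endswith(("_data", "_plots"))
    )
    missing = [name for name in EXPECTED if name not in present]
    return present, missing


# Demo on a throwaway directory that mimics the failing run's listing.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test_bamtofastq_multiqc_report_data").mkdir()
    present, missing = find_multiqc_dirs(tmp)
    print("present:", present)
    print("missing:", missing)
```

Pointing the helper at a real work directory would show at a glance whether the outputs were renamed rather than simply not produced.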
BrunoGrandePhD commented 2 years ago

I think I found the root cause. It has to do with the run name. I re-ran the test profile without assigning a custom run name in Tower, and the multiqc_plots folder was created as expected.

I'm guessing it has to do with one of these two arguments. That said, I don't know if this is an issue with bamtofastq or MultiQC.

                           PRE multiqc_data/
                           PRE multiqc_plots/
2022-05-12 07:13:54          0
2022-05-12 07:14:05          6 .command.begin
2022-05-12 07:14:49       1996 .command.err
2022-05-12 07:14:53       2082 .command.log
2022-05-12 07:14:49         21 .command.out
2022-05-12 07:13:54      19222 .command.run
2022-05-12 07:13:54         64 .command.sh
2022-05-12 07:14:50        229 .command.trace
2022-05-12 07:14:52          1 .exitcode
2022-05-12 07:14:51    1318992 multiqc_report.html
BrunoGrandePhD commented 2 years ago

That said, the Nextflow commands look identical. They both use -name, but one is set automatically by Tower whereas the other is customized. I wonder if it has to do with the - vs _. 😒 I'll run some tests.

# This one worked
nextflow run 'https://github.com/qbic-pipelines/bamtofastq' \
         -name big_caravaggio \
         -with-tower 'https://tower.sagebionetworks.org/api' \
         -r 1.2.0 \
         -profile test

# But this one didn't work
nextflow run 'https://github.com/qbic-pipelines/bamtofastq' \
         -name test-bamtofastq \
         -with-tower 'https://tower.sagebionetworks.org/api' \
         -r 1.2.0 \
         -profile test
BrunoGrandePhD commented 2 years ago

Okay, I think I've isolated the issue to the use of hyphens in the run name. Underscores work fine.

I'll see if I can further narrow down the issue to rfilename or rtitle.

BrunoGrandePhD commented 2 years ago

I tried a few things to fix the issue (i.e. to ensure that the multiqc_plots folder is generated when a hyphen is used in the run name). I don't have a lot of experience with MultiQC. Maybe @ewels has some insight on why this is happening.

I tried setting rtitle to '' (commit), but that didn't work.

$ multiqc -f -s  --filename eliminate_rtitle_multiqc_report multiqc_config.yaml .

                           PRE ensure_consistency_multiqc_report_data/
2022-05-12 11:35:12          0
2022-05-12 11:35:25          6 .command.begin
2022-05-12 11:36:21       2128 .command.err
2022-05-12 11:36:13       2276 .command.log
2022-05-12 11:36:20         21 .command.out
2022-05-12 11:35:12      19222 .command.run
2022-05-12 11:35:12        136 .command.sh
2022-05-12 11:36:11        229 .command.trace
2022-05-12 11:36:13          1 .exitcode
2022-05-12 11:36:12    1319134 ensure_consistency_multiqc_report.html

I tried setting rfilename to '' (commit), but that also didn't work.

$ multiqc -f -s --title "eliminate-rfilename"  multiqc_config.yaml .

                           PRE eliminate_rtitle_multiqc_report_data/
2022-05-12 11:34:56          0
2022-05-12 11:35:05          6 .command.begin
2022-05-12 11:36:06       2061 .command.err
2022-05-12 11:35:57       2209 .command.log
2022-05-12 11:36:05         21 .command.out
2022-05-12 11:34:56      19222 .command.run
2022-05-12 11:34:56        106 .command.sh
2022-05-12 11:35:55        229 .command.trace
2022-05-12 11:35:56          1 .exitcode
2022-05-12 11:35:56    1318947 eliminate_rtitle_multiqc_report.html

However, when I set both rtitle and rfilename to '' (commit), the multiqc_plots folder is generated as expected.

$ multiqc -f -s   multiqc_config.yaml .

                           PRE multiqc_data/
                           PRE multiqc_plots/
2022-05-12 11:52:44          0
2022-05-12 11:52:52          6 .command.begin
2022-05-12 11:53:38       1996 .command.err
2022-05-12 11:53:41       2081 .command.log
2022-05-12 11:53:37         21 .command.out
2022-05-12 11:52:44      19222 .command.run
2022-05-12 11:52:44         64 .command.sh
2022-05-12 11:53:39        229 .command.trace
2022-05-12 11:53:41          1 .exitcode
2022-05-12 11:53:39    1318959 multiqc_report.html
ewels commented 2 years ago

Interesting! Can you remove Nextflow from the equation and try to get as simple a MultiQC command as possible that exhibits this behaviour? If there's a bug with hyphenated report titles then that needs fixing.

Only other thing that I can think of is that the hyphens are somehow being interpreted as command line arguments. But I would expect that to cause an error rather than a silent failure.
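
One part of that hypothesis can be checked without running MultiQC at all. The sketch below uses Python's standard `shlex` module to tokenize a command string from earlier in this thread the way a POSIX shell would, and shows what value ends up after `--title`. `title_argument` is a hypothetical helper. This only covers shell-level splitting; it says nothing about how MultiQC itself handles the value afterwards.

```python
import shlex

# Command string copied from earlier in this thread.
command = 'multiqc -f -s --title "eliminate-rfilename" multiqc_config.yaml .'


def title_argument(cmd):
    """Return the token following --title after shell-style splitting, or None."""
    tokens = shlex.split(cmd)
    if "--title" in tokens:
        idx = tokens.index("--title")
        if idx + 1 < len(tokens):
            return tokens[idx + 1]
    return None


# A hyphen inside the value does not split it into separate tokens.
print(title_argument(command))
```

If the title arrives as a single token, any remaining misbehaviour would have to come from later processing rather than from the shell.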

BrunoGrandePhD commented 2 years ago

I haven't run into this issue since switching to underscores, so I've decided not to spend the time on generating a reprex when there's an easy workaround.
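
For anyone who wants to apply the same workaround automatically, here is a minimal Python sketch, assuming run names are generated by a script before launching. `underscore_run_name` and `nextflow_command` are hypothetical helpers; the command mirrors the test-profile invocation from this thread, with the `-with-tower` option left out for brevity.

```python
import shlex


def underscore_run_name(name):
    """Replace hyphens with underscores, the workaround reported in this thread."""
    return name.replace("-", "_")


def nextflow_command(run_name):
    """Build the test-profile command from this thread with a sanitized run name."""
    return [
        "nextflow", "run", "https://github.com/qbic-pipelines/bamtofastq",
        "-name", underscore_run_name(run_name),
        "-r", "1.2.0",
        "-profile", "test",
    ]


# Print the command as a single shell-safe string.
print(shlex.join(nextflow_command("test-bamtofastq")))
```

This does not address the underlying cause, which was never pinned down here; it only avoids the run names that triggered the error.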