Nextflow with Azure Batch appears to fail when reading from multiple containers. Processes report the files as not existing, and .fusion.log contains 403 authentication errors. This looks similar to a previously fixed issue, but it persists in 24.10.0, so it may have a different cause.
executor > azurebatch (fusion enabled) (1)
[22/bfb52c] multi (1) [100%] 1 of 1, failed: 1
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'multi (1)'
Caused by:
The task exited with an exit code representing a failure
Command executed:
cat foo.txt bar.txt > both.txt
Command exit status:
1
Command output:
(empty)
Command error:
+ cat foo.txt bar.txt
cat: foo.txt: No such file or directory
cat: bar.txt: No such file or directory
Work dir:
[...]
Container:
[...]
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
The .nextflow.log does not contain anything that stands out, whereas the .fusion.log contains:
RESPONSE 403: 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
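To check quickly whether a given run hit the same failure, a small script along the following lines can count the status codes reported in .fusion.log. This is only a sketch: it assumes the log is plain text and that failed requests appear on lines containing `RESPONSE <code>:`, as in the excerpt above. The function name `count_response_codes` and the log path are illustrative, not part of Nextflow or Fusion.

```python
import re
from collections import Counter
from pathlib import Path

# Matches lines such as "RESPONSE 403: 403 Server failed to authenticate ...".
# Assumption: only this line shape matters; any other log line is ignored.
RESPONSE_RE = re.compile(r"RESPONSE (\d{3}):")


def count_response_codes(log_text: str) -> Counter:
    """Count the HTTP status codes that appear on RESPONSE lines."""
    return Counter(int(m.group(1)) for m in RESPONSE_RE.finditer(log_text))


if __name__ == "__main__":
    # Hypothetical location: run this from the failed task's work dir.
    log_path = Path(".fusion.log")
    if log_path.exists():
        counts = count_response_codes(log_path.read_text(errors="replace"))
        for code, n in sorted(counts.items()):
            print(f"{code}: {n}")
```

A non-zero count for 403 would indicate that the run hit the authentication failure described here rather than, for example, a missing blob.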
Environment
Nextflow version: 24.10.0
Java version: 21.0.4
Operating system: Ubuntu 24.04.1 LTS
Bash version: fish 3.7.0/bash 5.2.21(1)
Additional context
We have not been able to verify whether the problem is fusion-related. The pipeline still fails (with a similar but different error message) when running with fusion.enabled: false, but it has been difficult to diagnose whether this is the same issue or an unrelated problem with getting azcopy to where it needs to be during execution.
Yes, issue #5444 was created based on the Slack discussion related to this issue. Here I've just added the input from the data scientists to give an overview of how it was found.
Bug report
Nextflow with Azure Batch appears to fail when reading from multiple containers. Processes report the files as not existing, and .fusion.log contains 403 authentication errors. This looks similar to a previously fixed issue, but it persists in 24.10.0, so it may have a different cause. See also the discussion on Slack.
Expected behavior and actual behavior
We expected to be able to read from multiple Azure containers in the same workflow, but this does not appear to work.
Steps to reproduce the problem
Here is a small workflow to illustrate the problem:
Running
fails, whereas
works fine.
The config in question, containing the azure_batch profile (with some redacted info):
Program output
Running nextflow prints: