nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Azure: Params File path cannot be a blob container 'az://' #4904

Open Takadonet opened 5 months ago

Takadonet commented 5 months ago

Bug report

Expected behavior and actual behavior

Expected: -params-file accepts a path in an Azure blob container, e.g. -params-file 'az://full/path/param.json'.

Actual: Nextflow fails with the error "Missing Nextflow session", which stops the application from running. If the -params-file file is on the local file system, it works as expected.

Steps to reproduce the problem

Program output

Top part of the stack trace. Full nextflow.log attached.

Apr-11 10:33:09.445 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-azure@1.3.3
Apr-11 10:33:09.468 [main] DEBUG nextflow.file.FileHelper - > Added 'AzFileSystemProvider' to list of installed providers [az]
Apr-11 10:33:09.468 [main] DEBUG nextflow.file.FileHelper - Started plugin 'nf-azure' required to handle file: az://root/params.json
Apr-11 10:33:09.472 [main] DEBUG n.cloud.azure.file.AzPathFactory - Creating Azure path factory
Apr-11 10:33:09.473 [main] ERROR nextflow.cli.Launcher - @unknown
java.lang.IllegalStateException: Missing Nextflow session
        at nextflow.cloud.azure.config.AzConfig.getConfig(AzConfig.groovy:66)
        at nextflow.cloud.azure.config.AzConfig.getConfig(AzConfig.groovy:72)
        at nextflow.cloud.azure.file.AzPathFactory.parseUri(AzPathFactory.groovy:51)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:567)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1254)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1030)
        at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1036)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1019)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:97)
        at nextflow.file.FileSystemPathFactory$_parse_closure1.doCall(FileSystemPathFactory.groovy:76)
        at nextflow.file.FileSystemPathFactory$_parse_closure1.call(FileSystemPathFactory.groovy)
        at nextflow.file.FileSystemPathFactory.lookup0(FileSystemPathFactory.groovy:104)
        at nextflow.file.FileSystemPathFactory.parse(FileSystemPathFactory.groovy:76)
        at nextflow.file.FileHelper.asPath0(FileHelper.groovy:309)
        at nextflow.file.FileHelper.asPath(FileHelper.groovy:297)
        at nextflow.cli.CmdRun.validateParamsFile(CmdRun.groovy:641)
        at nextflow.cli.CmdRun.memoizedMethodPriv$parsedParamsMap(CmdRun.groovy:574)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

nextflow.log

Environment

Additional context

Command being executed: nextflow -log nextflow.log -c azure_batch.config run 'https://github.com/DarianHole/test-nextflow' -w 'az://root/workdir/' --outdir 'az://root/outputs/' -params-file 'az://root/params.json'

bentsherman commented 5 months ago

Related to #4494

Thanks for the triage. I was stumped by the previous ticket, but the session not having been created yet could explain it.

Takadonet commented 5 months ago

My co-workers have indicated that -params-file works with S3 buckets on AWS, so perhaps the issue is the order in which the session is created or becomes available. Looking at the AWS plugin might give hints on why it works there and not in the Azure plugin.

bentsherman commented 5 months ago

Indeed, the problem is that the config is loaded before the session is created, but loading the config also loads the params file in order to apply the params. The S3 and AZ filesystems in Nextflow depend on some config settings to resolve paths with the necessary credentials, so there is a circular dependency here.
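The ordering problem described above can be sketched with a toy model. This is only an illustration: the class and method names are hypothetical and do not mirror Nextflow's actual internals. It assumes nothing beyond what the stack trace shows, namely that resolving an az:// path requires a session that does not exist yet while the params file is being validated.

```python
# Toy model of the start-up order; all names here are hypothetical
# and are NOT Nextflow's real classes or methods.

class MissingSessionError(RuntimeError):
    """Stands in for java.lang.IllegalStateException: Missing Nextflow session."""


class Runtime:
    def __init__(self):
        # The session is created only after the config has been built.
        self.session = None

    def resolve_path(self, uri):
        # An az:// path needs settings that are reached through the session.
        if uri.startswith("az://") and self.session is None:
            raise MissingSessionError("Missing Nextflow session")
        return uri

    def build_config(self, params_file):
        # Building the config reads the params file, so the params path
        # is resolved before any session exists.
        return {"params_file": self.resolve_path(params_file)}

    def start(self, params_file):
        config = self.build_config(params_file)  # step 1: config and params
        self.session = {"config": config}        # step 2: session
        return self.session


def try_start(params_file):
    """Return 'ok' or the error message, without raising."""
    try:
        Runtime().start(params_file)
        return "ok"
    except MissingSessionError as err:
        return f"error: {err}"


print(try_start("params.json"))            # local path: no session needed
print(try_start("az://root/params.json"))  # remote path: fails in step 1
```

In this model the local path succeeds and the az:// path fails in step 1, which matches the behavior reported in the issue.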

The discussions in #2723 and #4669 are relevant here. Separating the params definition from the config file might help resolve this circular dependency. If the config file can be loaded first and the params resolved afterwards, the params file could be remote and rely on the config to retrieve remote paths. This holds as long as the relevant config settings do not themselves depend on params, which I don't think is typically done.
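A rough sketch of that proposed ordering follows. It is not an implementation from #2723 or #4669: the three phases, the function names, the fetch callback, and the shape of the config dictionary are all assumptions made for illustration.

```python
import json

# Hypothetical three-phase start-up: settings, then session, then params.
# None of these functions exist in Nextflow; they only illustrate the order.

def load_settings(raw_config):
    """Phase 1: keep only non-params settings, which must not depend on params."""
    return {key: value for key, value in raw_config.items() if key != "params"}


def create_session(settings):
    """Phase 2: the session carries what the remote filesystem needs."""
    return {"azure": settings.get("azure", {})}


def load_params(uri, session, fetch):
    """Phase 3: params are read last, so a remote URI can use the session.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that returns the file content as a JSON string.
    """
    if uri.startswith("az://") and not session["azure"]:
        raise RuntimeError("no Azure settings available for " + uri)
    return json.loads(fetch(uri, session))


def start(raw_config, params_uri, fetch):
    settings = load_settings(raw_config)
    session = create_session(settings)
    params = load_params(params_uri, session, fetch)
    return settings, params


# Usage with a fake fetcher; the account name is a placeholder value.
def fake_fetch(uri, session):
    return '{"outdir": "az://root/outputs/"}'

settings, params = start(
    {"azure": {"storage": {"accountName": "demo"}}},
    "az://root/params.json",
    fake_fetch,
)
print(params["outdir"])
```

The sketch only works because phase 1 never reads params; a config setting that referenced a param would reintroduce the cycle.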

Takadonet commented 5 months ago

Based on those discussions, it appears that a quick fix is not available. As a temporary workaround, we will either write the params file to the local file system or attach a volume to the container instance.

Would it be safe to say that you are leaning towards supporting remote files for -params-file in the future?

bentsherman commented 5 months ago

It's an interesting question. Of course some files simply can't be remote, like the Nextflow log and config files, because they are used before the config settings are available to authenticate with remote storage. The params file sits in a grey area where it might be possible if we can get the dependencies right.

To be honest, it's not a critical factor in the design of config / params. If we can accommodate it, or if it helps us narrow down some design choices, I'll try to support it. But I doubt we will hang the entire design on whether or not the params file can be remote. I think it's relatively easy to stage the params file locally beforehand.
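For reference, staging the params file locally could look roughly like the sketch below. It only builds the two command lines and does not run them. The assumptions are: the URI has the form az://<container>/<blob path>, the Azure CLI is installed and already authenticated, and the storage account name ("mystorageaccount" here) is a placeholder. The helper names are made up for this example.

```python
import shlex
from urllib.parse import urlparse


def split_az_uri(uri):
    """Split az://<container>/<blob path> into (container, blob)."""
    parsed = urlparse(uri)
    blob = parsed.path.lstrip("/")
    if parsed.scheme != "az" or not parsed.netloc or not blob:
        raise ValueError(f"not an az://container/blob URI: {uri!r}")
    return parsed.netloc, blob


def staging_commands(params_uri, account_name, pipeline, local_path="params.json"):
    """Build (download, run) argument lists; nothing is executed here."""
    container, blob = split_az_uri(params_uri)
    download = [
        "az", "storage", "blob", "download",
        "--account-name", account_name,   # placeholder supplied by the caller
        "--container-name", container,
        "--name", blob,
        "--file", local_path,
    ]
    # Only the local path is handed to -params-file.
    run = ["nextflow", "run", pipeline, "-params-file", local_path]
    return download, run


download, run = staging_commands(
    "az://root/params.json",
    "mystorageaccount",  # hypothetical account name
    "https://github.com/DarianHole/test-nextflow",
)
print(shlex.join(download))
print(shlex.join(run))
```

The printed commands can be run in order from a shell or passed to subprocess.run; other options from the original command, such as -w and --outdir, can stay on az:// paths since those already work once the session exists.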