Closed apetkau closed 5 months ago
Hi @apetkau
Thanks for reporting this!
We use Nextflow's built-in functionality to test the existence of all files. Figuring out the problem will probably take some work, since it is likely in the Nextflow codebase. For now, I would suggest using the `path` format for all your files, since that doesn't check whether something is a file or a directory. I'll have a look at it when I have some time.
Thanks so much @nvnieuwk. That makes sense.
Hi @apetkau, do you know which version of Nextflow and nf-validation you used? It should be in the logs.
Hello @adamrtalbot. This is with Nextflow `23.04.2` and nf-validation `1.1.3`. Thanks.
OK with 23.10.0 and nf-validation 1.1.3 we saw the problem the other way around, i.e. if the path did not include a slash nf-validation complained it was not a directory. Which is closer to the truth but still not accurate.
From nf-core megatests:
Pulling nf-core/fetchngs ...
downloaded from https://github.com/nf-core/fetchngs.git
Launching `https://github.com/nf-core/fetchngs` [azure_fetchngs_small] DSL2 - revision: 04ee5031a4 [master]
Downloading plugin nf-validation@1.1.3
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`
------------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/fetchngs v1.11.0-g04ee503
------------------------------------------------------
Core Nextflow options
revision : master
runName : azure_fetchngs_small
launchDir : /mnt/resource/batch/tasks/workitems/nf-workflow-5GJmv03MWYhTOs/job-1/nf-workflow-5GJmv03MWYhTOs/wd
workDir : /work/work/fetchngs/work-f794ea3cb15147c339cae6225c82a408834597a3
projectDir : /.nextflow/assets/nf-core/fetchngs
userName : root
profile : test
configFiles :
Input/output options
input : ${projectDir}/tests/sra_ids_test.csv
outdir : az://work/fetchngs/results-test-f794ea3cb15147c339cae6225c82a408834597a3
Institutional config options
config_profile_name : Test profile
config_profile_description: Minimal test dataset to check pipeline function
Max job request options
max_cpus : 2
max_memory : 6.GB
max_time : 6.h
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/fetchngs for your analysis please cite:
* The pipeline
https://doi.org/10.5281/zenodo.5070524
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/fetchngs/blob/master/CITATIONS.md
------------------------------------------------------
ERROR ~ ERROR: Validation of pipeline parameters failed!
-- Check 'nf-5GJmv03MWYhTOs.log' file for details
The following invalid input values have been detected:
* --input: the file or directory '${projectDir}/tests/sra_ids_test.csv' does not exist.
* --outdir: 'az://work/fetchngs/results-test-f794ea3cb15147c339cae6225c82a408834597a3' is not a directory, but a file (az://work/fetchngs/results-test-f794ea3cb15147c339cae6225c82a408834597a3)
Oh, interesting. We will try to test this out with Nextflow `23.10.0` then and see what happens. Thank you.
Thanks for looking into this!
I was having this same issue on an Azure storage account without hierarchical namespaces, on Nextflow 23.04.1. I should update Nextflow and try again, but I believe we will find consistent results.
I'd like to add my two cents to the issue anyway:
My storage account has hierarchical namespaces disabled. I'm no expert, but I've seen that in such storage accounts it is possible to create a file AND a folder with the same name. For instance, you can create two files, the first one at `some/file.txt` and the second at `some/file.txt/insidefolder.txt`. To my understanding, the reason behind this is that folders do not actually exist, and the path is just some attribute metadata you assign to a file.
Therefore, in my opinion, appending `/` at the end of the path would be required for Azure storage directories.
I am now curious about what would happen with the path `some/file.txt` when the storage container is mounted locally on a Linux system using `blobfuse2`. I bet some error happens, otherwise it would be both a file and a directory... who knows...
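To make the point about flat namespaces concrete, here is a minimal sketch in Python. It is a toy model, not the Azure SDK: the `blobs` dict and the `is_file` / `is_directory` helpers are hypothetical names invented for illustration. It assumes a "directory" is nothing more than a shared name prefix.

```python
# Toy model of a flat blob namespace (hierarchical namespaces disabled).
# NOT the Azure SDK: blob names are simply keys in a dict.
blobs = {
    "some/file.txt": b"I am a blob",
    "some/file.txt/insidefolder.txt": b"I live 'inside' the blob above",
}


def is_file(name: str) -> bool:
    """A 'file' exists if a blob has exactly this name."""
    return name in blobs


def is_directory(name: str) -> bool:
    """A 'directory' exists if any blob name starts with name + '/'."""
    prefix = name.rstrip("/") + "/"
    return any(key.startswith(prefix) for key in blobs)


# The same name answers "yes" to both questions.
print(is_file("some/file.txt"), is_directory("some/file.txt"))
```

Under this model, the name alone cannot tell you whether the caller meant a file or a directory, which is why a trailing `/` ends up carrying that information.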
I forgot to update this issue with our tests with Nextflow `23.10.0`, sorry about that. Everything works after upgrading Nextflow. That is, a path with a trailing `/` in Azure (like `az://path/to/dir/`) is successfully validated as a directory for a Nextflow pipeline. Thanks so much for the suggestion.
@zeehio thanks so much for your input. I did not know you could have the same name be interpreted as both a file and a folder, but it makes sense. It also makes sense then that folders need to have an appended `/` in Azure storage.
OK I wrote a script to iterate through some options. Currently, the nf-validation, fetchngs and Nextflow versions don't seem to make a difference: all of them except the latest seem to have this error.
However, the version of nf-azure does seem to work. I tried with the following versions:
There's possibly a subtle interaction with Nextflow version as well which might be worth looking into, but I figured if nf-azure is the cause I will focus on that until I identify the specific version that causes the problem.
Reminder, nf-azure version * `outdir` with/without slash. Results:
| nf-azure version | with slash | without slash |
|---|---|---|
| 1.0.1 | :white_check_mark: | :x: |
| 1.3.3 | :white_check_mark: | :x: |
| 1.4.0 | :white_check_mark: | :x: |
I suspect the problem is with the nf-azure plugin not handling the pseudo-directories in Azure correctly, will look into it and raise an issue there.
So in summary, the current workaround is to use the latest stable version of everything, with a slash in the `outdir`. If you do this it should work:
Wow. Awesome. Thanks so much for all your help @adamrtalbot in looking into this 😄
As of Nextflow v23.10.1, nf-validation 1.1.3 and nf-azure 1.3.3 this problem has popped back up.
Easiest solution would be to ignore file or directory validation for Azure and GCP storage since it's fake anyway.
Recreate it with
nextflow run nf-core/sarek -r 3.14.0 -profile test --outdir az://bucket/outputs
Using the above mentioned versions.
Using the edge release (v24.03.0-edge) does not fix this.
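The suggestion above, skipping file-or-directory validation for object storage, could look roughly like the following Python sketch. This is not nf-validation or nf-schema code. The function names and the `CLOUD_SCHEMES` set are hypothetical, and the choice of `az` and `gs` as the schemes to skip is an assumption based on the Azure and GCP mention above.

```python
import os
from urllib.parse import urlparse

# Assumption: URI schemes for object stores where directories are not real.
CLOUD_SCHEMES = {"az", "gs"}


def is_cloud_path(path: str) -> bool:
    """True if the path uses one of the object-storage schemes above."""
    return urlparse(path).scheme in CLOUD_SCHEMES


def validate_directory_path(path: str) -> list:
    """Return a list of error messages; an empty list means 'valid'."""
    if is_cloud_path(path):
        # Skip the file-vs-directory check entirely for object storage.
        return []
    if os.path.exists(path) and not os.path.isdir(path):
        return [f"'{path}' is not a directory, but a file"]
    return []


print(validate_directory_path("az://bucket/outputs"))
```

The trade-off is that a genuinely wrong cloud path is no longer caught at validation time and would only fail later in the run.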
It's possibly because Nextflow explicitly checks for a directory attribute, which is probably not true if the path does not exist: https://github.com/nextflow-io/nextflow/blob/019eb86c2c0169f18f115a0924dcdf3cb958f981/plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzPath.groovy#L120-L127
OK here's the line in nf-schema: https://github.com/nextflow-io/nf-schema/blob/252c714a49210318bb152d0726a04b5299d5c881/plugins/nf-schema/src/main/nextflow/validation/CustomEvaluators/FormatFilePathPatternEvaluator.groovy#L43
Which calls this method in nf-azure: https://github.com/nextflow-io/nextflow/blob/019eb86c2c0169f18f115a0924dcdf3cb958f981/plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzPath.groovy#L96-L98
Which is set here: https://github.com/nextflow-io/nextflow/blob/019eb86c2c0169f18f115a0924dcdf3cb958f981/plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzPath.groovy#L76-L88
@pditommaso I'm not familiar with the code here, is there any way to tell a file or directory apart in Nextflow? Or should we handle it on the nf-schema side?
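As a rough illustration of the behaviour described in this thread, here is a Python sketch of a path object whose directory flag is fixed when the path is parsed, rather than looked up remotely. `ToyCloudPath` and `check_directory_format` are hypothetical names. The assumption that a trailing slash is the only hint is inferred from the results reported above, not copied from the actual `AzPath` implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCloudPath:
    """Hypothetical stand-in for a cloud path object; not the real AzPath."""

    uri: str
    directory: bool

    @classmethod
    def parse(cls, uri: str) -> "ToyCloudPath":
        # Assumption: with no remote lookup, a trailing slash is the only hint.
        return cls(uri=uri, directory=uri.endswith("/"))

    def is_directory(self) -> bool:
        return self.directory


def check_directory_format(uri: str):
    """Mimic a directory-path check: return an error string, or None if OK."""
    path = ToyCloudPath.parse(uri)
    if not path.is_directory():
        return f"'{uri}' is not a directory, but a file"
    return None


print(check_directory_format("az://work/results"))   # reports an error
print(check_directory_format("az://work/results/"))  # passes
```

If the real code behaves like this model, a not-yet-existing output directory given without a trailing slash would always be reported as a file, which would be consistent with the table earlier in the thread.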
I am trying to run a pipeline within Azure and use an Azure path for the output directory (e.g., `az://12345/outputs/`). However, the validation of this path causes issues:
This seems to be related to the `outdir` defined as format `directory-path` in the `nextflow_schema.json` file:
If the `directory-path` format is removed (or replaced with `file-path`), the pipeline successfully executes.
I am wondering if someone could help out with this? I am using the most recent version of nf-validation (`1.1.1`). Thanks.