nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Docs request: Fetching remote files #5493

Open trev-f opened 1 week ago

trev-f commented 1 week ago

New feature (docs)

I would like to request documentation describing how remote files are downloaded/staged in Nextflow.

Usage scenario

Projects that need to fetch large amounts of data from remote sources are common, and those files must be fetched efficiently. Nextflow makes it easy to download remote files, but without documentation on how it handles them, it is difficult to decide when to use the built-in mechanism and when to build a more tailored solution.

Without that documentation, it is hard to build a mental model of how downloading remote files works in Nextflow. Because fetching remote data can be a major bottleneck for some projects, users need to understand this behavior in order to build more efficient workflows.

Suggest implementation

In the remote files docs, answer some basic questions about how remote files are handled, such as:

bentsherman commented 6 days ago

@trev-f To answer your immediate questions:

@christopher-hakkaart I think we can add a section under Working with files > Remote files, what do you think? You can try a first draft if you want, but I might need to do it myself because I need to check a few details in the code. In any case, this would be a great thing to document, as it is a mystery to many users and unfortunately doesn't rise to the level of something that just always magically works.

christopher-hakkaart commented 5 days ago

Hi both, I'll write a draft and link the issue for feedback/corrections.

bentsherman commented 5 days ago

Sounds good. Once you have a first draft, I can add some details as needed.