nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Update documentation for Getting Started with AWS Batch (and other cloud providers) #4429

Open stevekm opened 11 months ago

stevekm commented 11 months ago

As someone who had used Nextflow extensively on HPC and was learning cloud (AWS) for the first time, I found figuring out how to get Nextflow running on AWS really difficult and confusing.

There are official docs in a few places already;

However, without screenshots and step-by-step instructions, I found these hard to follow and ultimately not very helpful.

Eventually, I found this blog post and was able to get things working much more easily;

I have since found other such blog posts, though I haven't read or used them heavily so far.

I think it would be extremely helpful to have such a step-by-step walkthrough of the entire process incorporated into the official docs somewhere, to help new cloud users get started. Screenshots of the AWS dashboard are super important too, and it helps a lot to include a brief word or two describing what each of the relevant AWS services is and how it relates to running Nextflow and its container tasks.

(Similar to this issue: https://github.com/nextflow-io/nextflow/issues/4413)

bentsherman commented 10 months ago

In general we do not provide step-by-step walkthroughs of external services in the Nextflow docs.

AWS infrastructure is a huge can of worms. Even a simple walkthrough opens the argument for adding more complex examples to the Nextflow docs, and then the same for Azure and Google and every other service that Nextflow integrates with.

Such walkthroughs have a high risk of becoming outdated, e.g. whenever the external service updates its user interface, and we have no way of knowing when that happens.

Instead, we simply try to lay out the requirements from Nextflow's perspective and link to relevant AWS documentation. If you feel that the Nextflow docs are missing anything in those specific aspects (key requirements, helpful links to AWS docs), feel free to suggest a change.

For what it's worth, Seqera is building a team dedicated to writing auxiliary content such as tutorials and walkthroughs, similar to the Seqera blogs you linked, but more integrated with the Nextflow website and community.

abought commented 6 months ago

"If you feel that the Nextflow docs are missing anything in those specific aspects (key requirements...), feel free to suggest a change."

In our attempts to run a workflow in AWS Batch via "clean" infrastructure definitions (no use of tools like Tower), we found that two additional permissions were required for the machine that ran the nextflow command, beyond those in the docs:

Even relatively vanilla workflows failed to start unless these permissions were added. Even if you don't write new tutorials, it might be worth reviewing whether the current docs still run as written.

I can't guarantee that this is a full list, but will add any other missing permissions we might find to this ticket.
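
One way to track gaps like this is to diff the actions a policy grants against the actions a launch is known to need. Below is a minimal sketch in Python, not an official tool. The function name `missing_actions`, the example policy, and the example action list are all hypothetical. The actions shown are ordinary AWS Batch and S3 action names chosen for illustration; they are not the list from the Nextflow docs, and they are not the two permissions mentioned above. The check only looks at `Effect` and `Action` in a policy document and ignores `Deny` statements, `NotAction`, `Resource` scoping, and `Condition` blocks, so treat its output as a first pass.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts a single string where a list is expected; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def missing_actions(policy, required):
    """Return the required actions that no Allow statement in `policy` grants.

    Simplification (assumption of this sketch): only Effect and Action are
    inspected. Deny, NotAction, Resource and Condition are ignored.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]

    # Action names are matched case-insensitively, so compare in lower case.
    allowed = [
        pattern.lower()
        for statement in statements
        if statement.get("Effect") == "Allow"
        for pattern in _as_list(statement.get("Action"))
    ]

    # Action patterns may use '*' and '?' wildcards, which fnmatchcase supports.
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in allowed)
    ]


# Illustrative inputs only (hypothetical policy and requirement list).
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["batch:Describe*", "batch:SubmitJob"], "Resource": "*"},
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"},
    ],
}
EXAMPLE_REQUIRED = [
    "batch:SubmitJob",
    "batch:DescribeJobs",
    "batch:CancelJob",
    "s3:PutObject",
]

if __name__ == "__main__":
    print(missing_actions(EXAMPLE_POLICY, EXAMPLE_REQUIRED))
```

Run against the policy attached to the machine that launches nextflow, with `required` set to whatever list the docs currently give plus any additions found in practice, this would flag actions that still need to be granted.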