Choose pipeline execution framework

Nextflow or Snakemake seem like the obvious choices. This pipeline will be:

- very file-oriented
- not well suited to a single homogeneous cluster (e.g. Kubernetes), since steps differ widely in what they need; phenotype processing, for example, needs one core and 200G of RAM
- a driver application for a homogeneous Dask cluster
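To make the second point concrete, here is a rough sketch of why one node shape fits these steps badly. Only the phenotype processing figures (one core, 200G of RAM) come from this issue; the other step names and sizes are made-up placeholders for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    cores: int
    mem_gb: int


# Hypothetical steps: only the phenotype_processing figures are from the issue text.
STEPS = [
    Step("phenotype_processing", cores=1, mem_gb=200),
    Step("dask_driver", cores=2, mem_gb=8),        # assumed size
    Step("file_conversion", cores=4, mem_gb=16),   # assumed size
]


def homogeneous_node_shape(steps):
    """Smallest single (cores, mem_gb) node shape that can run every step."""
    return max(s.cores for s in steps), max(s.mem_gb for s in steps)


def idle_memory_gb(steps):
    """Memory left unused by each step when all steps share one node shape."""
    _, node_mem = homogeneous_node_shape(steps)
    return {s.name: node_mem - s.mem_gb for s in steps}


if __name__ == "__main__":
    print(homogeneous_node_shape(STEPS))
    print(idle_memory_gb(STEPS))
```

With these placeholder numbers, every node would have to be sized for the 200G step, so the lighter steps would leave most of that memory idle. Sizing a VM per step avoids that.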

I have no aspirations of making this a cloud-agnostic pipeline, and I think Nextflow and Snakemake are evenly matched in their GCP support. When not using a cluster, though, Nextflow appears to take control of deploying individual VMs (see https://www.nextflow.io/docs/latest/google.html#process-definition), and I don't see a similar feature in Snakemake. I'm not sure how much I want to trust that long term, since the project appears to be driven almost entirely by a single contributor.

Overall, Snakemake appears to be simpler, does not require a JVM + Groovy, and adopts a model where users are responsible for creating resources, so I'm leaning towards it at the moment.

related-sciences / ukb-gwas-pipeline-nealelab
