related-sciences / ukb-gwas-pipeline-nealelab

Pipeline for reproduction of NealeLab 2018 UKB GWAS

Choose pipeline execution framework #3

Closed eric-czech closed 4 years ago

eric-czech commented 4 years ago

Nextflow or Snakemake seem like the obvious choices. This pipeline will be:

I have no aspirations of making this pipeline cloud-agnostic, and I think Nextflow and Snakemake are similarly matched in their GCP support. However, Nextflow appears to take control of deploying individual VMs when not using a cluster (see https://www.nextflow.io/docs/latest/google.html#process-definition), and I don't see a similar feature in Snakemake. I'm not sure how much I want to trust that long term, since the project appears to be driven almost entirely by a single contributor.

Overall, Snakemake appears to be simpler, does not require a JVM and Groovy, and adopts a model where users are responsible for creating resources, so I'm leaning towards it at the moment.

eric-czech commented 4 years ago

Snakemake has worked fairly well so far, or at least I was able to convert a couple of bgen chromosomes with it using a Kubernetes cluster. The remote file support is pretty clearly an afterthought in the design and doesn't work particularly well with GS, but it's still usable. It doesn't seem to support directories (snakemake#576), which is definitely annoying.

In retrospect, I wish I had started with Nextflow instead, but at this point the current pipeline is still reasonable, so I won't backtrack.