nextflow-io / nf-hack17

Nextflow hackathon 2017 projects

Project 5: Pipeline for 16S Microbial data #5

Open shaze opened 7 years ago

shaze commented 7 years ago

Project

The H3Africa Bioinformatics Network has developed a pipeline for doing 16S analysis (https://github.com/h3abionet/h3abionet16S). This pipeline was written in CWL. We are interested in migrating it to Nextflow for two reasons: (1) some groups in the network are using Nextflow and would like to extend the pipeline (e.g., for 18S or shotgun), so it would make sense to have this as a basis; and (2) it would be interesting to compare the two workflow implementations in various ways.

Data:

There are various public data sets, but we suggest using the H3ABioNet Accreditation Practice data sets. Our estimate is that the input data (including the reference data sets that are needed) will be about 10GB, with another 4GB or so for analysis and output.

Computing resources:

The following estimate is from Gerrit who worked on the CWL workflow:

The dataset is small, and 16GB of RAM would be enough for the run. With the CWL implementation we were not able to do threading, so a t2.xlarge (4 cores, 16GB RAM) would be sufficient for our current design. The tasks that need threading do not require much RAM, so if it is possible to thread some of our tasks in Nextflow we can probably stay at 16GB of RAM but with more cores; a c4.4xlarge (16 cores, 30GB RAM) would be an option.

The above, however, is the requirement if we only need to make one run. We need to think about how we will work together on this. Will we have one machine where everyone is logged in and testing? If that is the idea, we may need a bigger machine that allows for more concurrent tasks and memory requests, perhaps a c3.8xlarge (32 cores, 60GB RAM).
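The sizing reasoning above can be sketched as a small lookup. This is only an illustration: the core and RAM figures are the ones quoted in this thread (not re-checked against current AWS specifications), and the `Instance` type, the `smallest_fit` helper, and the example requirements are hypothetical.

```python
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    name: str
    cores: int
    ram_gb: int


# Figures as quoted in this thread; treat them as assumptions.
CANDIDATES = [
    Instance("t2.xlarge", 4, 16),
    Instance("c4.4xlarge", 16, 30),
    Instance("c3.8xlarge", 32, 60),
]


def smallest_fit(cores_needed: int, ram_needed_gb: int) -> Optional[Instance]:
    """Return the candidate with the fewest cores that meets both needs.

    Returns None when no listed instance is large enough.
    """
    fitting = [
        inst for inst in CANDIDATES
        if inst.cores >= cores_needed and inst.ram_gb >= ram_needed_gb
    ]
    return min(fitting, key=lambda inst: (inst.cores, inst.ram_gb), default=None)


if __name__ == "__main__":
    # Hypothetical scenarios: one unthreaded run, one threaded run,
    # and a shared machine for several people testing at once.
    for cores, ram in [(4, 16), (16, 16), (32, 48)]:
        print(cores, ram, smallest_fit(cores, ram))
```

The choice is ordered by core count first because, per the estimate above, the threaded tasks are limited by cores rather than by memory.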

I do not know if there are any budget constraints. Maybe we just need to say we want two or three of these machines; we will definitely spin up one, and only spin up the others if we find there is a need.

(We can also make some resources from our Wits cluster available.)

Project Lead:

TBD: we have four people coming from Bionet and we'll make a nomination shortly.

ShakunBaichoo commented 7 years ago

Shakuntala Baichoo (H3ABioNet)