gaow opened this issue 3 years ago
> the new UKB release is only available through this platform
Curious as to why this is the case, since we are interested in running UKB data on DNAnexus as well.
There are two possibilities. The first is to run SoS scripts entirely on the platform by specifying Python, SoS, etc. as dependencies. In that case we will not be able to use our workflow features to process multiple files on multiple nodes. The second is more in line with the spirit of SoS, namely wrapping scripts to be executed on DNAnexus, with the files already there. The applet would have to be compiled and uploaded, but this is not hugely different from how we build Docker images and use them to process input files. This should work reasonably well for simple commands (e.g. bash scripts).
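A minimal sketch of what "wrapping a script to run on DNAnexus" could look like from the client side. It assumes the Swiss Army Knife app (`app-swiss-army-knife`) used in the UKB RAP tutorials, with a file-array input `in` and a string input `cmd`, and assumes the app stages the `in` files into the job's working directory; check `dx run app-swiss-army-knife -h` before relying on this. The instance type, destination folder, script names and file IDs are placeholders. The same wrapper also covers the first approach if the command installs SoS and runs a whole workflow on a single worker.

```python
import shlex
from typing import List, Sequence


def sak_run_args(cmd: str, input_files: Sequence[str],
                 instance_type: str = "mem1_ssd1_v2_x4",
                 destination: str = "/results/") -> List[str]:
    """Build a `dx run` argument list that runs `cmd` on one worker through
    the Swiss Army Knife app (assumed inputs: `in` = files, `cmd` = command)."""
    args = ["dx", "run", "app-swiss-army-knife"]
    for f in input_files:
        # repeating -iin=... appends one file to the array input `in`
        args.append(f"-iin={f}")
    args += [
        f"-icmd={cmd}",
        "--instance-type", instance_type,  # placeholder instance type
        "--destination", destination,      # output folder in the project
        "--brief",                         # print only the job ID
        "-y",                              # skip the confirmation prompt
    ]
    return args


if __name__ == "__main__":
    # Approach 2: wrap a plain bash step (hypothetical script and file IDs)
    step = sak_run_args("bash run_assoc.sh chr1.bgen",
                        ["file-XXXX_script", "file-XXXX_chr1"])
    # Approach 1: install SoS on the worker and run a whole (hypothetical) workflow
    whole = sak_run_args("pip install sos && sos run analysis.sos",
                         ["file-XXXX_workflow"])
    for args in (step, whole):
        # To submit for real: subprocess.run(args, check=True, capture_output=True, text=True)
        print(shlex.join(args))
```

Building an argument list rather than a shell string avoids quoting problems with the `cmd` input, which usually contains spaces and shell operators.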
So this looks essentially like a sos-dnanexus module that works as a task engine (easier) and a workflow engine (more difficult), calling the dx command to handle most of the work.
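A rough sketch of the "task engine" half of such a module: submit one `dx run` job per task, record the job IDs, then block on a single `dx wait` for all of them. `DXTaskQueue` and the applet ID are hypothetical (nothing like this exists in SoS today); the sketch assumes an applet already built with `dx build`, that `dx run --brief` prints just the job ID, and that `dx wait` accepts several job IDs. The command runner is injectable so the logic can be exercised without a DNAnexus login. Dependencies between steps, i.e. the "workflow engine" half, are not handled here.

```python
import subprocess
from typing import Callable, Dict, List

Runner = Callable[[List[str]], str]


def run_cli(args: List[str]) -> str:
    """Run a command and return its stripped stdout (raises on failure)."""
    return subprocess.run(args, check=True, capture_output=True,
                          text=True).stdout.strip()


class DXTaskQueue:
    """Hypothetical task engine: one `dx run` job per task, then `dx wait`."""

    def __init__(self, applet: str, runner: Runner = run_cli):
        self.applet = applet
        self.runner = runner
        self.jobs: Dict[str, str] = {}  # task name -> job ID

    def submit(self, name: str, inputs: Dict[str, str]) -> str:
        args = ["dx", "run", self.applet, "--brief", "-y"]
        for key, value in inputs.items():
            # -i<input_name>=<value>, one per applet input
            args.append(f"-i{key}={value}")
        job_id = self.runner(args)  # --brief: stdout is just the job ID
        self.jobs[name] = job_id
        return job_id

    def wait_all(self) -> None:
        """Block until every submitted job has finished."""
        if self.jobs:
            self.runner(["dx", "wait", *self.jobs.values()])
```

For example, `q = DXTaskQueue("applet-XXXX")` followed by `q.submit("chr1", {"bgen": "file-XXXX"})` for each chromosome and a final `q.wait_all()` would mirror what SoS currently does with PBS templates, just with `dx` as the submission command.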
Also, building and uploading Docker images could be a more general solution.
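For the Docker route, a sketch of the commands involved: build the image locally, save it as a tarball, and upload the tarball to a project folder with `dx upload`. A job would then fetch it (e.g. with `dx download`) and `docker load` it before running the step. The image name and folder are hypothetical, and this only assembles the command lists; it does not run them.

```python
from typing import List


def docker_to_dx_commands(image: str, project_folder: str) -> List[List[str]]:
    """Commands to build an image, save it as a tarball and upload it."""
    # derive a filesystem-safe tarball name from the image tag
    tarball = image.replace("/", "_").replace(":", "_") + ".tar"
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "save", "-o", tarball, image],
        # a --path ending in "/" is treated as a folder in the current project
        ["dx", "upload", tarball, "--path", project_folder, "--brief"],
    ]


def load_on_worker_command(tarball: str) -> List[str]:
    """Command a job would run after downloading the tarball."""
    return ["docker", "load", "-i", tarball]
```

Each command list can be passed to `subprocess.run(..., check=True)`. Uploading the image once and reusing it across jobs is what makes this closer to how SoS already uses Docker images locally.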
@BoPeng we have recently been looking into UKB data for some traits we analyze, and according to
https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/guide-to-analyzing-large-sample-sets
the new UKB release is only available through this platform (!!). As you can see, the system is based on a DNAnexus implementation of WDL, along with its job manager built on DNAnexus applets. Our pipelines are written in SoS, which distributes jobs using PBS templates etc. It is not obvious that SoS can run with DNAnexus applets, and DNAnexus is perhaps the most popular (if not the only) cloud platform for WDL. What's your take on this, or do you have suggestions for SoS users in this setup?