vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

Support to DNAnexus applet #1448

Open gaow opened 2 years ago

gaow commented 2 years ago

@BoPeng we have recently been looking into UKB data for some of the traits we analyze,

https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/guide-to-analyzing-large-sample-sets

the new UKB release is only available through this platform (!!). As you can see, the system is based on a DNAnexus implementation of WDL, together with its job manager built on DNAnexus applets. Our pipelines are written in SoS, which distributes jobs with PBS templates etc. It is not obvious that SoS can run with DNAnexus applets, and DNAnexus is perhaps the most popular (if not the only) cloud platform for WDL. What's your take on this, or do you have suggestions for SoS users in this setup?

BoPeng commented 2 years ago

the new UKB release is only available through this platform

Curious as to why this is the case, since we are interested in running UKB data on DNAnexus as well.

BoPeng commented 2 years ago

There are two possibilities. The first is to run sos scripts entirely on the platform by specifying Python, sos, etc. as dependencies. In that case we would not be able to use our workflow features to process multiple files on multiple nodes. The second is more in line with the spirit of sos, namely wrapping scripts to be executed on DNAnexus, with the files already there. The applet would have to be compiled and uploaded, but this is not hugely different from how we handle building Docker images and using them to process input files. This should work reasonably well for simple commands (e.g. bash scripts).

So this looks essentially like a sos-dnanexus module that works as a task engine (easier) and a workflow engine (more difficult), and that calls the dx command to do a lot of things.
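
To make the "task engine that calls the dx command" idea a bit more concrete, here is a minimal sketch of how such a module might assemble a `dx run` call for one wrapped script. This is only an illustration, not an existing sos API: the function name `build_dx_run_command`, the applet name `sos_task_runner`, the input name `script`, and the paths are all hypothetical. It only relies on `dx run` with its `-i`, `--destination`, `--instance-type`, `--yes` and `--brief` options, and it builds the command without executing it.

```python
import shlex
from typing import Dict, List, Optional


def build_dx_run_command(
    applet: str,
    inputs: Dict[str, str],
    destination: Optional[str] = None,
    instance_type: Optional[str] = None,
) -> List[str]:
    """Return the argument list for a non-interactive `dx run` call.

    `applet` and the keys of `inputs` are placeholders here; a real applet
    defines its own input names in its dxapp.json.
    """
    cmd = ["dx", "run", applet]
    # Each applet input is passed as "-i name=value".
    for name, value in inputs.items():
        cmd += ["-i", f"{name}={value}"]
    if destination:
        # Project folder where the job's outputs should be written.
        cmd += ["--destination", destination]
    if instance_type:
        cmd += ["--instance-type", instance_type]
    # --yes skips the confirmation prompt, --brief prints only the job ID.
    cmd += ["--yes", "--brief"]
    return cmd


# Hypothetical applet, input name and paths, for illustration only.
example = build_dx_run_command(
    "sos_task_runner",
    {"script": "/scripts/step1.sh"},
    destination="/results",
)
print(shlex.join(example))
```

Actually submitting the job would mean passing the list to `subprocess.run` on a machine where the dx toolkit is installed and logged in. A workflow engine would additionally have to track the returned job IDs and the dependencies between them, which is the harder part.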

BoPeng commented 2 years ago

Also, building and uploading Docker images could be a more general solution.

https://youtu.be/A_iki_50Ig0