icgc-argo-workflows / dna-seq-processing-wfs

ICGC ARGO DNA-Seq Processing Workflow
GNU Affero General Public License v3.0

Optimization: to reduce simultaneous read requests to the same very large file, convert parallel execution of these tasks to sequential execution #47

Closed: junjun-zhang closed this issue 4 years ago

junjun-zhang commented 4 years ago

As shown in the diagram below, the tasks in each group may run in parallel. Because they all read from the same large file, disk I/O becomes a bottleneck. Running these tasks sequentially should be a quick way to reduce concurrent read requests to disk.

[diagram]
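The effect on concurrent reads can be illustrated with a minimal Python sketch. This is not the workflow's actual code. `read_task` is a hypothetical stand-in for any step that streams the shared large file, and `ReadTracker` records the peak number of tasks reading at the same moment. Running the tasks in a thread pool lets their reads overlap. Running them in a plain loop caps the peak at one reader.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ReadTracker:
    """Tracks how many tasks are reading the shared file at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


def read_task(tracker, duration=0.05):
    # Hypothetical task: the sleep stands in for streaming the same large file.
    with tracker:
        time.sleep(duration)


def run_parallel(n_tasks):
    """Submit all tasks at once, so their reads may overlap."""
    tracker = ReadTracker()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        futures = [pool.submit(read_task, tracker) for _ in range(n_tasks)]
        for f in futures:
            f.result()
    return tracker.peak


def run_sequential(n_tasks):
    """Run tasks one after another, so at most one reads the file at a time."""
    tracker = ReadTracker()
    for _ in range(n_tasks):
        read_task(tracker)
    return tracker.peak


if __name__ == "__main__":
    print("peak concurrent readers, parallel:  ", run_parallel(3))
    print("peak concurrent readers, sequential:", run_sequential(3))
```

The trade-off is longer wall-clock time for the group in exchange for fewer simultaneous requests against one file on disk.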

junjun-zhang commented 4 years ago

The diagram above omits payload generation for the aligned sequences, which all three potentially parallel tasks depend on.
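One common way to make sibling tasks run sequentially is to add dependency edges between them, so that each task waits for the previous one. The sketch below shows this with Python's `graphlib.TopologicalSorter`. The task names (`align`, `payload_gen`, `task_a`, `task_b`, `task_c`) are hypothetical placeholders, not the workflow's real step names, and the linked commit may use a different mechanism. `ready_batches` lists which tasks become runnable together at each step.

```python
from graphlib import TopologicalSorter


def serialize_siblings(deps, siblings):
    """Return a copy of `deps` with extra edges so `siblings` run one after another."""
    chained = {node: set(parents) for node, parents in deps.items()}
    for prev, nxt in zip(siblings, siblings[1:]):
        # `nxt` now also waits for `prev`, removing their overlap.
        chained.setdefault(nxt, set()).add(prev)
    return chained


def ready_batches(deps):
    """Group tasks into batches whose members could run concurrently."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)
    return batches


# Hypothetical graph: three tasks all depend on payload generation
# for the aligned sequences.
deps = {
    "payload_gen": {"align"},
    "task_a": {"payload_gen"},
    "task_b": {"payload_gen"},
    "task_c": {"payload_gen"},
}

if __name__ == "__main__":
    print(ready_batches(deps))
    # [['align'], ['payload_gen'], ['task_a', 'task_b', 'task_c']]
    print(ready_batches(serialize_siblings(deps, ["task_a", "task_b", "task_c"])))
    # [['align'], ['payload_gen'], ['task_a'], ['task_b'], ['task_c']]
```

The order of the chained tasks is arbitrary here. Any order removes the concurrent reads, as long as every task still runs after payload generation.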

junjun-zhang commented 4 years ago

Completed here: https://github.com/icgc-argo/dna-seq-processing-wfs/commit/9b4618782c705f984360331737c634db2210530e