pallassgj / bpipe

Automatically exported from code.google.com/p/bpipe

Support for an easy way to parallelize based on region #18

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently Bpipe lets you easily parallelize by sample or by splitting your
input files into pieces. However, many tools can operate on regions
independently without any file splitting, and that is tricky to do in Bpipe
right now. It would be nice to have a direct syntax for saying "run this
pipeline with every chromosome in parallel".

Proposed syntax:

  chr(1..22) * [ call_variants ]

This would create a $chr variable that the call_variants stage can use.
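
To make the intended behaviour concrete, here is a rough sketch in Python rather than Bpipe syntax. It is only an illustration: `run_per_chromosome` and the body of `call_variants` are hypothetical, and it assumes the variable would hold names like "chr1" (the proposal does not say whether it would be "chr1" or just "1").

```python
from concurrent.futures import ThreadPoolExecutor


def call_variants(chrom):
    # Placeholder stage (assumption): a real stage would run a variant
    # caller restricted to `chrom` and return the file it produced.
    return f"variants.{chrom}.vcf"


def run_per_chromosome(stage, chromosomes):
    """Run `stage` once per chromosome in parallel; results keep input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(stage, chromosomes))


# Roughly what chr(1..22) * [ call_variants ] is meant to express.
chromosomes = [f"chr{i}" for i in range(1, 23)]
outputs = run_per_chromosome(call_variants, chromosomes)
```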

More detailed regions could be created using an organism-specific database:

  hg19.split(60) * [ calculate_coverage_depth ]

The latter would work out how to split the human genome into 60 roughly even
parts and pass the variables $chr, $start and $end to the
calculate_coverage_depth pipeline stage, making it really easy to parallelize
data processing.
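
One possible way the region splitting could work, sketched in Python. This is an assumption about the behaviour, not a design: `split_genome` is a hypothetical name, the chromosome lengths are toy values rather than real hg19 sizes, and coordinates are taken to be 1-based and inclusive.

```python
import math


def split_genome(chrom_lengths, n_parts):
    """Cut each chromosome into windows of about total / n_parts bases.

    Returns (chr, start, end) tuples with 1-based inclusive coordinates.
    Windows never span two chromosomes, so the number of regions can be
    somewhat larger than n_parts.
    """
    total = sum(chrom_lengths.values())
    window = math.ceil(total / n_parts)
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            regions.append((chrom, start, end))
    return regions


# Toy lengths for illustration only, not real hg19 values.
toy_genome = {"chr1": 100, "chr2": 50, "chr3": 50}
regions = split_genome(toy_genome, 4)
```

Each tuple would then supply the $chr, $start and $end values for one parallel branch. Because the last window of each chromosome is usually shorter than the others, the parts are only roughly even.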

Original issue reported on code.google.com by ssade...@gmail.com on 4 Apr 2012 at 1:16

GoogleCodeExporter commented 9 years ago
If it supported splitting input FASTQ files for mapping, that would be very
helpful, especially for mapping tools that do not support running in parallel.
The command might look like this:

   "input_1.fq" split(10) * [ bwa_aln + bwa_sampe ] + samtools_merge
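
A rough sketch of the splitting step in Python, assuming plain 4-line FASTQ records and a hypothetical helper named `split_fastq`. It works on a list of lines already read into memory, which is only suitable for illustration.

```python
def split_fastq(lines, n_chunks):
    """Distribute 4-line FASTQ records round-robin over n_chunks lists of lines."""
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of 4 lines")
    chunks = [[] for _ in range(n_chunks)]
    for i in range(0, len(lines), 4):
        # Record number i // 4 goes to chunk (record number mod n_chunks).
        chunks[(i // 4) % n_chunks].extend(lines[i:i + 4])
    return chunks


# Five made-up records, split into two chunks.
records = []
for n in range(5):
    records += [f"@read{n}", "ACGT", "+", "IIII"]
chunks = split_fastq(records, 2)
```

For paired-end data, both mate files would need to be split the same way so that each chunk of the first file still lines up with the matching chunk of the second.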

Original comment by yanlinlin82 on 17 Jun 2012 at 10:39