pallassgj / bpipe

Automatically exported from code.google.com/p/bpipe

Support for Child Directories for Outputs of Pipeline Stages #43

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently Bpipe relies on all the outputs generated by your pipeline 
appearing in the same directory as the pipeline script (it's not 
compulsory, but some Bpipe features don't work, or you can get odd results, if 
you put output files elsewhere).

This really starts to break down when you have a huge number of files in a 
project or job, especially intermediate files that are just computational 
byproducts.  

Bpipe should fully support tasks creating outputs in child directories.

Original issue reported on code.google.com by ssade...@gmail.com on 12 Jul 2012 at 12:04

GoogleCodeExporter commented 9 years ago

Original comment by ssade...@gmail.com on 12 Jul 2012 at 12:04

GoogleCodeExporter commented 9 years ago

Original comment by ssade...@gmail.com on 12 Jul 2012 at 12:06

GoogleCodeExporter commented 9 years ago
Alternatively, it could be a nice feature / solution if Bpipe automatically 
created a separate directory for each stage of the pipeline it runs (named 
after the stage). This could improve transparency / data 
provenance (together with a solution to Issue 2).

Maybe it would disturb something else though - no idea.

Original comment by maciek.k...@gmail.com on 12 Jul 2012 at 1:49

GoogleCodeExporter commented 9 years ago
I was trying to have bpipe output the results of a pipeline stage in a 
sub-directory.  This doesn't seem possible unless I hard-code the 
output files.  For example, I tried something akin to

stage {
  produce("outdir/${input}_world.txt", "outdir/${input}_mars.txt") {
     exec """cat $input >> $output1"""
     exec """cat $input >> $output2"""
  }
}

where I had determined outdir programmatically.  However, bpipe seems 
to strip off the output directory and create the output files directly 
in the directory from which it is executed.  Is there really no way to tell 
bpipe that output files should be placed in a specific directory / 
subdirectory?  My use case relies fairly heavily on a directory 
structure where multiple parts of the pipeline share the same input files but 
should output to different sub-directories.  Currently, the only way I can see 
to do this would be to write a separate pipeline for every stage and execute it in 
that stage's subdirectory; it seems like this shouldn't be necessary.

Original comment by rob.pa...@gmail.com on 26 Oct 2013 at 3:55