Open lonsbio opened 10 years ago
From ssade...@gmail.com on 2012-07-11T16:59:37Z
Thanks for logging this. I agree it's something that needs to be addressed, as it goes to the issue of reproducibility. I will see what can be done in small ways to address it in the short term, and think about more comprehensive solutions in the longer term. (I am actually thinking the 'bpipe' command itself might be part of this, in the sense that it could be used to query the logs for any stage of any run, such as 'bpipe log
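A rough sketch of that kind of query interface, for illustration only: plain 'bpipe log' exists as a command, but the per-stage and per-branch arguments shown here are hypothetical, not current bpipe flags.

    # Existing: show the output log of the most recent run
    bpipe log

    # Imagined extensions sketched in the comment above (not real flags):
    bpipe log align                      # output of the 'align' stage only
    bpipe log -branch sample_43 dedup    # output of 'dedup' within one parallel branch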
Labels: -Priority-Medium Priority-High
From NextGen...@gmail.com on 2012-10-08T18:21:38Z
I would like to agree with the initial poster. For reproducibility as well as traceability and archival purposes, a "one log per input" would be quite essential. I would, for example, like to integrate these files into our LIMS to trace the processing that an individual file, i.e. an individual sample, has gone through. Combining that with the version tracking integrated into bpipe would be quite powerful.
From andrew.o...@gmail.com on 2012-07-10T07:53:45Z
What steps will reproduce the problem?
1. Run a pipeline with parallel commands.
[16:18:02] Modeling fragment count overdispersion.
[16:18:02] Calculating initial abundance estimates for bias correction. <----- [From Cufflinks]
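For concreteness, a minimal sketch of the reproduction; the pipeline file name and inputs are assumed here, not taken from the report:

    # Run a pipeline whose stages execute commands in parallel; all of their
    # stdout/stderr currently ends up interleaved in one shared run log.
    bpipe run pipeline.groovy *.bam

    # Inspect the combined log of the run -- messages from the parallel
    # commands, like the Cufflinks lines above, appear mixed together.
    bpipe log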
There are many situations where this is problematic. For example, if a particular application gives similar output to another application running in parallel, it is impossible to distinguish them. Or, if multiple commands of the same application are running, it looks something like this:
Calculating read count stats... 444621 Queried Treatment Observations Sample_43_CNT_24hr.bam 313186 Queried Control Observations Sample_58_Hep_cnt_72hr.bam
Calculating negative binomial p-values and FDRs in R using DESeq (http://www-huber.embl.de/users/anders/DESeq/)... chr19 chr5_random chr18
Calculating read count stats... 396066 Queried Treatment Observations Sample_43_CNT_24hr.bam 279005 Queried Control Observations Sample_58_Hep_cnt_72hr.bam
Calculating negative binomial p-values and FDRs in R using DESeq (http://www-huber.embl.de/users/anders/DESeq/)... chr5_random chr18 chrM
Calculating read count stats... 445143 Queried Treatment Observations Sample_43_CNT_24hr.bam 308133 Queried Control Observations Sample_60_3T3_cnt_72hr.bam
There is no way to tell which initial command produced which particular output stats (except that in this case the application was kind enough to include the input file names, but not all applications will do this). Likewise, if multiple commands are running in parallel and one reports an error, it is difficult to know which command caused the error.
Ideally, the output could be saved in separate files for each command that is issued; then errors, etc. can be associated with their originating commands.

What version of the product are you using? On what operating system?
bpipe-0.9.5, Linux

Please provide any additional information below.
Maybe one way to do this would be to have a "command id" in the commandlog.txt file, after each command. Then within the .bpipe directory, you could have a directory containing files named <commandid>.something, each containing 1) the present working directory, 2) the command, and 3) the stderr and stdout for the command. Furthermore, you could use this command id as the job name for submissions to the cluster, so that when we get an email notification of an error (e.g., non-zero exit status), we can go directly to that command's output file to see the full screen output/error message.
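As a rough sketch of the layout being proposed here, the directory names, the "cmd_17" id, the FULL_CMD example, and the wrapper itself are all illustrative assumptions, not an existing bpipe feature:

    # Hypothetical per-command logging scheme, assuming bpipe assigned each
    # command an id such as "cmd_17" and recorded it in commandlog.txt.
    CMD_ID="cmd_17"
    CMD_DIR=".bpipe/commands/$CMD_ID"
    FULL_CMD="cufflinks -o out_dir sample.bam"   # example command, illustrative only
    mkdir -p "$CMD_DIR"

    pwd              > "$CMD_DIR/pwd"       # 1) working directory of the command
    echo "$FULL_CMD" > "$CMD_DIR/command"   # 2) the command line that was run
    eval "$FULL_CMD" > "$CMD_DIR/stdout" 2> "$CMD_DIR/stderr"   # 3) captured output

    # The same id could be reused as the cluster job name, e.g.:
    #   qsub -N "$CMD_ID" job_script.sh
    # so an email about a failed job points straight at $CMD_DIR/stderr.

With a layout like this, the email notification only needs to carry the command id to let you locate the full screen output for the failing command.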
Original issue: http://code.google.com/p/bpipe/issues/detail?id=42