nmdp-bioinformatics / pipeline

Consensus assembly and allele interpretation pipeline.
GNU Lesser General Public License v3.0
7 stars 7 forks source link

splitter.bash uses the split command in a form that won't work on all systems #53

Closed gturenchalk closed 9 years ago

gturenchalk commented 9 years ago

splitter.bash uses the command 'split ${MASTERWORKFILE} -n l/${NUMCPU} -e ${MYPID}'.

On my system, the default split command has neither an 'n' nor an 'e' option. 'split --version' on my system returns 'split (GNU coreutils) 8.4'. I don't know how commonly distributed the version I have is compared to the version on your system, so I don't know if I am an edge case or not.

You should consider if there is a more inclusive strategy to achieve the same result.

ckennedy-nmdp commented 9 years ago

I guess you can only assign one person per issue in Github -- Jeremy can you help Mark investigate this issue. From conversations with Greg it came out he's running on Red Hat.

mthorsen22 commented 9 years ago

This IS an issue with the version of split on RHEL and other non-Ubuntu distributions. We will look into creating a backwards compatible version of the script or maybe even converting it to PERL to perform the split.

gturenchalk commented 9 years ago

Thanks for looking into it. Once an alternative is in place, I'll be willing to give it a try. My only option for pushing forward would have been to try to hack around the issue in Perl myself and I have been to busy travelling recently to dabble on the side..

janders3-nmdp commented 9 years ago

gturenchalk: can you give this a shot now? I think that latest PR will resolve your issue, but don't want to close this one until I know for sure.

gturenchalk commented 9 years ago

Hello,

I gave it a shot but still ran into some issues. My debug mode output is provided below. I am attempting to execute the pipeline on a single input file. The new code correctly shunts into the "ugly case", but I don't think the code takes into account situations where there are fewer lines than CPUs. It looks like

let LINESPERSUBFILE=${LINESINFILE}/${NUMCPU}

is evaluating to 0 which is not a valid value for the "-l" option of the split command and leads to the following error:

Complete DEBUG output is below: ./pipeline/splitter.bash ./tutorialTest/ DEBUG

janders3-nmdp commented 9 years ago

I have indeed duplicated this issue, and am working on a fix now.

janders3-nmdp commented 9 years ago

gturenchalk, i believe I've resolved this -- if not, please advise, and I'll try another approach.

gturenchalk commented 9 years ago

I have used the new pipeline in non-debug mode and it sends the following messages to the screen (including some error messages) prior to shifting messages to the log file:

preparing to divide work evenly among 24 cores about to check linuxRedHat again ugly case ./pipeline/splitter.bash: line 34: let: S=: syntax error: operand expected (error token is "=") ./pipeline/splitter.bash: line 35: let: C=: syntax error: operand expected (error token is "=") ./pipeline/splitter.bash: line 36: let: L=S/C: division by 0 (error token is "C") ./pipeline/splitter.bash: line 40: let: E=S%C: division by 0 (error token is "C") about to call worker nohup: redirecting stderr to stdout

These errors do not appear to halt processing and pipeline processing still kicks off. Processing hasn't completed yet so I don't know if the errors have any deleterious impact on the results.

janders3-nmdp commented 9 years ago

thank you for the detailed bug report. It was a silly thing wherein I wasn't feeding the right data to my fancy new subroutine -- it had worked in the stub test program i wrote, but...

please accept my apologies.

janders3-nmdp commented 9 years ago

alright, it's been a week, and no new complaints, so I'll close this issue =) Open a new one, or reopen this if there is something further amiss.

Thanks again for all the solid bug reports!

gturenchalk commented 9 years ago

I did test the misnamed variables fix prior to going on vacation so your closure is valid. Thanks for working to make the pipeline usable on a variety of systems. I am now able to get the general pipeline workflow to execute, but I am still encountering issues unrelated to the "split" incompatibility issue, so I will open up separate issues (like #71) as I encounter them. There may be more to come if I can find some spare time to hack away at it to make sure if I am encountering bugs or if I have something configured wrong (I think I am encountering problems with the bcftools portion of the next to last processing command, perhaps related to reference indexing, or with the final samtools mpileup command, that are preventing me from getting the expected final results).