mossmatters / HybPiper

Recovering genes from targeted sequence capture data
GNU General Public License v3.0

Running multiple samples on a server with a UNIX loop - BWA error #47

Open Ranhini opened 4 years ago

Ranhini commented 4 years ago

Hi all,

As a little foreword: I am a pure newbie in programming and above all in UNIX... but I am trying hard :) After having run HybPiper for a bunch of samples, I am trying to use the small loop provided in the tutorial to automatically run multiple samples consecutively. But unfortunately I can't manage to make that work :(

In brief:

    #!/bin/bash

    while read i
    do
    ./reads_first.py -r $i*.fastq -b cacao_loci.fasta --prefix $i --bwa
    done < namelist.txt

Unfortunately, when I run my bash file I get the following message with an exit warning:

    ~/HybPiper$ sh HybPiper_Sarco.sh
    HybPiper was called with these arguments: --bwaq -b cacao_loci.fasta --prefix 5_01_Sar020

    Making nucleotide bwa index in current directory.
    [CMD]: bwa index cacao_loci.fasta
    [bwa_index] Pack FASTA... 0.00 sec
    [bwa_index] Construct BWT for the packed sequence...
    [bwa_index] 0.04 seconds elapse.
    [bwa_index] Update BWT... 0.00 sec
    [bwa_index] Pack forward-only FASTA... 0.00 sec
    [bwa_index] Construct SA from BWT and Occ... 0.02 sec
    [main] Version: 0.7.5a-r405
    [main] CMD: bwa index cacao_loci.fasta
    [main] Real time: 0.417 sec; CPU: 0.073 sec
    .bamstq | samtools view -h -b -S - > 5_01_Sar02
    0an/HybPiper/5_01_Sar020
    *.fastq'.mem] fail to open file `/home/qian/HybPiper/5_01_Sar020
    Command exited with non-zero status 1
    0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 1208maxresident)k
    0inputs+0outputs (0major+368minor)pagefaults 0swaps
    [samopen] no @SQ lines in the header.
    [sam_read1] missing header? Abort!
    ERROR: Something went wrong with the BWA step, exiting!

Does anyone know if my problem lies with my UNIX environment, the naming/format of my files, or the commands in my batch file?

Thanks so much and all the best, Xavier Aubriot

mossmatters commented 4 years ago

Hello -- I have seen problems like this before where UNIX/Bash/SH are failing to expand the wildcard (`*`) in the loop properly. In this instance it looks like `$i*.fastq` is not expanding properly, and it is attempting to open a file called `5_01_Sar020` with no `.fastq` suffix.

One potential issue: you've written your shell script for bash (`#!/bin/bash`) but you then call it using the standard shell: `sh HybPiper_Sarco.sh`. Is there any difference if you run it with bash: `bash HybPiper_Sarco.sh`?

If you are getting the same error even with bash, my suggestion would be to surround all parts of the command that include a variable plus other text with quotes. For example:

do ./reads_first.py -r "$i*.fastq" -b cacao_loci.fasta --prefix $i --bwa
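
As a minimal sketch of how the wildcard and the variable interact: in POSIX shells a `*` inside double quotes is not expanded, so the version below quotes only the variable and leaves the wildcard outside the quotes. It also strips carriage returns, since a name list saved with Windows line endings would leave an invisible character between the sample name and `*.fastq`; whether that applies here is not confirmed in this thread. The `list_reads` helper, the `_R1`/`_R2` file names, and the CRLF name list are illustrative assumptions, not taken from the reporter's setup.

```shell
# list_reads (hypothetical helper): for each sample name in a name list,
# print the read files that "$name"*.fastq expands to.
list_reads() {
    list=$1
    # tr removes carriage returns left by Windows (CRLF) line endings.
    tr -d '\r' < "$list" | while read -r name; do
        [ -n "$name" ] || continue
        # Variable quoted, wildcard outside the quotes so the shell expands it.
        set -- "$name"*.fastq
        if [ ! -e "$1" ]; then
            echo "no .fastq files found for $name" >&2
            continue
        fi
        echo "sample=$name reads=$*"
    done
}

# Demo in a scratch directory with made-up files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 5_01_Sar020_R1.fastq 5_01_Sar020_R2.fastq
printf '5_01_Sar020\r\n' > namelist.txt   # name list with a CRLF line ending
list_reads namelist.txt
# prints: sample=5_01_Sar020 reads=5_01_Sar020_R1.fastq 5_01_Sar020_R2.fastq
```

Running `cat -A namelist.txt` (GNU coreutils) shows a `^M` before the end-of-line `$` when a file has Windows line endings.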

Let me know how it goes!

Ranhini commented 4 years ago

Hi again and sorry for the late answer!

I tried both of your solutions, but unfortunately neither works; I still get this annoying message :(

Instead of a loop, I then built a bash file with all the command lines in series. It is not the most elegant way, but it works... One silly unix/server-related question as well: my server has 8 processors; if I want to launch several HybPiper runs in parallel, do I need to install a dedicated queue system?

Cheers, Xavier

mossmatters commented 4 years ago

With 8 processors I would recommend running one sample at a time, serially, as you have described. HybPiper uses GNU Parallel to take advantage of all 8 processors during each of its phases of assembly. For example, if you have 300 genes, it will assign the assembly of each gene to a separate process, for an (approximately) 8-fold reduction in run time, as long as nothing else is happening on the machine. Good luck!
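
Running the samples one after another can be sketched as a loop that reports which samples failed instead of stopping silently. This is only an illustration: `run_all` and `run_sample` are hypothetical names, and `run_sample` here just prints the command from the original script rather than running HybPiper.

```shell
# run_all (hypothetical helper): read sample names from stdin and run the
# given command once per sample, strictly one at a time.
run_all() {
    failed=0
    while read -r name; do
        [ -n "$name" ] || continue
        # Redirect stdin so the command cannot swallow the remaining names.
        if "$@" "$name" < /dev/null; then
            echo "ok: $name"
        else
            echo "FAILED: $name"
            failed=1
        fi
    done
    return "$failed"
}

# Stand-in for the real per-sample command; replace the echo with the actual call.
run_sample() {
    echo "would run: ./reads_first.py -r $1*.fastq -b cacao_loci.fasta --prefix $1 --bwa"
}

printf 'sampleA\nsampleB\n' | run_all run_sample
```

With a real name list the last line would read `run_all run_sample < namelist.txt`, and the exit status of `run_all` is non-zero if any sample failed.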