mossmatters / HybPiper

Recovering genes from targeted sequence capture data
GNU General Public License v3.0

Issue running test data - error with SPAdes assembly #10

Open CharlesBarden opened 8 years ago

CharlesBarden commented 8 years ago

I'm using Red Hat Linux and am simply trying to run the tutorial with the test dataset. Any tips or help would be much appreciated. Here is the command I entered and the resulting error message:

[test_dataset]$ ../reads_first.py -b test_targets.fasta -r NZ281_R*_test.fastq --prefix NZ281 --bwa

...

time parallel --eta spades.py --only-assembler --threads 1 --cov-cutoff 8 --12 {}/{}_interleaved.fasta -o {}/{}_spades :::: spades_genelist.txt > spades.log
/bin/sh: parallel: command not found

real 0m0.001s
user 0m0.001s
sys 0m0.000s
ERROR: One or more genes had an error with SPAdes assembly. This may be due to low coverage. No contigs found for the following genes:
gene660
gene461
gene293
gene001
gene026
gene002
gene111
gene074
gene298
gene079
gene006
gene012
gene030
Traceback (most recent call last):
  File "/home/berendzen/HybPiper/spades_runner.py", line 166, in <module>
    if __name__ == "__main__":main()
  File "/home/berendzen/HybPiper/spades_runner.py", line 160, in main
    spades_failed,spades_duds = rerun_spades("failed_spades.txt",cov_cutoff=args.cov_cutoff,paired=is_paired)
  File "/home/berendzen/HybPiper/spades_runner.py", line 82, in rerun_spades
    all_kmers = [int(x[1:]) for x in os.listdir(os.path.join(gene,"{}_spades".format(gene))) if x.startswith("K")]
OSError: [Errno 2] No such file or directory: 'gene660/gene660_spades'
WARNING: Something went wrong with the assemblies! Check for failed assemblies and re-run!
ERROR: No genes had assembled contigs! Exiting!

mossmatters commented 8 years ago

The first error: /bin/sh: parallel: command not found indicates that GNU Parallel is not installed or is not in your $PATH. You can get GNU Parallel from here: https://www.gnu.org/software/parallel/

You should also check whether the other dependencies (BLAST, BWA, SPAdes, Exonerate, and BioPython) are installed and are in your $PATH. You can do this by running:

python reads_first.py --check
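If you want to see exactly which programs are missing from your $PATH, a small stand-alone check along these lines may help. This is only a sketch and not part of HybPiper: the executable names in REQUIRED_TOOLS (for example blastx for BLAST) and the helper names are assumptions, so adjust them to your installation.

```python
import importlib.util
import shutil

# Hypothetical list of executable names; these are assumptions, not taken
# from HybPiper itself. Edit to match how the tools are named on your system.
REQUIRED_TOOLS = ["parallel", "spades.py", "bwa", "blastx", "exonerate"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def biopython_available():
    """Return True if the Biopython package (module name `Bio`) can be imported."""
    return importlib.util.find_spec("Bio") is not None


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found on PATH.")
    if not biopython_available():
        print("Biopython does not appear to be installed for this Python.")
```

Run it with the same Python interpreter and shell environment you use for reads_first.py, since both the PATH and the installed packages can differ between environments.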

CharlesBarden commented 8 years ago

Thank you for your help. I think I've fixed the previous problem but when I run the program I'm encountering some different errors. Any idea what the issue is?

python2.7env/lib/python2.7/site-packages/Bio/Seq.py:2071: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning)
0 proteins had no good matches.

real 0m0.127s
user 0m0.112s
sys 0m0.016s
Running SPAdes on 13 genes
time parallel --eta spades.py --only-assembler --threads 1 --cov-cutoff 8 -s {}/{}_interleaved.fasta -o {}/{}_spades :::: spades_genelist.txt > spades.log

real 0m8.477s
user 0m44.894s
sys 0m5.240s
ERROR: One or more genes had an error with SPAdes assembly. This may be due to low coverage. No contigs found for the following genes:
gene293
gene026
gene002
gene079
gene006
gene012
gene030
WARNING: All Kmers failed for gene079!
WARNING: All Kmers failed for gene030!
Re-running SPAdes for 5 genes
parallel --eta :::: redo_spades_commands.txt > spades_redo.log

* (process:18798): WARNING : Compiled with assertion checking - will run slowly *
ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA)
/bin/sh: line 1: 18798 Aborted exonerate -m protein2genome --showalignment no --showvulgar no -V 0 --ryo ">%ti,%qi,%qab,%qae,%pi,(%tS),%tab,%tae\n%tcs\n" gene111/gene111_baits.fasta gene111/gene111_contigs.fasta > gene111/NZ874/exonerate_results.fasta
There were 0 exonerate hits for gene111/gene111_baits.fasta.

real 0m0.347s
user 0m1.085s
sys 0m0.334s
Generated sequences from 0 genes!
WARNING: Potential paralogs detected for 0 genes!
Making nucleotide bwa index in current directory.

mossmatters commented 8 years ago

The first warning from BioPython is normal, and can be ignored.

I have not seen the second warning before. It looks like it has to do with how the program Exonerate was compiled. For Linux, I typically download a pre-compiled binary from: http://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate

According to the README distributed with the source code, if you built Exonerate from source, the authors recommend:

make clean
./configure --disable-assert
make
make install

The --disable-assert flag should prevent the warning, and also the error. Please let me know if that works!
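To tell whether a given Exonerate run hit this problem, you could scan its captured error output for the two messages quoted in the log above. The sketch below assumes those message strings appear verbatim in stderr; the function name and the labels it returns are made up for illustration.

```python
# Message fragments copied from the log in this thread. Whether every
# assert-enabled build prints exactly these strings is an assumption.
ASSERT_BUILD_BANNER = "Compiled with assertion checking"
ASSERT_FAILURE_MARK = "assertion failed"


def diagnose_exonerate_stderr(stderr_text):
    """Return a list of findings based on text captured from Exonerate's stderr."""
    findings = []
    if ASSERT_BUILD_BANNER in stderr_text:
        findings.append("assert-enabled build")
    if ASSERT_FAILURE_MARK in stderr_text:
        findings.append("assertion failure")
    return findings


if __name__ == "__main__":
    sample = (
        "* (process:18798): WARNING : Compiled with assertion checking"
        " - will run slowly *\n"
        "ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed:"
        " (target->alphabet->type == Alphabet_Type_DNA)\n"
    )
    print(diagnose_exonerate_stderr(sample))
```

An empty list after rebuilding with --disable-assert, or after switching to the pre-compiled binary, would suggest that the assert-enabled build is no longer the one being picked up from $PATH.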

landstr0m commented 6 years ago

I also had the same error:

ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA)

System data:
SELinux, 4.16.9-200.fc27.x86_64 

exonerate version 2.4.0
Using glib version 2.53.4
Built on Aug 2 2017

I installed Exonerate using sudo dnf install exonerate. I wasn't sure how to apply the --disable-assert flag, and it doesn't seem to be applied by default in the repository version.

Using the pre-compiled binary fixed both the warning and the error, as you suggested. :+1:

joelnitta commented 5 years ago

TL;DR

I had similar issues, but was able to fix them by installing the spades binary. I also made a docker image for running HybPiper that I think others may find useful:

https://hub.docker.com/r/joelnitta/hybpiper/

Long version

I am also trying to set up HybPiper on Linux. I had a similar error, which I was unable to fix by installing Exonerate from either the pre-compiled binary or source:

Running SPAdes on 13 genes
time parallel --eta spades.py --only-assembler --threads 1 --cov-cutoff 8 --12 {}/{}_interleaved.fasta -o {}/{}_spades :::: spades_genelist.txt > spades.log

Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 0 AVG: 0.23s  local:0/13/100%/0.4s 
Command exited with non-zero status 13
8.15user 6.16system 0:04.51elapsed 316%CPU (0avgtext+0avgdata 269788maxresident)k
112inputs+12968outputs (2major+1592831minor)pagefaults 0swaps
ERROR: One or more genes had an error with SPAdes assembly. This may be due to low coverage. No contigs found for the following genes:
gene012
gene079
gene001
gene026
gene298
gene006
gene074
gene030
gene111
gene660
gene002
gene293
gene461
WARNING: All Kmers failed for gene012!
WARNING: All Kmers failed for gene079!
WARNING: All Kmers failed for gene001!
WARNING: All Kmers failed for gene026!
WARNING: All Kmers failed for gene298!
WARNING: All Kmers failed for gene006!
WARNING: All Kmers failed for gene074!
WARNING: All Kmers failed for gene030!
WARNING: All Kmers failed for gene111!
WARNING: All Kmers failed for gene660!
WARNING: All Kmers failed for gene002!
WARNING: All Kmers failed for gene293!
WARNING: All Kmers failed for gene461!
Re-running SPAdes for 0 genes
parallel --eta --timeout 400% :::: redo_spades_commands.txt > spades_redo.log

Computers / CPU cores / Max jobs to run
1:local / 4 / 1
ETA: 0s Left: 0 AVG: 0.00s  0
All redos completed successfully!
ERROR: No genes had assembled contigs! Exiting!

Inspecting spades.log revealed an error -11, which apparently means SPAdes didn't run at all. I was able to fix this by installing the pre-compiled SPAdes binary.
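For anyone decoding a similar number: if the -11 follows the convention of Python's subprocess module, where a negative return code -N means the child process was terminated by signal N, then it points to signal 11, which is SIGSEGV (a segmentation fault) on Linux. That reading is an assumption about how the code was reported, and the helper below is just an illustrative sketch.

```python
import signal


def describe_exit_status(code):
    """Describe a return code using the subprocess convention.

    Zero means success, a positive value is an ordinary exit status, and a
    negative value -N means the process was terminated by signal N.
    """
    if code == 0:
        return "success"
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # The number does not correspond to a known signal on this platform.
            name = "unknown signal"
        return "killed by signal {} ({})".format(-code, name)
    return "exited with status {}".format(code)


if __name__ == "__main__":
    for status in (0, 13, -11):
        print(status, "->", describe_exit_status(status))
```

A crash of that kind is consistent with a binary that doesn't match the system it runs on, which would fit with the pre-compiled binary fixing it.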

Actually, the reason I'm trying to run this on Linux is to set up a Docker image. This should obviate the need to install all the dependencies by hand. The image doesn't include R, but it successfully runs the rest of the tutorial:

https://hub.docker.com/r/joelnitta/hybpiper/

Please let me know if you find any problems with it by filing an issue on the repo.

mossmatters commented 5 years ago

Thanks Joel, this is great! I have been wanting to create a Singularity container for HybPiper (our cluster doesn't allow admin access, which mostly kills the idea of using Docker). This should save me a bunch of time! I'm going to test it this week and will let you know how it goes.

joelnitta commented 5 years ago

Awesome! I've been meaning to learn Singularity myself for the exact same reason for another project, so I'm curious to compare notes when you get it running.

FYI, looks like this could be useful: https://github.com/singularityware/docker2singularity