tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation
801 stars 221 forks source link

Trouble annotating with a custom hmm db #257

Closed vnsriniv closed 6 years ago

vnsriniv commented 6 years ago

Hi

I am using Prokka to annotate a set of genomes using a custom nucleotide hmm database I built using hmmer (hmmbuild). I used the following pipeline to do this

  1. Create hmms for each gene using hmmbuild
  2. Concatenate the hmms for each gene and compress and index them using hmmpress
  3. Moved the hmm files to the prokka/db/hmm folder.
  4. Ran prokka --setupdb and confirmed that the hmms were recognized using prokka --listdb (output below). polyp_hmms is the database that I added.
    [19:01:47] * Kingdoms: Archaea Bacteria Mitochondria Viruses
    [19:01:47] * Genera: Enterococcus Escherichia Staphylococcus
    [19:01:47] * HMMs: HAMAP polyp_hmms
    [19:01:47] * CMs: Bacteria Viruses
  5. Ran prokka on my genomes using the following command prokka --prefix prokka_genome_name --cpus 2 --metagenome --force genome_name.fa.

I am getting the following error when I run the above command

[18:54:15] Running: cat prokka_Competibacter_denitrificans_McIlroy2014\/polyp_hmms\.hmm\.faa | parallel --gnu --plain -j 2 --block 93488 --recstart '>' --pipe hmmscan --noali --notextw --acc -E 1e-06 --cpu 1 /home/vnsriniv/local/prokka/bin/../db/hmm/polyp_hmms.hmm /dev/stdin > prokka_Competibacter_denitrificans_McIlroy2014\/polyp_hmms\.hmm\.hmmer3 2> /dev/null  

[18:54:15] Could not run command: cat prokka_Competibacter_denitrificans_McIlroy2014\/polyp_hmms\.hmm\.faa | parallel --gnu --plain -j 2 --block 93488 --recstart '>' --pipe hmmscan --noali --notextw --acc -E 1e-06 --cpu 1 /home/vnsriniv/local/prokka/bin/../db/hmm/polyp_hmms.hmm /dev/stdin > prokka_Competibacter_denitrificans_McIlroy2014\/polyp_hmms\.hmm\.hmmer3 2> /dev/null

Is my pipeline for including custom hmms correct? Why might this be happening?

Thanks for your help.

Varun

vnsriniv commented 6 years ago

I ran a test with the --debug option and it seems that it seems to fail at the hmmscan step based on the polyp_hmms.hmm.hmmer3 file. The .hmmer3 file for HAMAP.hmm seems to have results based on hmm hits while the polyp_hmms.hmm.hmmer3 only shows repeated tries to run hmmscan. Here are the contents of the polyp_hmms.hmm.hmmer3

# hmmscan :: search sequence(s) against a profile database
# HMMER 3.1b2 (February 2015); http://hmmer.org/
# Copyright (C) 2015 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             /dev/stdin
# target HMM database:             /home/vnsriniv/local/prokka/bin/../db/hmm/polyp_hmms.hmm
# prefer accessions over names:    yes
# show alignments in output:       no
# max ASCII text line length:      unlimited
# profile reporting threshold:     E-value <= 1e-06
# number of worker threads:        1
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# hmmscan :: search sequence(s) against a profile database
# HMMER 3.1b2 (February 2015); http://hmmer.org/
# Copyright (C) 2015 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             /dev/stdin
# target HMM database:             /home/vnsriniv/local/prokka/bin/../db/hmm/polyp_hmms.hmm
# prefer accessions over names:    yes
# show alignments in output:       no
# max ASCII text line length:      unlimited
# profile reporting threshold:     E-value <= 1e-06
# number of worker threads:        1
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# hmmscan :: search sequence(s) against a profile database
# HMMER 3.1b2 (February 2015); http://hmmer.org/
# Copyright (C) 2015 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             /dev/stdin
# target HMM database:             /home/vnsriniv/local/prokka/bin/../db/hmm/polyp_hmms.hmm
# prefer accessions over names:    yes
# show alignments in output:       no
# max ASCII text line length:      unlimited
# profile reporting threshold:     E-value <= 1e-06
# number of worker threads:        1
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# hmmscan :: search sequence(s) against a profile database
# HMMER 3.1b2 (February 2015); http://hmmer.org/
# Copyright (C) 2015 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             /dev/stdin
# target HMM database:             /home/vnsriniv/local/prokka/bin/../db/hmm/polyp_hmms.hmm
# prefer accessions over names:    yes
# show alignments in output:       no
# max ASCII text line length:      unlimited
# profile reporting threshold:     E-value <= 1e-06
# number of worker threads:        1
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# hmmscan :: search sequence(s) against a profile database
# HMMER 3.1b2 (February 2015); http://hmmer.org/
# Copyright (C) 2015 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:             /dev/stdin
# target HMM database:             /home/vnsriniv/local/prokka/bin/../db/hmm/polyp_hmms.hmm
# prefer accessions over names:    yes
# show alignments in output:       no
# max ASCII text line length:      unlimited
# profile reporting threshold:     E-value <= 1e-06
# number of worker threads:        1
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
tseemann commented 6 years ago

@vnsriniv Maybe it is the GNU parallel part of the command that is failing. What does parallel --version say?

Also, can you run hmmscan manually on some protein sequences to see if that is working ok?

vnsriniv commented 6 years ago

@tseemann Sorry for the very delayed response. I haven't been able to replicate this issue. So I am going to close this issue as of now and will get back to it if it pops up again!!

Thanks for your response!

tseemann commented 6 years ago

@vnsriniv i only just noticed your original post says "nucleotide hmm database"... nucleotide won't work, it needs to be protein.

vnsriniv commented 6 years ago

My bad!! Thanks! Good to know!