tseemann / barrnap

:microscope: :leo: Bacterial ribosomal RNA predictor
GNU General Public License v3.0

nhmmer failed to run - Error: Invalid alphabet type in target for nhmmer. Expect DNA or RNA #54

Open minjinhan opened 3 years ago

minjinhan commented 3 years ago

Hello, I am trying to run barrnap to identify rRNA in a eukaryotic genome. The command is as follows:
barrnap --kingdom euk --threads 20 --outseq rRNA.fasta < chr1.fasta

After running it, we got the following error. Can you suggest how to solve this problem? Thanks!
[barrnap] This is barrnap 0.9
[barrnap] Written by Torsten Seemann
[barrnap] Obtained from https://github.com/tseemann/barrnap
[barrnap] Detected operating system: linux
[barrnap] Adding /miniconda3/lib/barrnap/bin/../binaries/linux to end of PATH
[barrnap] Checking for dependencies:
[barrnap] Found nhmmer - /miniconda3/bin/nhmmer
[barrnap] Found bedtools -/miniconda3/bin/bedtools
[barrnap] Will use 20 threads
[barrnap] Setting evalue cutoff to 1e-06
[barrnap] Will tag genes < 0.8 of expected length.
[barrnap] Will reject genes < 0.25 of expected length.
[barrnap] Using database: /miniconda3/lib/barrnap/bin/../db/euk.hmm
[barrnap] Scanning chr1.fasta for euk rRNA genes... please wait
[barrnap] Command: nhmmer --cpu 20 -E 1e-06 --w_length 3878 -o /dev/null --tblout /dev/stdout '/miniconda3/lib/barrnap/bin/../db/euk.hmm' 'chr1.fasta'
[barrnap] ERROR: nhmmer failed to run - Error: Invalid alphabet type in target for nhmmer. Expect DNA or RNA.

I am sure the FASTA sequence contains no characters other than A/T/C/G.

snayfach commented 3 years ago

I've gotten the same error. In my case, it was caused by one sequence composed entirely of G and T nucleotides. Adding a single A and a single C nucleotide made the error go away. This should be an easy fix :-)
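To check whether a FASTA file contains the kind of sequence described above, a small shell helper can list records whose leading residues lack one or more of A, C, G, T. This is a minimal sketch: the function name `find_skewed_starts` and the default 500-residue window are assumptions for illustration, not anything barrnap or nhmmer defines, and this thread does not state how many residues nhmmer actually inspects when guessing the alphabet.

```shell
# Hypothetical helper: read FASTA on stdin and print "<record name><TAB><missing bases>"
# for every record whose first WINDOW residues do not contain all of A, C, G, T.
find_skewed_starts() {
  local window="${1:-500}"   # assumed window size, adjust as needed
  awk -v w="$window" '
    function report() {
      if (name == "") return
      s = toupper(substr(seq, 1, w))
      missing = ""
      if (s !~ /A/) missing = missing "A"
      if (s !~ /C/) missing = missing "C"
      if (s !~ /G/) missing = missing "G"
      if (s !~ /T/) missing = missing "T"
      if (missing != "") print name "\t" missing
    }
    # Header line: flush the previous record and start a new one.
    /^>/ { report(); name = substr($1, 2); seq = ""; next }
    # Sequence line: keep only as much sequence as the window needs.
    { if (length(seq) < w) seq = seq $0 }
    END { report() }
  '
}
```

Usage would look like `find_skewed_starts 500 < chr1.fasta`; an empty result means every record has all four bases within the chosen window.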

jdwinkler-lanzatech commented 3 years ago

I just ran into this problem as well.

zxgsy520 commented 2 years ago

I just ran into this problem as well. I added an A and still got the same error. According to what I found online, it is a problem with the conda installation.

ptrebert commented 2 years ago

I just stumbled upon this. In case this is still relevant, @zxgsy520: there is a switch, --dna, that sets the alphabet type for the query, which is then used as a "guide" when the alphabet type cannot be guessed for the target. It was introduced in this PR: https://github.com/EddyRivasLab/hmmer/pull/252 The switch is available in nhmmer v3.3.2 installed via conda.

Correction: the fix in the PR has only been merged into the dev branch. The --dna switch exists in the latest release, but that release does not include the fix.

ZeweiSong commented 2 years ago

I bypassed this issue by replacing all ambiguous bases (M, K, H, etc.) with N.
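One way to do that replacement is sketched below, assuming the standard IUPAC ambiguity codes R, Y, S, W, K, M, B, D, H and V. The function name `mask_ambiguous` is hypothetical. Header lines (starting with `>`) are left untouched so that sequence names containing those letters are not altered, and lowercase codes become lowercase `n` to preserve any soft-masking.

```shell
# Hypothetical helper: read FASTA on stdin, write FASTA on stdout with
# IUPAC ambiguity codes replaced by N (or n). Header lines are not modified.
mask_ambiguous() {
  sed -e '/^>/!s/[RYSWKMBDHV]/N/g' \
      -e '/^>/!s/[ryswkmbdhv]/n/g'
}
```

For example, `mask_ambiguous < genome.fasta > genome.masked.fasta` writes a masked copy (both file names are placeholders), which can then be passed to barrnap. Sequence lengths are unchanged, so reported coordinates still refer to the original genome.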

cabbagesofdoom commented 1 year ago

I got this issue for a genome that started with a telomere repeat and did not have all four bases in the first few hundred characters. I got around it by replacing the first four characters of each sequence with GATC and then running on the temporary file:

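# Strip the .fasta extension and any leading directories (quoted so paths with spaces work)
PREFIX=$(basename "${GENOME/.fasta/}")
# Replace the first four bases of each line that starts with four of A/C/G/T with GATC
sed 's/^[ACGT][ACGT][ACGT][ACGT]/GATC/' "$GENOME" > "$PREFIX.tmp.fasta"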
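Note that on a line-wrapped FASTA file the sed command above edits the start of every sequence line, not only the start of each record, and it skips lines that begin with lowercase bases or N. A variant that touches only the first sequence line after each header might look like the sketch below; the function name `patch_record_starts` is hypothetical, and the choice of GATC simply follows the comment above.

```shell
# Hypothetical helper: read FASTA on stdin and overwrite the first four
# residues of each record with GATC, leaving all other lines untouched.
patch_record_starts() {
  awk '
    # Header line: print it and remember that the next line starts a record.
    /^>/ { print; first = 1; next }
    # First sequence line of a record: overwrite its first four characters.
    first { if (length($0) >= 4) $0 = "GATC" substr($0, 5); first = 0 }
    { print }
  '
}
```

It could be run as `patch_record_starts < "$GENOME" > "$PREFIX.tmp.fasta"`. Sequence lengths are preserved, so coordinates are unaffected, but the first four bases of each record no longer match the original genome.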

HansongYan666 commented 2 months ago

Hello everyone: I hit the same error. @cabbagesofdoom's method works; you just replace four bases. I also suggest changing the HMMER version. First, look at the HMM file in the barrnap db directory (for example, euk.hmm); the file has version information in its header. Then uninstall hmmer and install that same version. After that, it works on the sample genome. Tip: my genome contains only the bases A, T, C, G and N. If yours contains other bases, replace them first.
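The version check described above can be sketched as follows. It assumes the HMM file is in the HMMER3 ASCII format, whose first line has the form `HMMER3/f [<version> | <date>]`; the function name `hmm_file_version` is hypothetical.

```shell
# Hypothetical helper: print the HMMER version recorded on the first line
# of an HMMER3 ASCII profile file, e.g. a line like "HMMER3/f [<version> | <date>]".
hmm_file_version() {
  sed -n '1s/^HMMER3\/[a-z] \[\([^ ]*\) .*$/\1/p' "$1"
}
```

Running `hmm_file_version /miniconda3/lib/barrnap/db/euk.hmm` (path taken from the log earlier in this thread) prints the version the database was built with. That can be compared with the installed version, which should appear in the banner printed by `nhmmer -h`.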