zhangrengang / TEsorter

TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes
https://doi.org/10.1093/hr/uhac017
GNU General Public License v3.0

Target sequence length > 100K #30

Closed yongzhiyang2012 closed 2 years ago

yongzhiyang2012 commented 2 years ago

Hi Rengang, I got the following error, and it seems my sequence is too long. How can I fix it: should I split my genome or change the script?

STDOUT:
b''
STDERR:
b'Fatal exception (source file p7_pipeline.c, line 697):\nTarget sequence length > 100K, over comparison pipeline limit.\n(Did you mean to use nhmmer/nhmmscan?)\n'

2022-02-28 12:05:01,977 -WARNING- exit code -6 for CMD 'hmmscan --notextw -E 0.01 --domE 0.01 --noali --domtblout ./tmp/chunk_aaseq.2.fasta.domtbl /data/01/user106/software/anaconda/anaconda3/envs/tesorter/lib/python3.5/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0.hmm ./tmp/chunk_aaseq.2.fasta'
2022-02-28 12:05:01,977 -WARNING-
STDOUT:
b''

zhangrengang commented 2 years ago

Please use TE or LTR sequences instead of genome sequences. Here are examples to extract TE sequences from RepeatMasker and LTR_retriever outputs.

zhangrengang commented 2 years ago

The error:

Fatal exception (source file p7_pipeline.c, line 697):
Target sequence length > 100K, over comparison pipeline limit.
(Did you mean to use nhmmer/nhmmscan?)

comes from HMMER, which since v3.3.1 enforces a design limit of 100K:

The comparison engine used by hmmsearch, hmmscan, phmmer, and jackhmmer has a design limit of 100K residues for target sequence length (usually protein sequences, sometimes RNA transcripts). Only nhmmer/nhmmscan are designed for searching arbitrary length genome sequences. That limit was not being enforced, and it was possible for a user to run hmmsearch inadvertently instead of nhmmer, which can cause numerical overflows and give infinite scores, among other problems. The comparison engine now exits with an error if the target sequence length is >100K.

yongzhiyang2012 commented 2 years ago

Thank you. I have deleted the sequences that were longer than 100k and it worked.
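
For anyone hitting the same error, here is a minimal sketch of that kind of length filter. It is not part of TEsorter; the function names `read_fasta` and `filter_by_length` and the default `max_len` of 100000 are assumptions chosen for illustration. It reads a FASTA file and writes out only the records whose sequence is no longer than the cutoff.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(in_handle, out_handle, max_len=100000, width=60):
    """Copy records with len(sequence) <= max_len to out_handle.

    max_len=100000 is an assumed cutoff mirroring the 100K limit in the
    error message; adjust it to your data. Returns (kept, dropped) counts.
    """
    kept = dropped = 0
    for header, seq in read_fasta(in_handle):
        if len(seq) > max_len:
            dropped += 1
            continue
        out_handle.write(">{}\n".format(header))
        # Wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            out_handle.write(seq[i:i + width] + "\n")
        kept += 1
    return kept, dropped
```

To use it, open an input and an output file (for example the hypothetical names `TEs.fasta` and `TEs.filtered.fasta`), pass both handles to `filter_by_length`, and then run TEsorter on the filtered file. Note that dropping long records discards them from classification entirely, so supplying extracted TE or LTR sequences, as suggested above, is the safer route.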

zhangrengang commented 2 years ago

HMMER v3.3 can work with sequence lengths > 100K.