Closed oushujun closed 1 year ago
Hi Shujun, Please update TEsorter (the lastest conda version is v1.4.6). With the version 1.4, it is ok:
2023-05-03 15:37:42,995 -INFO- Command: /media/40T/wlx/zrg/share/home/app/.local/python3/bin/TEsorter test.fa
2023-05-03 15:37:42,996 -INFO- VARS: {'sequence': 'test.fa', 'hmm_database': 'rexdb', 'db_hmm': None, 'seq_type': 'nucl', 'prefix': None, 'force_write_hmmscan': False, 'processors': 4, 'tmp_dir': './tmp-63472734-e985-11ed-8106-4cd98fb9bbe7', 'min_coverage': 20, 'max_evalue': 0.001, 'min_probability': 0.5, 'no_cleanup': False, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'genome': False, 'win_size': 270000, 'win_ovl': 30000, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2023-05-03 15:37:42,996 -INFO- checking dependencies:
2023-05-03 15:37:43,098 -INFO- hmmer 3.3 OK
2023-05-03 15:37:44,391 -INFO- blastn 2.13.0+ OK
2023-05-03 15:37:44,392 -INFO- check database /media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2023-05-03 15:37:44,392 -INFO- db file: /media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2023-05-03 15:37:44,392 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2023-05-03 15:37:44,392 -INFO- Start classifying pipeline (ELEMENT mode)
2023-05-03 15:37:44,472 -INFO- total 1 sequences
2023-05-03 15:37:44,473 -INFO- HMM scanning against `/media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm`
2023-05-03 15:37:44,521 -INFO- Start Pool with 4 process(es)
2023-05-03 15:37:44,522 -INFO- translating `./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/chunk.1.fasta` in six frames
/media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/Bio/Seq.py:2609: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning)
2023-05-03 15:37:44,627 -INFO- Start Pool with 4 process(es)
2023-05-03 15:37:44,628 -INFO- run CMD: `hmmscan --nobias --notextw --noali --domtblout ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/chunk.1.fasta.aa.domtbl /media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/chunk.1.fasta.aa > /dev/null`
2023-05-03 15:37:44,829 -INFO- generating gene anntations
2023-05-03 15:37:44,830 -INFO- 1 sequences classified by HMM
2023-05-03 15:37:44,830 -INFO- see protein domain sequences in `test.fa.rexdb.dom.faa` and annotation gff3 file in `test.fa.rexdb.dom.gff3`
2023-05-03 15:37:44,831 -INFO- classifying the unclassified sequences by searching against the classified ones
2023-05-03 15:37:44,886 -INFO- using the 80-80-80 rule
2023-05-03 15:37:44,886 -INFO- run CMD: `makeblastdb -in ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_classified.fa -dbtype nucl -out ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_classified.fa`
2023-05-03 15:37:46,226 -INFO- run CMD: `blastn -query ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_unclassified.fa -db ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_classified.fa -out ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_unclassified.fa.blastout -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp sstrand' -num_threads 4 `
2023-05-03 15:37:47,403 -INFO- 0 sequences classified in pass 2
2023-05-03 15:37:47,404 -INFO- total 1 sequences classified.
2023-05-03 15:37:47,404 -INFO- see classified sequences in `test.fa.rexdb.cls.tsv`
2023-05-03 15:37:47,404 -INFO- writing library for RepeatMasker in `test.fa.rexdb.cls.lib`
2023-05-03 15:37:47,404 -INFO- writing classified protein domains in `test.fa.rexdb.cls.pep`
2023-05-03 15:37:47,404 -INFO- Summary of classifications:
Order Superfamily # of Sequences# of Clade Sequences # of Clades# of full Domains
DIRS unknown 1 0 0 0
2023-05-03 15:37:47,404 -INFO- cleaning the temporary directory ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7
2023-05-03 15:37:47,405 -INFO- Pipeline done.
I think the issue might be due to preexisting TEsorter results matching the input name but the input content has been changed, so the existing results do not match with the new content. After deleting those results, TEsorter runs normally.
Yes, mismatches of sequence ID between input fasta and preexisting hmmscan outfile (*.domtbl
) can result in this issue. It can be avoided by deleting the *.domtbl
file or using -fw
option. I will add a warning for this issue.
Hi Ren-Gang,
I encountered errors when running TEsorter (v1.3) on sequences with special headers. Here is one reproducible example:
Save the above sequence in
test.fa
and run:TEsorter test.fa