zhangrengang / TEsorter

TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes
https://doi.org/10.1093/hr/uhac017
GNU General Public License v3.0

keyError #38

Closed: ttbond closed this issue 1 year ago

ttbond commented 1 year ago

Hi~ The following error occurred when I ran TEsorter on my FASTA file.

Traceback (most recent call last):
  File "/data/home/xutun/miniconda3/envs/tt/bin/TEsorter", line 10, in <module>
    sys.exit(main())
  File "/data/home/xutun/miniconda3/envs/tt/lib/python3.6/site-packages/TEsorter/app.py", line 1014, in main
    pipeline(Args())
  File "/data/home/xutun/miniconda3/envs/tt/lib/python3.6/site-packages/TEsorter/app.py", line 167, in pipeline
    maxeval = args.max_evalue,
  File "/data/home/xutun/miniconda3/envs/tt/lib/python3.6/site-packages/TEsorter/app.py", line 919, in LTRlibAnn
    prefix=prefix, seqtype=seqtype, mincov=mincov, maxeval=maxeval)
  File "/data/home/xutun/miniconda3/envs/tt/lib/python3.6/site-packages/TEsorter/app.py", line 801, in hmm2best
    gseq = d_seqs[rc.qname].seq[rc.envstart-1:rc.envend]
KeyError: 's004_4:248929-251341(-)|aa1'

I noticed that special symbols in sequence names might lead to strange problems. However, I have successfully applied TEsorter to other FASTA files that contain similar types of names, and this FASTA file was the only exception. I would appreciate an early reply.

Sincerely, Tun Xu

zhangrengang commented 1 year ago

Could you please provide me a subset of your fasta file to reproduce the issue?

ttbond commented 1 year ago

Thanks a lot for your kind and timely reply! I'm trying to build a small subset that reproduces the error. Is the email address in this issue still valid for sending the FASTA file? https://github.com/zhangrengang/TEsorter/issues/35#issuecomment-1127685358

zhangrengang commented 1 year ago

@ttbond Yes, or directly via this GitHub comment.

ttbond commented 1 year ago

Thanks again for your reply. The error disappeared after I removed the previous output files. In the failed tests, I had only removed the temporary working directory (set by the '-tmp' parameter); the error disappeared once I removed all the files with the prefix set by the '-pre' parameter.

zhangrengang commented 1 year ago

Yes, the previous .domtbl file is re-used (because hmmscan is slow), which leads to an inconsistency if the inputs are different. You need to use a different -pre and -tmp, or set --force-write-hmmscan.
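
For anyone hitting the same KeyError, a quick way to confirm that a leftover .domtbl file belongs to a different input is to compare its query names against the current FASTA headers. The following is only a rough sketch, not part of TEsorter: the helper names are hypothetical, it assumes the standard hmmscan `--domtblout` layout (query name in the 4th whitespace-separated column), and it assumes translated queries carry a frame suffix such as `|aa1`, as in the traceback above.

```python
import re

# Assumption: translated query names end with a frame suffix such as "|aa1"
# (as in the KeyError above) or "|rev_aa2"; adjust the pattern if yours differ.
FRAME_SUFFIX = re.compile(r"\|(?:rev_)?aa\d+$")


def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def domtbl_queries(lines):
    """Collect query names (4th column) from hmmscan --domtblout lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.split()
        if len(fields) > 3:
            names.add(fields[3])
    return names


def stale_queries(fasta_lines, domtbl_lines):
    """Return domtbl query names whose source sequence is absent from the FASTA."""
    ids = fasta_ids(fasta_lines)
    return sorted(
        q for q in domtbl_queries(domtbl_lines)
        if FRAME_SUFFIX.sub("", q) not in ids
    )


# Example usage (both paths are placeholders for your own files):
# with open("input.fa") as fa, open("myprefix.domtbl") as tbl:
#     print(stale_queries(fa, tbl))
```

If the returned list is non-empty, the .domtbl file was most likely produced from a different input, and re-running with a new -pre and -tmp or with --force-write-hmmscan, as suggested above, should avoid the mismatch.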