qjiangzhao / TEtrimmer

TEtrimmer: a novel tool to automate manual curation of transposable elements
GNU General Public License v3.0

All input sequences skipped due to 0 blast hits #27

Open Jennifer282 opened 1 week ago

Jennifer282 commented 1 week ago

Thank you for making this tool available. We have previously used it to curate TEs for one of our species of interest and it has worked great. However, we are now attempting to use it for another species, and all input sequences from RepeatModeler are being skipped due to 0 BLAST hits. The TEtrimmer_consensus.fasta file is not being created. We also installed Ghostscript with "conda install conda-forge::ghostscript", as was suggested in a similar issue (#26), but still got the same results. We also blasted some of the RepeatModeler sequences against our genome and got hits, so the 0 BLAST hits don't seem reasonable. We ran the test data provided and it worked well, so we are not sure what the problem might be. This is what our output file says:

All sequences have been analysed! In the analysed sequences 4452 are skipped. Note: not all skipped sequences can have TE Aid plot in the 'TEtrimmer_for_proof_annotation' folder. In the analysed sequences 0 are identified as low copy TE.

TEtrimmer is doing the final classification. It uses the classified TE to classify Unknown elements.

The final classification module failed.

This does not affect the final TE consensus sequences You can choose to ignore this error.

TEtrimmer is removing sequence duplications. This might take long time when many sequencesare included into the final consensus library. Please be patient!

cd-hit-est failed for TEtrimmer_consensus.fasta with error code 1

Fatal Error: Failed to open the database file Program halted !!

The final CD-HIT-EST merge step cannot be performed. Final TE consensus library redundancy can be higher but the sensitivity is not affected. You can remove duplicated sequence by yourself.

You can choose to ignore CD-HIT-EST error. For traceback output, please refer to 'error_file.txt' in the 'Multiple_sequence_alignment' directory.

TEtrimmer is clustering TE consensus library. This can potentially take long time when many sequences exist in the consensus library. Please be patient!

Final clustering of proof annotation files failed with error local variable 'sequence_info' referenced before assignment

Traceback (most recent call last):
  File "/home/FCAM/arivera/TEtrimmer/tetrimmer/TEtrimmer.py", line 529, in main
    sequence_info, perfect_proof, good_proof, intermediate_proof, need_check_proof)
UnboundLocalError: local variable 'sequence_info' referenced before assignment

This does not affect the final TE consensus sequences. But this can heavily complicate the TE proof annotation. If you don't plan to do proof annotation, you can choose to ignore this error.

error_file.txt

qjiangzhao commented 1 week ago

Hi,

Could you provide your "Summary.txt" file?

You can add the "--debug" option when you run TEtrimmer on a few of your input sequences (for example, take only 5 sequences from your input FASTA file as a new input, because the "--debug" option generates many files). Then check whether the "Multiple_sequence_alignment" folder contains the "BLASTN" file.
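One way to build such a small test input is sketched below. This is only an illustration, not part of TEtrimmer: the function name `take_first_records` and the file names in the usage note are hypothetical, and it assumes a plain FASTA file in which every record starts with a `>` header line.

```python
def take_first_records(fasta_in, fasta_out, n=5):
    """Copy the first n FASTA records from fasta_in to fasta_out.

    Returns the number of records actually copied (fewer than n if the
    input holds fewer records).
    """
    copied = 0
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if copied == n:
                    break  # the (n+1)-th header marks the end of the subset
                copied += 1
            if copied:  # ignore any stray lines before the first header
                dst.write(line)
    return copied
```

For example, `take_first_records("repeatmodeler_families.fa", "debug_subset.fa", n=5)` would write a five-sequence file that can then be passed to TEtrimmer as the input together with "--debug".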

Yours,
Jiangzhao

qjiangzhao commented 1 week ago

And please make sure you used the right genome path first.

Jennifer282 commented 1 week ago

Thank you for the speedy reply. Paths to both the genome file and the RepeatModeler file have been checked and are correct. summary.txt

qjiangzhao commented 1 week ago

No worries! This is the BLASTN command used by TEtrimmer:

blastn -query {input_file} -db {database_path} \
  -outfmt "6 qseqid sseqid pident length mismatch qstart qend sstart send sstrand evalue qcovhsp" \
  -evalue 1e-40 -qcov_hsp_perc 20

Theoretically, if you can get BLASTN hits with this command, you should get results from TEtrimmer.
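To check this by hand, the tabular output of the command above can be summarised per query. The sketch below is an assumption-laden helper, not TEtrimmer code: the function names are hypothetical, and it only relies on the first tab-separated column being `qseqid`, as in the `-outfmt` string shown above.

```python
from collections import Counter


def count_hits_per_query(blast_tsv_text):
    """Count BLASTN hit lines per query ID in tabular (outfmt 6) output."""
    counts = Counter()
    for line in blast_tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid = line.split("\t")[0]  # first column is qseqid
        counts[qseqid] += 1
    return counts


def queries_without_hits(query_ids, counts):
    """Return the query IDs that have no hit line at all, in input order."""
    return [q for q in query_ids if counts[q] == 0]
```

If every query ends up in `queries_without_hits` even though a manual search against the same genome finds matches, the database that was searched is a more likely culprit than the query sequences.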

You can try deleting the BLAST database files located in the same directory as your genome and then running TEtrimmer again.
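A cautious way to do this clean-up is sketched below. It rests on two assumptions that should be checked against your own directory first: that the database files are named after the genome file with an extra extension (for example `genome.fa.nhr`), and that they carry the usual nucleotide BLAST database extensions listed in the code. The function names are hypothetical, and the default is a dry run that deletes nothing.

```python
from pathlib import Path

# Common file extensions of nucleotide BLAST databases (assumed list).
BLAST_NUCL_SUFFIXES = {
    ".nhr", ".nin", ".nsq", ".ndb", ".not", ".ntf",
    ".nto", ".njs", ".nog", ".nsd", ".nsi",
}


def find_blast_db_files(genome_path):
    """List files next to the genome that look like its BLAST database."""
    genome = Path(genome_path)
    prefix = genome.name + "."  # assumption: database files extend the genome file name
    return sorted(
        p for p in genome.parent.iterdir()
        if p.is_file()
        and p.name.startswith(prefix)
        and p.suffix in BLAST_NUCL_SUFFIXES
    )


def remove_blast_db_files(genome_path, dry_run=True):
    """Return the matching files; delete them only when dry_run is False."""
    files = find_blast_db_files(genome_path)
    if not dry_run:
        for p in files:
            p.unlink()
    return files
```

Calling `remove_blast_db_files("genome.fa")` first and inspecting the returned list before repeating the call with `dry_run=False` keeps the genome file itself and unrelated index files out of harm's way.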

Jennifer282 commented 1 week ago

Deleting the blast database seems to have worked! The job is still running so I'll update you once it's done but the "BLASTN" files inside the Multiple_sequence_alignment dir are not empty like they were in the previous run. The blast database files might have been corrupted. Thank you!