qjiangzhao / TEtrimmer

TEtrimmer: a novel tool to automate manual curation of transposable elements
GNU General Public License v3.0

Final clustering of proof annotation files failed #26

Closed. CEPHAS-01 closed this issue 3 months ago.

CEPHAS-01 commented 3 months ago

Hi @qjiangzhao

Thanks for making this tool available. I encountered the following error while making use of the tool for TE annotation of an animal genome.

error_file.txt

Final CD-HIT-EST deduplication error.
Traceback (most recent call last):
  File "/TEtrimmer/tetrimmer/functions.py", line 1234, in cd_hit_est
    subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
  File "/anaconda3/envs/mamba/envs/TEtrimmer/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cd-hit-est', '-i', 'curate_v1/curate_v2/Tetrimmer/outPutDir/TEtrimmer_consensus.fasta', '-o', 'curate_v1/curate_v2/Tetrimmer/outPutDir/Classification_and_deduplication/TEtrimmer_consensus_merged_round1.fasta', '-c', '0.9', '-aL', '0', '-aS', '0.9', '-M', '0', '-T', '120', '-l', '30', '-d', '0', '-s', '0', '-sc', '1']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/TEtrimmer/tetrimmer/TEtrimmer.py", line 507, in main
    sequence_info = analyze.merge_cons(classification_dir, final_con_file, progress_file, cd_hit_est_final_merged, num_threads)  # Do first round of CD-HIT-EST
  File "/TEtrimmer/tetrimmer/analyze.py", line 357, in merge_cons
    cd_hit_est(final_con_file, cd_hit_merge_output_round1, identity_thr=0.9, aL=0, aS=0.9, s=0, thread=num_threads)
  File "/TEtrimmer/tetrimmer/functions.py", line 1243, in cd_hit_est
    raise Exception
Exception

Final clustering of proof annotation files failed.
Traceback (most recent call last):
  File "/TEtrimmer/tetrimmer/TEtrimmer.py", line 530, in main
    analyze.cluster_proof_anno_file(multi_dotplot_dir, final_con_file_no_low_copy, continue_analysis, cluster_proof_anno_dir, num_threads, sequence_info, perfect_proof, good_proof, intermediate_proof, need_check_proof)
UnboundLocalError: local variable 'sequence_info' referenced before assignment

Please help look into this.

Thanks.

OT

qjiangzhao commented 3 months ago

Hi OT,

Thank you for using TEtrimmer!

You can try running "cd-hit-est" directly in your server terminal, like this:

cd-hit-est -i curate_v1/curate_v2/Tetrimmer/outPutDir/TEtrimmer_consensus.fasta \
  -o curate_v1/curate_v2/Tetrimmer/outPutDir/TEtrimmer_consensus_cd_hit_est_test.fasta \
  -c 0.9 -aL 0 -aS 0.9 -M 0 -T 20 -l 30 -d 0 -s 0 -sc 1

The error might be due to an improper configuration of cd-hit.

Yours sincerely, Jiangzhao
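When a wrapped external tool exits non-zero, the Python traceback alone does not show what the tool itself printed. A minimal sketch of how one might capture that output is given here. The helper name `run_and_report` is hypothetical and not part of TEtrimmer, and the demo uses a stand-in command rather than cd-hit-est so that it runs anywhere.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (exit_code, stderr_text).

    exit_code is None when the executable could not be found at all.
    This is a hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
    except FileNotFoundError as err:
        # The program is not installed or not on PATH.
        return None, str(err)
    return result.returncode, result.stderr


# Stand-in for a failing tool: writes a message to stderr and exits with status 1.
demo = [sys.executable, "-c", "import sys; sys.stderr.write('bad input'); sys.exit(1)"]
code, err = run_and_report(demo)
print("exit code:", code)
print("stderr:", err)
```

Replacing `demo` with the cd-hit-est argument list from the traceback would, under the same assumptions, print whatever message cd-hit-est wrote before it exited.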

CEPHAS-01 commented 3 months ago

Hello @qjiangzhao,

Thank you for your prompt response. I do not have the TEtrimmer_consensus.fasta in the outPutDir folder, and it is not located anywhere within the outPutDir folder structure. Perhaps the analysis did not get to the point of producing this file.

I suspect that the main problem lies in this part of the error message:

  File "/TEtrimmer/tetrimmer/TEtrimmer.py", line 530, in main
    analyze.cluster_proof_anno_file(multi_dotplot_dir, final_con_file_no_low_copy, continue_analysis, cluster_proof_anno_dir, num_threads, sequence_info, perfect_proof, good_proof, intermediate_proof, need_check_proof)
UnboundLocalError: local variable 'sequence_info' referenced before assignment
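An UnboundLocalError of this kind is often a secondary symptom: a variable is assigned inside a `try` block, an earlier step in that block raises, and later code still uses the variable. The sketch below illustrates the general pattern only. It is not TEtrimmer's actual code; `flaky_step`, `pipeline_unsafe`, and `pipeline_safe` are hypothetical names.

```python
def flaky_step():
    """Hypothetical stand-in for a step that fails, e.g. an external tool exiting non-zero."""
    raise RuntimeError("external tool failed")


def pipeline_unsafe():
    try:
        # The assignment never happens because flaky_step() raises first.
        sequence_info = flaky_step()
    except Exception:
        # The real error is swallowed here and execution continues.
        pass
    # sequence_info was never bound, so this line raises UnboundLocalError.
    return len(sequence_info)


def pipeline_safe():
    try:
        sequence_info = flaky_step()
    except Exception as err:
        # Stop at the first failure and report the original cause.
        return f"step failed: {err}"
    return len(sequence_info)


try:
    pipeline_unsafe()
except UnboundLocalError as err:
    print("unsafe:", type(err).__name__)

print("safe:", pipeline_safe())
```

Under this reading, the first failure in the log (the cd-hit-est call) is the one to investigate, and the UnboundLocalError follows from it.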

qjiangzhao commented 3 months ago

Hi OT:

Have you run the provided test files? Please do that first to see if you can get the "TEtrimmer_consensus.fasta" file. This file is required for the subsequent cd-hit-est clustering analysis.

Yours sincerely, Jiangzhao

CEPHAS-01 commented 3 months ago

I have not run the test files yet; let me do that now. Thanks.

CEPHAS-01 commented 3 months ago

Hi @qjiangzhao,

Running the test files did not produce the "TEtrimmer_consensus.fasta" file.

The same error was reported in the log file:

Traceback (most recent call last):
  File "TEtrimmer/tetrimmer/TEtrimmer.py", line 533, in main
    analyze.cluster_proof_anno_file(multi_dotplot_dir, final_con_file_no_low_copy, continue_analysis, cluster_proof_anno_dir, num_threads, sequence_info, perfect_proof, good_proof, intermediate_proof, need_check_proof)
UnboundLocalError: local variable 'sequence_info' referenced before assignment

qjiangzhao commented 3 months ago

Besides the error message, could you send me the entire terminal output from running the test files?

CEPHAS-01 commented 3 months ago

Sure. Here it is. stdout_384266.txt

qjiangzhao commented 3 months ago

You can try running "conda install conda-forge::ghostscript" to solve this problem. Otherwise, add the "--debug" option when you run TEtrimmer and send me all the log and output files so I can have another look.
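A missing external program, such as Ghostscript here, can be spotted before a long run by checking whether the expected executables are on PATH. The following is a minimal sketch, assuming the tools are looked up by name. `missing_tools` is a hypothetical helper, "cd-hit-est" is taken from the traceback in this thread, and "gs" is the usual Ghostscript executable name on Unix-like systems; whether TEtrimmer invokes it under that name is an assumption.

```python
import shutil


def missing_tools(names):
    """Return the executable names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Tool names are assumptions for illustration (see the note above).
required = ["cd-hit-est", "gs"]
missing = missing_tools(required)

if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found.")
```

Run inside the activated conda environment, this reports only what is visible on PATH in that environment; it does not verify that a tool is correctly configured.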

CEPHAS-01 commented 3 months ago

The "TEtrimmer_consensus.fasta" file was produced after installing the conda package you advised. I will try this on my data and report back. Thanks!

qjiangzhao commented 3 months ago

No worries!

CEPHAS-01 commented 3 months ago

This finally ran on my data and produced results. Thank you for your help.