Closed pedroh3ringer closed 2 weeks ago
Hi @pedroh3ringer,
Please try deleting the BLASTN database for the test genome and running it again. If the error persists, please send me your entire test output folder and I will take another look for potential problems.
Yours sincerely Jiangzhao
Hi Jiangzhao,
Thanks a lot for the quick response! I deleted the BLASTN database for the test genome and tried again, but unfortunately I got the same message and output as the one mentioned above. I've attached the entire test output folder to this message.
Best, Pedro
Hi @pedroh3ringer:
It seems the error is caused by the Python package "pypdf2". You can try to solve this by running
mamba install conda-forge::pypdf2
or
mamba update pypdf2
in your terminal.
If that doesn't help, please download the new release version from the TEtrimmer GitHub repository and run it again. The new version should be able to provide more error information.
Yours sincerely Jiangzhao
I had the same issue @qjiangzhao @pedroh3ringer, and tracked it down to PyPDF indeed:
Traceback (most recent call last):
  File "/home/adminbrice/Softs/miniforge3/envs/tetrimmer/share/tetrimmer/boundarycrop.py", line 841, in find_boundary_and_crop
    scale_dotplot_pdf = scale_single_page_pdf(dotplot_pdf, f"{dotplot_pdf}_su.pdf", scale_ratio=2)
  File "/home/adminbrice/Softs/miniforge3/envs/tetrimmer/share/tetrimmer/functions.py", line 1988, in scale_single_page_pdf
    pdf_reader = PdfFileReader(input_pdf_path)
  File "/home/adminbrice/Softs/miniforge3/envs/tetrimmer/lib/python3.10/site-packages/PyPDF2/_reader.py", line 1974, in __init__
    deprecation_with_replacement("PdfFileReader", "PdfReader", "3.0.0")
  File "/home/adminbrice/Softs/miniforge3/envs/tetrimmer/lib/python3.10/site-packages/PyPDF2/_utils.py", line 369, in deprecation_with_replacement
    deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
  File "/home/adminbrice/Softs/miniforge3/envs/tetrimmer/lib/python3.10/site-packages/PyPDF2/_utils.py", line 351, in deprecation
    raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.
The run on the test data worked after pip install 'PyPDF2<3.0'.
In the process of debugging this I spotted an issue here: https://github.com/qjiangzhao/TEtrimmer/blob/314a9e86fc504398c343d6f49502c2a8fc648299/tetrimmer/boundarycrop.py#L843-L849
Line 848 refers to e, which does not exist; you need except Exception as e on line 843. Because that raises an exception, you never skip this part and never get to writing out final_con_file on line 1088, which is the missing TEtrimmer_consensus.fasta file required by cd-hit.
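The pattern described above can be reproduced in isolation. This is a minimal sketch with hypothetical function names (`risky_step`, `buggy`, `fixed`), not the code from boundarycrop.py: it only shows why referencing `e` inside a handler that never bound it raises a second exception instead of skipping gracefully.

```python
def risky_step():
    # Stand-in for the PDF scaling call that fails.
    raise ValueError("scaling failed")


def buggy():
    try:
        risky_step()
    except Exception:
        # `e` was never bound by the except clause, so building this
        # message raises NameError and the handler itself fails.
        return f"skipped: {e}"


def fixed():
    try:
        risky_step()
    except Exception as e:
        # `e` is bound here, so the handler completes and the caller
        # can carry on to the later steps.
        return f"skipped: {e}"
```

Calling `fixed()` returns a message and lets execution continue, whereas `buggy()` propagates a `NameError` out of the handler, which is how later steps such as writing the consensus file would never be reached.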
Best ;)
Dear @bricoletc
Thanks for your feedback. Then I will close this issue.
Many thanks for your debugging. I have modified the code and will push it along with the next main update.
Yours sincerely Jiangzhao
Good news! I guess it's worth pinning pypdf on bioconda (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/tetrimmer/meta.yaml), or updating the call to it in your code, to avoid this issue altogether!
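Besides pinning in the recipe, a start-up check could fail early with a clear message when an incompatible release is installed. The sketch below is an assumption about how such a guard might look; `major_version` and `supports_legacy_names` are hypothetical helpers, and the only fact taken from this thread is that `PdfFileReader` was removed in PyPDF2 3.0.0.

```python
def major_version(version):
    """Return the leading integer of a version string such as '2.12.1'."""
    return int(version.split(".")[0])


def supports_legacy_names(version):
    # PdfFileReader was removed in PyPDF2 3.0.0, per the DeprecationError
    # quoted earlier in this thread.
    return major_version(version) < 3
```

In a real environment the version string could be obtained with `importlib.metadata.version("PyPDF2")` from the standard library and passed to `supports_legacy_names` before any PDF code runs.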
Yes, many thanks again. We will update the TEtrimmer Conda package next month and will do that!
Hi,
I tried to run TEtrimmer with the test set, using the command:
TEtrimmer --input_file test_input.fa --genome_file test_genome.fasta --output_dir test_output --num_threads 20 --classify_all
This generated the expected output directories. However, the directories within 'TEtrimmer_for_proof_curation' are empty and the output message was:
TE Trimmer is modifying sequence names; any occurrence of '/', '-', ':', '...', '|' and empty spaces before '#' will be converted to '_'. You can find the original and modified names in the 'Sequence_name_mapping.txt' file in the output directory.
TEtrimmer detected instances of '#' in your input FASTA sequence headers. The string before '#' is denoted as the seq_name, and the string after '#' is denoted as the TE type.
Finish to generate single sequence files.
8 sequences are detected from the input file
Progress: |--------------------------------------------------| 0/8 = 0.0% Complete
rnd_6_family_3291 is skipped due to blast hit number is 0
7 sequences have not been analysed. In the analysed sequences 1 are skipped. Note: not all skipped sequences can have TE Aid plot in the 'TEtrimmer_for_proof_curation' folder. In the analysed sequences 0 are identified as low copy TE.
You might find the reasons why some sequences were not analysed from the 'error_file.txt' in the 'Multiple_sequence_alignment' directory.
Less than 30% TE are classified, TEtrimmer won't classify 'Unknown' TE by classified TE.
TEtrimmer is removing sequence duplications. This might take long time when many sequencesare included into the final consensus library. Please be patient!
cd-hit-est failed for TEtrimmer_consensus.fasta with error code 1
Fatal Error: Failed to open the database file Program halted !!
The final CD-HIT-EST merge step cannot be performed. Final TE consensus library redundancy can be higher but the sensitivity is not affected. You can remove duplicated sequence by yourself.
You can choose to ignore CD-HIT-EST error. For traceback output, please refer to 'error_file.txt' in the 'Multiple_sequence_alignment' directory.
TEtrimmer is clustering TE consensus library. This can potentially take long time when many sequences exist in the consensus library. Please be patient!
Final clustering of proof curation files failed with error local variable 'sequence_info' referenced before assignment
Traceback (most recent call last):
  File "/home/pedro/miniconda3/envs/TEtrimmer/share/tetrimmer/TEtrimmer.py", line 533, in main
    sequence_info, perfect_proof, good_proof, intermediate_proof, need_check_proof)
UnboundLocalError: local variable 'sequence_info' referenced before assignment
This does not affect the final TE consensus sequences. But this can heavily complicate the TE proof curation. If you don't plan to do proof curation, you can choose to ignore this error.
Progress: |██████--------------------------------------------| 1/8 = 12.5% Complete
This message is somewhat similar to the one reported in this issue: https://github.com/qjiangzhao/TEtrimmer/issues/27. However, in my case the problem occurred with the test set and not with the actual data that I want to analyze, so I think the cause could be different. Thanks in advance for your help!
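The UnboundLocalError in the output above is the usual symptom of a variable that is assigned only on a code path that did not complete. The sketch below illustrates that general pattern under stated assumptions: `cluster`, `unsafe`, and `safe` are hypothetical names, and this is not a claim about what TEtrimmer.py actually does around line 533.

```python
def cluster(fail):
    # Stand-in for an earlier step that may fail.
    if fail:
        raise RuntimeError("clustering failed")
    return ["seq1", "seq2"]


def unsafe(fail):
    try:
        sequence_info = cluster(fail)
    except RuntimeError:
        pass  # sequence_info is never assigned on this path
    # Raises UnboundLocalError when cluster() failed.
    return sequence_info


def safe(fail):
    # Give the name a default before the step that can fail.
    sequence_info = None
    try:
        sequence_info = cluster(fail)
    except RuntimeError:
        pass
    return sequence_info
```

Initialising the variable up front turns the secondary crash into a value the caller can test for, so the original failure stays visible instead of being masked by a second error.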