rhysf / Synima

Synteny Imager
MIT License

Orthofinder issue #45

Open pcampiteli opened 6 months ago

pcampiteli commented 6 months ago

Greetings, I'm a PhD student working on visualizing synteny across genomes of strains of the same species (12 different synteny plots), plus one synteny plot across genomes of different species (up to 35 species). The Synima pipeline is great and produces the plots.

In the third step, which establishes orthology, I'm using the OrthoFinder script. I noticed that in some analyses, although orthology is established, the run doesn't finish properly and some information is missing.

[proper orthofinder analysis] synima_properly
[error orthofinder analysis] synima_not_properly

I don't have the log from each analysis, but as far as I remember, the failed analyses always show a warning about one or more Orthogroup gene trees, and no other information is given.

Has anyone had the same problem, or does anyone know how to identify the cause or have ideas on how to avoid it? Thanks in advance.

pcampiteli commented 6 months ago

Greetings, I've saved the analysis log for a species on which the OrthoFinder analysis did not work properly.

This is the log

Parsing repo spec...
Assigning genome codes...
Assigning sequence IDs and save FASTA...
fasta_to_struct: saving from ./TatrJCM9410/TatrJCM9410.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./TatrLY357/TatrLY357.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./TatrCG6828/TatrCG6828.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./TatrIMI2060402/TatrIMI2060402.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./TatrXS2015/TatrXS2015.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./TatrP1/TatrP1.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./TatrSC1/TatrSC1.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./TatrIMI206040/TatrIMI206040.annotation.pep.synima-parsed.PEP...
fasta_to_struct: saving from ./Tatr0020/Tatr0020.annotation.pep.synima-parsed.PEP...
retrieve_blast_pairs_orthofinder...
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast8_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast8_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast2_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast8_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast7_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast8_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast3_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast8_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast0_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast8_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast1_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast8_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast5_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast8_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast6_8.txt
retrieve_hits_orthofinder: ./Tatr0020/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast8_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_Tatr0020.blast.m8 -> Orthofinder_outdir/Blast4_8.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast2_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast2_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast7_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast2_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast3_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast2_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast0_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast2_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast1_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast2_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast5_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast2_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast6_2.txt
retrieve_hits_orthofinder: ./TatrCG6828/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast2_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrCG6828.blast.m8 -> Orthofinder_outdir/Blast4_2.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast7_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast7_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast3_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast7_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast0_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast7_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast1_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast7_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast5_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast7_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast6_7.txt
retrieve_hits_orthofinder: ./TatrIMI206040/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast7_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrIMI206040.blast.m8 -> Orthofinder_outdir/Blast4_7.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast3_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast3_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast0_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast3_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast1_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast3_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast5_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast3_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast6_3.txt
retrieve_hits_orthofinder: ./TatrIMI2060402/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast3_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrIMI2060402.blast.m8 -> Orthofinder_outdir/Blast4_3.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast0_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast0_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast1_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast0_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast5_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast0_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast6_0.txt
retrieve_hits_orthofinder: ./TatrJCM9410/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast0_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrJCM9410.blast.m8 -> Orthofinder_outdir/Blast4_0.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast1_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast1_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast5_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast1_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast6_1.txt
retrieve_hits_orthofinder: ./TatrLY357/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast1_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrLY357.blast.m8 -> Orthofinder_outdir/Blast4_1.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast5_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast5_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast6_5.txt
retrieve_hits_orthofinder: ./TatrP1/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast5_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrP1.blast.m8 -> Orthofinder_outdir/Blast4_5.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast6_6.txt
retrieve_hits_orthofinder: ./TatrSC1/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast6_4.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrSC1.blast.m8 -> Orthofinder_outdir/Blast4_6.txt
retrieve_hits_orthofinder: ./TatrXS2015/RBH_blast_PEP/vs_TatrXS2015.blast.m8 -> Orthofinder_outdir/Blast4_4.txt
Running Orthofinder...
Traceback (most recent call last):
  File "orthofinder/orthofinder.py", line 1610, in <module>
  File "orthofinder/orthofinder.py", line 1429, in GetOrthologues
  File "orthofinder/scripts/orthologues.py", line 1097, in OrthologuesWorkflow
  File "orthofinder/scripts/stride.py", line 505, in GetRoot
  File "multiprocessing/pool.py", line 250, in map
  File "multiprocessing/pool.py", line 554, in get
scripts.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
Failed to execute script orthofinder

OrthoFinder version 2.2.7 Copyright (C) 2014 David Emms

2024-05-17 11:12:21 : Starting OrthoFinder
16 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "mcl -h" - ok
Test can run "fastme -i /storage4/h.paulocampiteli/pangenome/synima/atroviride/Orthofinder_outdir/SimpleTest.phy -o /storage4/h.paulocampiteli/pangenome/synima/atroviride/Orthofinder_outdir/SimpleTest.tre" - ok
Using previously calculated BLAST results in /storage4/h.paulocampiteli/pangenome/synima/atroviride/Orthofinder_outdir/

Running OrthoFinder algorithm

2024-05-17 11:12:22 : Initial processing of each species
2024-05-17 11:12:37 : Initial processing of species 0 complete
2024-05-17 11:12:53 : Initial processing of species 1 complete
2024-05-17 11:13:08 : Initial processing of species 2 complete
2024-05-17 11:13:22 : Initial processing of species 3 complete
2024-05-17 11:13:36 : Initial processing of species 4 complete
2024-05-17 11:13:54 : Initial processing of species 5 complete
2024-05-17 11:14:09 : Initial processing of species 6 complete
2024-05-17 11:14:26 : Initial processing of species 7 complete
2024-05-17 11:14:42 : Initial processing of species 8 complete
2024-05-17 11:15:00 : Connected putatitive homologs
2024-05-17 11:16:48 : Ran MCL

Writing orthogroups to file

Orthogroups have been written to tab-delimited files:
/storage4/h.paulocampiteli/pangenome/synima/atroviride/Orthofinder_outdir/Orthogroups.csv
/storage4/h.paulocampiteli/pangenome/synima/atroviride/Orthofinder_outdir/Orthogroups.txt (OrthoMCL format)
/storage4/h.paulocampiteli/pangenome/synima/atroviride/Orthofinder_outdir/Orthogroups_UnassignedGenes.csv
2024-05-17 11:16:53 : Done orthogroups

Analysing Orthogroups

Calculating gene distances

2024-05-17 11:19:32 : Done 0 of 10272
2024-05-17 11:19:42 : Done 1000 of 10272
2024-05-17 11:19:54 : Done 2000 of 10272
2024-05-17 11:20:07 : Done 3000 of 10272
2024-05-17 11:20:17 : Done 4000 of 10272
2024-05-17 11:20:30 : Done 5000 of 10272
2024-05-17 11:20:44 : Done 6000 of 10272
2024-05-17 11:20:58 : Done 7000 of 10272
2024-05-17 11:21:11 : Done 8000 of 10272
2024-05-17 11:21:25 : Done 9000 of 10272
2024-05-17 11:21:38 : Done 10000 of 10272
2024-05-17 11:19:28 : Done

Inferring gene and species trees

OG0009748_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
OG0008970_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
OG0009920_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
OG0009798_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
OG0003812_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
7845 trees had all species present and will be used by STAG to infer the species tree

Best outgroup(s) for species tree

2024-05-17 11:27:27 : Starting STRIDE

Finished with: 65280
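The log names five gene-tree files that ETE could not interpret, and the traceback ends in a NewickError raised while STRIDE was rooting the species tree. One way to narrow this down might be to scan the tree files for obvious structural damage. The following is only a minimal sketch: the directory to scan and the `*_tree_id.txt` glob pattern are assumptions taken from the file names in the warnings (the log does not show where OrthoFinder 2.2.7 writes them), and the check is purely structural (non-empty, terminated by `;`, balanced parentheses), not a full Newick parser.

```python
from pathlib import Path


def newick_problem(text):
    """Return a short description of the first structural problem, or None."""
    stripped = text.strip()
    if not stripped:
        return "empty file"
    if not stripped.endswith(";"):
        return "missing terminating ';'"
    depth = 0
    for char in stripped:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return "unbalanced ')'"
    if depth != 0:
        return "unclosed '('"
    return None


def scan_tree_dir(tree_dir, pattern="*_tree_id.txt"):
    """Map each suspicious tree file name to the problem found in it.

    Both tree_dir and pattern are assumptions; adjust them to wherever
    the gene trees of your run actually live.
    """
    problems = {}
    for path in sorted(Path(tree_dir).glob(pattern)):
        problem = newick_problem(path.read_text())
        if problem is not None:
            problems[path.name] = problem
    return problems
```

Calling `scan_tree_dir` on the gene-tree directory of a failed run and of a successful run, then comparing the two results, could show whether the failures coincide with empty or truncated tree files.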

rhysf commented 6 months ago

Dear @pcampiteli

Thanks for submitting the report. 35 species is quite a lot to compare at once. This OrthoFinder issue is not one I've come across before, and it could well be a problem with that tool rather than with Synima specifically, in which case it would be outside my ability to solve. It does, however, look like the run completed despite the error, so perhaps there was a malformed BLAST file or something similar, and perhaps you are able to continue with the Synima pipeline regardless?
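Whether the run actually completed can be cross-checked against the last line of the log, "Finished with: 65280". If that number is a raw POSIX-style wait status (the kind returned by, for example, Perl's `system`; that Synima prints it this way is an assumption, not something confirmed in this thread), the child's exit code sits in the high byte. A small sketch of the decoding:

```python
def decode_wait_status(status):
    """Split a 16-bit POSIX-style wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF  # high byte: exit code of the child process
    signal_number = status & 0x7F     # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_number


print(decode_wait_status(65280))  # (255, 0)
```

Under that assumption, 65280 corresponds to exit code 255 with no signal, which would mean OrthoFinder exited with an error rather than finishing cleanly, consistent with the "Failed to execute script orthofinder" line in the traceback.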

Also, this could be an issue with the version of Orthofinder (2.2.7) included with Synima, and you could try swapping it out for a newer version.

Another possible issue I am aware of, which may or may not be relevant, is that Synima cannot deal with two genes or proteins that have the same name in different assemblies. So if you are comparing as many as 35, and any of them have generic names like g123, that will likely cause issues at some point, possibly here.
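A quick way to test for the name-clash scenario described above could be to collect the sequence IDs from every protein FASTA file and report any ID that occurs in more than one file. This is a sketch under stated assumptions: the helper names `fasta_ids` and `shared_ids` are made up for illustration, and the ID is taken to be the first whitespace-delimited token of each `>` header line, which may not match how Synima itself parses names.

```python
from collections import defaultdict


def fasta_ids(path):
    """Yield the ID (first whitespace-delimited token) of each FASTA record."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    yield header[0]


def shared_ids(fasta_paths):
    """Return {sequence_id: sorted file paths} for IDs seen in more than one file."""
    seen = defaultdict(set)
    for path in fasta_paths:
        for seq_id in fasta_ids(path):
            seen[seq_id].add(str(path))
    return {
        seq_id: sorted(files)
        for seq_id, files in seen.items()
        if len(files) > 1
    }
```

Judging by the paths in the log, the input list might be built with something like `sorted(Path(".").glob("*/*.annotation.pep.synima-parsed.PEP"))` (after `from pathlib import Path`), though that layout is inferred from the log rather than documented here. An empty result would suggest that duplicate names are not the cause.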

My suggestion would be to first try to carry on with the pipeline regardless. Alternatively, use OrthoMCL instead, swap OrthoFinder for a newer version, or split up your analysis based on phylogenetic information, perhaps using the orthologs you have already identified.

If you find out what the issue is, or find a workaround, it would be great if you could post it here. Apologies for not having more advice on this one.

pcampiteli commented 6 months ago

The 35-species analysis is a pangenome-like analysis, and yes, it's a lot. But I'm also conducting other analyses with fewer genomes. I am able to finish the Synima pipeline for all of them, but as I mentioned, some hit this OrthoFinder issue. OrthoMCL works, but I'm interested in the OrthoFinder outputs: it is more reliable for establishing orthology and gives me other relevant information that I'm using in my analysis. The log shows that this is the main issue in my analysis, and I need to check the script lines to understand what is going on. My main problem is that the OrthoFinder script in the support scripts folder is a .Linux file, and opening it shows content that isn't human-readable.

"Running Orthofinder...
Traceback (most recent call last):
  File "orthofinder/orthofinder.py", line 1610, in <module>
  File "orthofinder/orthofinder.py", line 1429, in GetOrthologues
  File "orthofinder/scripts/orthologues.py", line 1097, in OrthologuesWorkflow
  File "orthofinder/scripts/stride.py", line 505, in GetRoot
  File "multiprocessing/pool.py", line 250, in map
  File "multiprocessing/pool.py", line 554, in get
scripts.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
Failed to execute script orthofinder"

With that said, I must admit I'm not that great at Python scripting, so this question may be dumb: how do I change the OrthoFinder version? I have 2.5.4 installed in my conda environment. Perhaps I just need to change the line in the script that calls OrthoFinder? In my conda env, orthofinder is already on the PATH.

Right now, I'll try running a separate OrthoFinder analysis outside the Synima pipeline to check whether the same thing happens.

Thanks for your reply. I'll do my best to understand the matter and come back with answers.

rhysf commented 6 months ago

Hi @pcampiteli, the conda version you have is probably not being called; instead, Synima calls the bundled /util/support_scripts/orthofinder.Linux.

The version currently included in Synima is 2.2.7, while the most up-to-date version is 2.5.5, found here:

https://github.com/davidemms/OrthoFinder/releases/tag/2.5.5/ OrthoFinder.tar.gz

If you install that and get it working, then copy it into the Synima folder under the same name, you'll be running the newer version, which could be worth trying. Otherwise, this looks like an issue you'd have to log in the OrthoFinder issues, found here: https://github.com/davidemms/OrthoFinder/issues
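Before swapping anything, it may help to confirm which executables are actually present. The sketch below reports the copy bundled with Synima and the one that would be resolved from the PATH (for instance, from a conda environment). The function name `find_orthofinder_candidates` is hypothetical, the bundled location is taken from the path quoted above relative to an assumed Synima root directory, and `orthofinder` as the command name on the PATH is an assumption as well.

```python
import shutil
from pathlib import Path


def find_orthofinder_candidates(synima_root, search_path=None):
    """Report the bundled OrthoFinder executable and the one found on PATH.

    synima_root: top-level Synima directory (assumed layout).
    search_path: optional PATH-style string; defaults to the current PATH.
    """
    bundled = Path(synima_root) / "util" / "support_scripts" / "orthofinder.Linux"
    on_path = shutil.which("orthofinder", path=search_path)
    return {
        "bundled": str(bundled) if bundled.is_file() else None,
        "on_path": on_path,
    }
```

If both entries are filled in and point to different files, the bundled one is presumably the copy Synima runs, so it is the file to back up before replacing it with a newer build.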

If you do find a reason from the authors of OrthoFinder, please let me know.