nclark-lab / RERconverge

Analysis of convergence between organismal traits and DNA/protein sequences
GNU General Public License v3.0

long branch length due to missing #65

Closed (LipengKang closed this issue 2 years ago)

LipengKang commented 2 years ago

Dear developers,

I am estimating gene trees from alignments following the "PhangornTreeBuildingWalkthrough". However, I find some extreme branch lengths for species that are completely missing from the alignment. Here is an example:

Tree: ((((trmon:0.001153396671,trboe:0.001497551277):0.007412257911,((trdicA:1e-08,trdurA:0.0005279581624):1e-08,trura:10):0.2187154771);

The species trura and trdicA are missing from this gene, yet they were assigned branch lengths at opposite extremes: 10 vs. 1e-08. I have two questions:

  1. How does RERconverge deal with extremely long branch lengths when calculating the mean tree length and, subsequently, the RERs?
  2. These extreme branch lengths would be read as extreme acceleration for one species and extreme conservation for the other, yet both species are simply missing. I don't understand this; does it make sense?

Sincerely, lipeng

LipengKang commented 2 years ago

@sorrywm Could you please help me?

nclark-lab commented 2 years ago

Hello,

Input trees should not contain branches for species that lack the sequence. I advise estimating branch lengths using only the species that are present. It's fine if some trees are missing some species, because RERconverge was written to handle such missing species. The phangorn wrapper scripts we provide should estimate trees correctly, meaning they drop species that are not in the alignment. I hope this helps. Let us know if other issues arise.
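For anyone who wants to check an alignment before estimating a tree, here is a minimal sketch of the idea above: find the species whose sequence is entirely missing so they can be left out of tree estimation. It is not part of RERconverge or the phangorn wrapper scripts. The function names `parse_fasta` and `split_by_presence`, the choice of `-` and `?` as missing-data characters, and the toy sequences are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text (hypothetical helper)."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def split_by_presence(seqs, missing_chars="-?"):
    """Split species into those with at least one real residue and those without.

    missing_chars is an assumption: adjust it to whatever your alignment
    uses for gaps and missing data.
    """
    present, absent = [], []
    for name, seq in seqs.items():
        if any(c not in missing_chars for c in seq):
            present.append(name)
        else:
            absent.append(name)
    return present, absent


# Toy alignment with invented sequences; only the species names come from
# the tree in this issue.
toy_alignment = """>trmon
ATGCCA
>trboe
ATGCTA
>trdicA
------
>trdurA
ATGC-A
>trura
??????
"""

present, absent = split_by_presence(parse_fasta(toy_alignment))
print("estimate branch lengths for:", present)
print("leave out of this gene tree:", absent)
```

Species in the second list would be dropped for that gene only; they can still appear in the trees of other genes where they have sequence.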

LipengKang commented 2 years ago

Sorry for the late reply; I first ran a complete test following the RERconverge guides. No other problems arose. Thank you very much.

lipeng