phyloacc / PhyloAcc

PhyloAcc: software to detect changes in the conservation of a genomic region
GNU General Public License v3.0

two trees do not have the same topology #65

Open chun-he-316 opened 2 months ago

chun-he-316 commented 2 months ago

Hello, I ran PhyloAcc in gt mode and got the error "two trees do not have the same topology", which means that the phylogenetic tree from my mod file and the tree in coalescent units from astral-species-tree.treefile differ. But I checked, and they have the same topology. Please tell me how to solve this problem. Thank you!

gwct commented 2 months ago

Hi, can you send me the two tree files and a sample alignment if possible? Thanks.

chun-he-316 commented 2 months ago

Hi, can you give me your email address?

gwct commented 2 months ago

gthomas [at] g [dot] harvard [dot] edu

gwct commented 1 month ago

Hello, The problem was that the tree output from ASTRAL removed the internal node labels from the original tree, so PhyloAcc couldn't match them up. I've added a step to the --theta pipeline that re-adds the node labels to the tree that ASTRAL outputs. This is included in the latest release (v2.3.2). Please update with conda update phyloacc (or mamba update phyloacc if you use mamba) and let me know if this resolves the error.
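For anyone who wants to check their own tree files for this, one rough approach is to look at what follows each closing parenthesis in the Newick string. The sketch below is not part of PhyloAcc and does not reproduce what the --theta pipeline does; the function names are made up for illustration. It assumes unquoted labels and treats purely numeric labels (for example support values) as not being node names.

```python
import re


def internal_labels(newick: str) -> list[str]:
    """Return the text following each ')' up to the next ':', ',', '(', ')' or ';'."""
    return [m.group(1).strip() for m in re.finditer(r"\)([^:,();]*)", newick)]


def is_node_name(label: str) -> bool:
    """Assumption: an empty or purely numeric label is not a node name."""
    if not label:
        return False
    try:
        float(label)
    except ValueError:
        return True
    return False


def count_unnamed_internal_nodes(newick: str) -> int:
    """Count internal nodes (including the root) that carry no name."""
    return sum(not is_node_name(label) for label in internal_labels(newick))


# Toy trees, not taken from this issue.
labeled = "((A:1,B:1)AB:0.5,C:1.5)root;"
stripped = "((A:1,B:1):0.5,C:1.5);"
with_support = "((A:1,B:1)0.98:0.5,C:1.5);"

for name, tree in [("labeled", labeled), ("stripped", stripped), ("with_support", with_support)]:
    print(name, internal_labels(tree), count_unnamed_internal_nodes(tree))
```

A non-zero count for the coalescent-unit tree but not for the mod-file tree would be consistent with the situation described above.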

chun-he-316 commented 1 month ago

Hello, I updated PhyloAcc, but when I run the rule run_phyloacc_gt with the shell command "PhyloAcc-GT jishengxingjiedian.PhyloAcc/phyloacc-job-files/cfgs/1-gt.cfg &> jishengxingjiedian.PhyloAcc/phyloacc-job-files/phyloacc-output/1-phyloacc-gt-out/1-phyloacc.log", I get the error "Segmentation fault (core dumped)". Please tell me what I should do. Thank you!

chun-he-316 commented 1 month ago

The contents of jishengxingjiedian.PhyloAcc/phyloacc-job-files/phyloacc-output/1-phyloacc-gt-out/1-phyloacc.log are as follows:

    Loading input data and running parameters......
    Loading program configurations from /data/hechun/project/06_Hymeno/08_WGA/05_PhyloAcc/jishengxingjiedian.PhyloAcc/phyloacc-job-files/cfgs/1-gt.cfg......
    total length = 6868 (50). # Species = 27. # elements = 50. Mean gene set size = 137.3600.
    Burn-ins = 500. # MCMC Updates = 1000. # thin = 1. RND SEED = 1.
    Threads = 40
    Loading phylogenetic tree from ........(ignore)
    Loading phylogenetic tree in coalescent unit from ........(ignore)
    The species in profile and tree match perfectly.
    Reorder the species in profile matrix by the tree.
    InitPhyloTree finished
    50 elements to be computed
    element 14, number of base pair=75
    element 46, number of base pair=57
    start null model
    element 35, number of base pair=61
    element 47, number of base pair=223
    element 48, number of base pair=80
    start null model
    element 3, number of base pair=45
    element 1, number of base pair=47
    element 5, number of base pair=31
    element 2, number of base pair=31
    element 21, number of base pair=25
    element 29, number of base pair=32
    element 45, number of base pair=35
    element 40, number of base pair=39
    element 49, number of base pair=29
    element 30, number of base pair=223
    element 31, number of base pair=27
    element 22, number of base pair=49
    element 17, number of base pair=121
    start null model
    element 11, number of base pair=25
    element 42, number of base pair=23
    element 39, number of base pair=35
    element 25, number of base pair=154
    start null model
    element 34, number of base pair=29
    element 36, number of base pair=29
    element 33, number of base pair=59
    element 23, number of base pair=351
    start null model
    element 37, number of base pair=53
    element 44, number of base pair=57
    element 43, number of base pair=52
    element 12, number of base pair=56
    start null model
    element 32, number of base pair=56
    start null model
    element 16, number of base pair=71
    element 38, number of base pair=50
    start null model
    element 13, number of base pair=50
    start null model
    element 4, number of base pair=78
    start null model
    element 27, number of base pair=52
    start null model
    element 19, number of base pair=64
    start null model
    element 41, number of base pair=77
    start null model
    element 9, number of base pair=116
    start null model
    element 50, number of base pair=134
    start null model
    -0.8345
    -0.8345
    -1.5541
    terminate called after throwing an instance of 'char const*'

gwct commented 1 month ago

I'm not sure this is related to the original tree issue, but I would really need @xyz111131 or @HanY-H to weigh in here. In my debugger, the lines of code where this is happening are in genetree.cpp:

    if(!findcss) {
        cout << heights_gene[gnode] <<endl;
        cout << bpp.heights[sp] <<endl;
        cout << bpp.heights[ss] <<endl;
        throw("findcss error");
    }

I will note that if I use the --filter option with phyloacc.py, which filters out low-quality alignments, I do not get this error, but rather a different one in bpp_c.cpp: double free or corruption (out).

gaurav-agavekar commented 1 month ago

Hi @gwct, I am facing a similar issue when trying to run the GT mode with an ASTRAL species tree. I labeled the ancestral nodes in both my phyloFit tree and my ASTRAL species tree with tree_doctor using the -a flag. The two tree topologies look identical to me, both when plotted and when looking at the Newick strings, but PhyloAcc errors out with "two trees do not have the same topology". Since you asked the user to share their trees via email, I have also sent an email to you. Can you please have a look? Also, I get a segmentation fault when trying to run PhyloAcc with the --theta flag, but I would like to make it work with the ASTRAL tree I have anyway.

Thanks!

gwct commented 1 month ago

Hello Gaurav, looking at your trees, one thing I notice is that some of the internal labels actually aren't the same. For instance, the branch labeled "HLAacrVol1-HLAstrAlb1" in the tree in the mod file is labeled "HLAserOpa1-HLAstrAlb1" in the ASTRAL tree. This will make PhyloAcc think the trees are not the same. I'm not sure how tree_doctor decides on the labels, but it seems that even if the topology is the same, it's labeling things slightly differently. One thing you could do is use the --labeltree option in phyloacc.py. Right now, it only labels the tree in the mod file, so you'd have to do some copy/pasting to label both trees. But it should label them in the same way.
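Label mismatches like this are hard to spot by eye in long Newick strings. One way to find them, sketched below, is to identify each internal node by the set of tips beneath it and compare the labels the two trees give to the same set. This is not how PhyloAcc compares trees internally; the function names are hypothetical, and the sketch assumes well-formed, unquoted Newick strings and that both trees are rooted in the same place.

```python
def clade_labels(newick: str) -> dict:
    """Map each internal node (as a frozenset of descendant tip names) to its label."""
    text = newick.strip().rstrip(";")
    delimiters = ":,()"
    open_clades = []  # one set of tip names per currently open '('
    labels = {}
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            open_clades.append(set())
            i += 1
        elif ch == ",":
            i += 1
        elif ch == ":":
            # Skip the branch length that follows the colon.
            i += 1
            while i < len(text) and text[i] not in delimiters:
                i += 1
        else:
            # Either ')' followed by an optional label, or a tip name.
            closing = ch == ")"
            start = i + 1 if closing else i
            end = start
            while end < len(text) and text[end] not in delimiters:
                end += 1
            token = text[start:end].strip()
            if closing:
                tips = open_clades.pop()
                labels[frozenset(tips)] = token
                if open_clades:
                    open_clades[-1] |= tips
            elif token and open_clades:
                open_clades[-1].add(token)
            i = end
    return labels


def label_mismatches(newick_a: str, newick_b: str) -> list:
    """Clades present in both trees whose labels differ: (tips, label_a, label_b)."""
    a, b = clade_labels(newick_a), clade_labels(newick_b)
    shared = a.keys() & b.keys()
    return sorted((tuple(sorted(c)), a[c], b[c]) for c in shared if a[c] != b[c])


def clades_in_one_tree_only(newick_a: str, newick_b: str) -> list:
    """Clades found in only one tree, which indicates a real topology difference."""
    a, b = clade_labels(newick_a), clade_labels(newick_b)
    return sorted(tuple(sorted(c)) for c in a.keys() ^ b.keys())


# Toy trees with made-up names, not the trees discussed in this issue.
mod_tree = "((A:1,B:1)A-B:1,(C:1,D:1)C-D:1)root;"
astral_tree = "((B:0.5,A:0.5)A-B:2,(D:1,C:1)D-C:1)root;"
print(label_mismatches(mod_tree, astral_tree))
print(clades_in_one_tree_only(mod_tree, astral_tree))
```

If the second list is empty but the first is not, the two trees share a topology and differ only in how internal nodes are named, which is the situation described above.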

I'm still gathering information about the other errors. Our cluster has been down for maintenance this week, so I haven't been able to run anything.

gaurav-agavekar commented 1 month ago

Hi Gregg,

Thanks for spotting the inconsistencies, my bad. Thanks also for the suggestion to try the --labeltree flag in PhyloAcc. I tried it, but it creates weird, duplicated node labels (I'm sending you the output trees by email). I wonder if there's a bug in the code. What I then did was manually edit the labels in the tree_doctor output (only a handful were incorrectly labeled) and visually check that the topology and the labels are identical. This worked, and PhyloAcc now recognizes the two tree topologies as identical and proceeds with the analysis. However, several jobs crash after InitPhyloTree finishes, without any error in the PhyloAcc output log file. When I check the Slurm logs, I see "/usr/bin/bash: line 1: 671685 Segmentation fault (core dumped)". So it seems that both with the --theta flag and in GT mode with the ASTRAL tree, I'm unable to complete a PhyloAcc run. I'm sending you some log files by email as well. Can you please take a look? Thanks!

gwct commented 1 month ago

Hi Gaurav, There was indeed a bug with --labeltree, which I've hopefully fixed with #68. That should be up on bioconda in a day or so.

As for the segmentation fault, I'm guessing it's similar to what the OP is seeing, and I'm still waiting to hear back from @xyz111131 about that.

gaurav-agavekar commented 1 month ago

Hi Gregg,

OK, thanks for confirming the --labeltree bug and the fix in #68 (https://github.com/phyloacc/PhyloAcc/pull/68).

As for the segmentation fault, I will wait to hear from @xyz111131.
