ssolo / ALE

Amalgamated likelihood estimation (ALE) is a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (cf. http://arxiv.org/abs/1211.4606 ), which allows for the duplication, transfer and loss of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees.

Inferred frequencies greater than 1 #38

Closed 473021677 closed 10 months ago

473021677 commented 1 year ago

Hi, I am using ALEml_undated from the ALE 1.0 package to infer evolutionary history. The program reported no errors. When I check the result files, I find that a few inferred frequencies of duplications, transfers, losses, and originations are greater than 1. I am not sure whether something is wrong. I plan to use a threshold of 0.3 on the raw reconciliation frequencies to avoid missing true events. Can I count the duplication, transfer, loss, and origination events with inferred frequencies greater than 1? I really need your help. Thanks very much.

Best regards, YangYuan

ssolo commented 1 year ago

The numbers in question are estimates of the posterior mean number of events and as a result can be larger than unity.

E.g. if there is a 50% posterior probability of 0 Ds on a branch and 50% probability of 1 D one gets 0.5, but if there is 50% probability of 1 D and 50% of 2 Ds then one has 1.5.
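The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the expectation being described; the function name `posterior_mean` and the dictionary layout are illustrative choices, not part of ALE.

```python
def posterior_mean(distribution):
    """Expected number of events, given {event_count: posterior_probability}."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(count * prob for count, prob in distribution.items())


# 50% chance of 0 duplications, 50% chance of 1 duplication
print(posterior_mean({0: 0.5, 1: 0.5}))  # 0.5

# 50% chance of 1 duplication, 50% chance of 2 duplications
print(posterior_mean({1: 0.5, 2: 0.5}))  # 1.5
```

Because the value is a mean of counts rather than a probability, nothing bounds it at 1.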

I.e. the table at the end of the .uml_rec file summarises the number of events per branch of the species tree, computed from the reconciled gene trees listed above it (the long list of Newick strings). These reconciled gene trees are sampled according to their joint sequence and reconciliation likelihood (cf. eq. 3 here https://academic.oup.com/sysbio/article/62/6/901/1711882 ). In general they can have different topologies and different reconciliations (i.e. series of DTL events).

In the example I attach (which I got by running ../build/bin/ALEml_undated Sab.tree Gab.tree.ale delta=0.2 tau=0.1 i.e. by forcing a higher probability of duplication for the purposes of this toy example ) in the list of reconciled gene trees you can see several alternative reconciled gene trees, e.g.

ones with 3 duplications on branch 3 of the species tree, these are the branches on the reconciled gene tree with @.***”:

@.**@*.**@*.***:1).4:0;

and also ones with only one duplication on branch 3 of the species tree @." again) and three transfers @.>3”, @.>3” and @.>c")

@.**@*.**@*.**@.>c:1).3:0;

etc.

The estimate of the posterior mean number of D events on branch 3 of S is the average over these:

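| # of | Duplications | Transfers | Losses | Originations | copies | singletons | extinction_prob | presence | LL |
|---|---|---|---|---|---|---|---|---|---|
| .. | | | | | | | | | |
| S_internal_branch 3 | 1.75 | 0.43 | 0 | 0.47 | 3.05 | 0.3 | 9.31127e-07 | 0.93 | -10.8095 |
| .. | | | | | | | | | |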
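As a rough illustration of "the average over these", the sketch below averages per-sample duplication counts and then reads the same figure back from the table row. The per-sample counts are hypothetical: the attached file is not reproduced here, and they were picked only so that the mean matches the 1.75 shown for branch 3. Splitting the row on whitespace is also an assumption about the file layout.

```python
from statistics import mean

# Hypothetical duplication counts on branch 3, one per sampled reconciled
# gene tree (some samples with 3 duplications, some with only 1).
dups_per_sample = [3, 3, 3, 1, 1, 1, 1, 1]
estimate = mean(dups_per_sample)
print(estimate)  # 1.75

# The nine value columns and the branch 3 row quoted from the .uml_rec table.
header = ("Duplications Transfers Losses Originations copies singletons "
          "extinction_prob presence LL").split()
row = "S_internal_branch 3 1.75 0.43 0 0.47 3.05 0.3 9.31127e-07 0.93 -10.8095".split()

# The first two fields are the branch type and the branch label.
branch_type, branch = row[0], row[1]
values = [float(x) for x in row[2:]]
summary = dict(zip(header, values))
print(branch_type, branch, summary["Duplications"])  # S_internal_branch 3 1.75
```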

On 31 Oct 2022, at 03:51, 袁洋 @.***> wrote:

Dear Gergely, thanks for your explanation. I still don't understand what "1 D" and "2 Ds" mean. I guess they mean that a gene family was duplicated once or twice, respectively, on a branch of the species phylogeny. If that is right, how should I deal with duplication events whose inferred frequency is greater than 1? For example, if the inferred frequency is 1.9, I am not sure whether one or two duplication events should be counted.

Best regards, Yang Yuan


473021677 commented 1 year ago

Thanks for your explanation. I have got your idea. 

Best regards, Yang Yuan


473021677 commented 1 year ago

Dear Gergely, I have another question. I have 11 gene tree files (OG0000000.muscle.trimal.phy_renamed.treefile, OG0000001.muscle.trimal.phy_renamed.treefile, ..., OG0000010.muscle.trimal.phy_renamed.treefile) and one rooted species tree (Species_rooted_tree_newick_renamed.txt), all placed in the same folder. I then ran ALE with commands like "ALEobserve OG0000000.muscle.trimal.phy_renamed.treefile" and "ALEml_undated Species_rooted_tree_newick_renamed.txt OG0000000.muscle.trimal.phyrenamed.treefile.ale sample=100 separators=""". For four of the gene tree files (OG0000000.muscle.trimal.phy_renamed.treefile, OG0000001.muscle.trimal.phy_renamed.treefile, OG0000002.muscle.trimal.phy_renamed.treefile and OG0000003.muscle.trimal.phy_renamed.treefile), the program reported the error "ALEml_undated using ALE v1.0 Read species tree from: Species_rooted_tree_newick_renamed.txt.. Error, file test/OG0000001.muscle.trimal.phy_renamed.treefile.ale does not seem accessible." For the other 7 gene tree files, no errors were reported. I can't resolve this problem and need your help. I have attached the files. Thanks.

Best regards, Yang Yuan
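One quick check for a "does not seem accessible" message is to confirm that every `.ale` file actually exists where the command will look for it. The sketch below assumes, as the commands above suggest, that ALEobserve writes `<input>.ale` next to its input file; the helper name `missing_ale_files` is made up for illustration. It does not diagnose the cause. Note also that the path in the error message starts with `test/` while the commands quoted above use no directory prefix, so the working directory may be worth checking too.

```python
from pathlib import Path


def missing_ale_files(folder, tree_suffix=".treefile"):
    """List the expected <tree>.ale files that are not present in `folder`."""
    folder = Path(folder)
    missing = []
    for tree in sorted(folder.glob(f"*{tree_suffix}")):
        # Assumed naming: the .ale file is the tree file name plus ".ale".
        ale = tree.with_name(tree.name + ".ale")
        if not ale.is_file():
            missing.append(ale)
    return missing


if __name__ == "__main__":
    # Run from the folder that holds the gene tree files.
    for path in missing_ale_files("."):
        print("not found:", path)
```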

 
