morrislab / phylowgs

Application for inferring subclonal composition and evolution from whole-genome sequencing data.
GNU General Public License v3.0

Current best way to choose the best tree #85

Open pankevich-ev opened 6 years ago

pankevich-ev commented 6 years ago

Hello, could you please give an update on what you consider the best way to choose the best tree at the moment?

I see that we could either take the tree with the lowest nlgLH (normalized negative log likelihood) or use an average tree (https://github.com/morrislab/phylowgs/blob/gmm-tree-clustering/witness/README.md).

Or is it better to follow the approach in Espiritu et al. 2018, where "the JSON results were parsed to determine the best consensus tree, given by the largest log likelihood value" (https://www.sciencedirect.com/science/article/pii/S009286741830309X?via%3Dihub)? What is the "consensus tree" in this case? Where can we find the "log likelihood values" for all trees?

What would be your suggestion? Thanks a lot in advance and all the best, Eugenia

quaidmorris commented 6 years ago

Eugenia: great question! We are a few weeks away from a major release where we address this problem. Can you wait a couple of weeks?

pankevich-ev commented 6 years ago

Dear Quaid, thanks for your reply. We are looking forward to the release.

pankevich-ev commented 6 years ago

Hello, has there been any update on the choice of the best tree that I might have missed? I have only seen some updates regarding the MCMC parameters, etc. Thanks a lot! Best, Eugenia

aleighbrown commented 6 years ago

I am also running into this question now, and eagerly awaiting the update. Hopefully this comes out soon!

quaidmorris commented 6 years ago

The team (i.e. Jarry and Jeff) have been working hard on this and we have a partially complete solution. Would that be useful? It clusters the trees, and chooses representatives of each cluster but it won't quantify the variability within the cluster -- you'll have to do that through the Witness interface if that's important to you. Also, you might need to merge some of the clusters together.

aleighbrown commented 6 years ago

That would be extremely useful yes! Thank you!

apachemonster commented 6 years ago

+1 yes please!

aleighbrown commented 6 years ago

Hi, wondering if there is any update on this issue?

elifirem commented 6 years ago

Hi, I am assuming that the update is not ready yet. I am currently trying to pick the top k trees. Would you still recommend picking the trees with the highest llh, which can be extracted from the parsed JSON files? Or should I take the trees with the lowest nlgLH in Witness? Also, what is the relationship between these values (llh and nlgLH), if you don't mind explaining? Thanks a lot in advance, Irem

quaidmorris commented 6 years ago

Hi Irem,

llh stands for log likelihood and nlgLH stands for negative log likelihood.

The difference is that we divide nlgLH by the number of mutations but don't do that for llh, which is why the values look different. Since the number of mutations is the same for every tree, the following should hold:

llh = - N nlgLH

where N is the number of mutations.

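So the trees with the highest nlgLH should be the trees with the lowest llh. Equivalently, the trees with the lowest nlgLH are the trees with the highest llh, so both criteria rank the trees in the same order.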

Q
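As a rough illustration of the relationship above, here is a minimal Python sketch. It is not PhyloWGS code: the `llh_by_tree` dictionary, the tree indices, the llh values, and the helper names are all made up for the example, and you would need to fill the dictionary from your own parsed JSON results. It only shows that converting between llh and nlgLH with a fixed N leaves the ranking unchanged, so picking the top k trees by highest llh or by lowest nlgLH gives the same answer.

```python
def llh_to_nlglh(llh, num_mutations):
    """Negative log likelihood divided by the number of mutations."""
    return -llh / num_mutations


def nlglh_to_llh(nlglh, num_mutations):
    """Inverse conversion: llh = -N * nlgLH."""
    return -num_mutations * nlglh


def top_k_by_llh(llh_by_tree, k):
    """Return the k tree indices with the highest llh (ties broken by index)."""
    return sorted(llh_by_tree, key=lambda t: (-llh_by_tree[t], t))[:k]


def top_k_by_nlglh(nlglh_by_tree, k):
    """Return the k tree indices with the lowest nlgLH (ties broken by index)."""
    return sorted(nlglh_by_tree, key=lambda t: (nlglh_by_tree[t], t))[:k]


# Hypothetical values, for illustration only.
num_mutations = 100
llh_by_tree = {"0": -1200.0, "1": -1150.0, "2": -1300.0}

nlglh_by_tree = {
    tree: llh_to_nlglh(llh, num_mutations) for tree, llh in llh_by_tree.items()
}

best_by_llh = top_k_by_llh(llh_by_tree, k=2)
best_by_nlglh = top_k_by_nlglh(nlglh_by_tree, k=2)

print(best_by_llh)
print(best_by_nlglh)
```

The tie-break on the tree index is only there to make the ordering deterministic when two trees have identical likelihoods.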

ozill12 commented 6 years ago

Hi @quaidmorris @jwintersinger - Just wanted to check whether the release mentioned above will include either a replacement for the previous "top_k_trees" file, which seems to have been removed as an output of evolve.py/multievolve.py during one of the repo updates earlier this year, or some guidance on how to parse the JSON results files to get useful tables for downstream analysis. I found the top_k_trees file useful because it included a table per tree with the mutation_identity - clone_number - CCF relationships that I could easily parse. Currently it seems that one either has to parse the various JSON files output by write_results.py (no guidance is given on the appropriate way to do that) or use the Witness interface, but the table displayed on the tree viewer page cannot be readily exported for downstream analysis. Thanks!

pawelqs commented 4 years ago

Hi @quaidmorris @jwintersinger, has anything changed since then? I see that PhyloWGS Witness uses "density" to find the best tree. Should we use that, or llh / nlgLH?