Estimating the number of chains to run with

underbais commented 5 years ago

Hello developers,

Based on some experiments running multievolve with different # of chains (4 and more) I found that the more complex the input is (multiple samples with more ssms/cnvs) the more chains are required to get to realistic trees (kind of obvious but still). So, I wonder if there is any way to estimate an optimal # of chains to use in a particular case based on input?

Thanks!

quaidmorris commented 5 years ago

Great question!

The short answer is that, because you are (hopefully) running everything in parallel, you should run as many chains as you can afford. We run up to 40 for complex problems.

The long answer is that we haven't done that analysis, but it is a very good suggestion and something we will consider doing. It might take us a while, but if you wanted to do that analysis yourself, I would track the best likelihood achieved by any of the chains and see at what point you reach diminishing returns from running more chains.
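A minimal sketch of that diminishing-returns analysis, assuming you have already extracted the best log-likelihood reached by each chain into a plain list of floats. The function names `expected_best` and `chains_needed`, the tolerance `tol`, and the example numbers are all hypothetical; nothing here is part of PhyloWGS itself. For each k, it computes the exact expected best log-likelihood over a random subset of k of the chains you ran, then reports the first k at which adding one more chain improves that expectation by less than `tol`.

```python
from math import comb  # Python 3.8+


def expected_best(lls, k):
    """Expected best (max) log-likelihood over a random subset of k chains."""
    xs = sorted(lls)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of chains")
    # The value at sorted index i (0-based) is the maximum of exactly
    # comb(i, k - 1) of the comb(n, k) possible subsets of size k.
    weighted = sum(comb(i, k - 1) * x for i, x in enumerate(xs))
    return weighted / comb(n, k)


def chains_needed(lls, tol):
    """Smallest k where going from k to k + 1 chains gains less than tol."""
    curve = [expected_best(lls, k) for k in range(1, len(lls) + 1)]
    for k in range(1, len(curve)):
        # curve[k - 1] is the expectation for k chains, curve[k] for k + 1.
        if curve[k] - curve[k - 1] < tol:
            return k, curve
    return len(curve), curve


if __name__ == "__main__":
    # Hypothetical per-chain best log-likelihoods, one entry per chain.
    per_chain_best = [-1052.3, -1048.9, -1060.1, -1047.5, -1049.2, -1047.6]
    k, curve = chains_needed(per_chain_best, tol=0.5)
    print("suggested number of chains:", k)
    print("expected best by chain count:", [round(v, 2) for v in curve])
```

The curve can only be estimated up to the number of chains actually run, and it gets less informative near that upper end, so this is best done on a run with more chains than you expect to need.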

Best, Quaid

underbais commented 5 years ago

Thank you Quaid!

May I ask what ballpark those complex problems (the ones that took 40 chains) were in, in terms of number of samples and variants? Just trying to get a feel for the range...

Thanks

quaidmorris commented 5 years ago

10-1000 mutations, 10-100 samples, up to 20 subclones.

It wasn't clear to us that all 40 were necessary. But we had access to 40 cores, so...

We hope to be putting together a manuscript soon (i.e. summer) on multievolve, so this will be a good figure to put in. We can actually go back and determine where we reach the point of diminishing returns.

My guess is that the number of samples and the projected number of subclones are the major factors determining complexity.

Best, Quaid.

underbais commented 5 years ago

Great! Ran with 40 chains, trees make sense now. Thanks!

quaidmorris commented 5 years ago

Great to hear! Thanks for letting us know!

MUppal commented 5 years ago

Hi @quaidmorris, has there been any progress on the manuscript, or are there any summary recommendations for choosing the number of chains based on the number of mutations, samples, and projected subclones?

morrislab / phylowgs

Estimating the number of chains to run with #108