stschiff / msmc2

GNU General Public License v3.0
50 stars 9 forks source link

Converting time boundaries into years #56

Open rancilhac opened 1 year ago

rancilhac commented 1 year ago

Hello,

I am using MSMC2 to infer split times between several populations, and when it comes to converting the time boundaries output by MSMC2 into years I see that there are two formulas available:

As far as I understand the difference is that with the second formula the time of a given segment is placed at its middle while with the first one it is placed at the left boundary. However, both result in fairly different estimates in my case and I'm wondering whether I should prefer one or the other.

Thanks in advance, Loïs

stschiff commented 1 year ago

Hmm, I'm not sure I understand the contexts of the two quotes above. I can't remember what's exactly in the book, but I guess the way to plot curves and estimate split times goes like this:

1) Determine the cross-coalescence rate as a continuous but step-wise function, as done by the tool combineCrossCoal.py from the MSMC-tools repo. 2) Look at this curve and check when it crosses the CCR=0.5 threshold.

That's it, right?

So basically, you should never convert a time-segment to a single time, but consider a single time-segment as an interval, which has a start and end point, and ultimately the coalescence rate functions as continuous, but step-wise functions through time.

rancilhac commented 1 year ago

Hi @stschiff, Sorry, my message was unclear and things were not very clear in my head either. Indeed the time segments are intervals, but I'm confused about how to convert the segments' boundaries in the output of combineCrossCoal.py into years. I think the first formula, time_boundary/mu*gen makes sense and correspond to what you describe. However in the script associated with the book chapter (https://github.com/StatisticalPopulationGenomics/MSMCandMSMC2/blob/master/plot_msmc.py) the middle of the time segments are used to calculate the time at which the CCR crosses the 0.5 threshold, and I don't understand why. Sorry if I'm missing something obvious!

stschiff commented 1 year ago

Those are minor differences. Just look at the scripts and judge whether the calculation makes sense to you. It's just linear interpolations. Not sure what to advice here.