wangke16 / MSMC-IM

A tool for estimating time-dependent migration rates based on cross-/within-population coalescent rates from MSMC
16 stars 2 forks source link

Discrepancy in calculation of cumulative migration probability #13

Open ariloytynoja opened 6 months ago

ariloytynoja commented 6 months ago

Hello,

I'm confused about the correct way of computing M(t).

In MSMC_IM.py, they are computed using the right boundaries and then outputted in the file *.MSMC_IM.estimates.txt: https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/MSMC_IM.py#L79

However, in plot_utils.py, they are computed using the left boundaries as only those are included in *.MSMC_IM.estimates.txt:

time_boundaries,IM_N1s,IM_N2s,m_t_s = read_from_MSMC_IM(args.Input)
if args.samegrid:
    t = time_boundaries
    CDF_t = cumulative_Symmigproportion(time_boundaries, m_t_s)

https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/plot_utils.py#L86

As the first left boundary is zero, a consequence of the latter is that integ = 2 * m[0] * 0 is always zero and the first time period is ignored: https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/plot_utils.py#L44

Confusingly, the plotting function in MSMC_IM.py is different from that in plot_utils.py, and erroneous at least if xlog is used and the first x-value (log(0)) is not defined:
https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/MSMC_IM.py#L190

There are now three ways of plotting M(t):

I personally would plot M(t) with the column 'M' from *.MSMC_IM.estimates.txt and the right boundaries. One could add zeros in the vectors to start it from the corner. However, this differs from the plots generated by MSMC-IM.

Can you please tell your opinion on which one of these is the right way to do the plotting?

Best regards,

Ari

wangke16 commented 5 months ago

Hi,

Regarding time boundaries, I think you should check —samegrid function in plot_utils.py from MSMC-IM repository.

On 17 Apr 2024, at 11:21, Ari Löytynoja @.***> wrote:

Hello,

I'm confused about the correct way of computing M(t).

In MSMC_IM.py, they are computed using the right boundaries and then outputted in the file .MSMC_IM.estimates.txt: https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/MSMC_IM.py#L79 https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/MSMC_IM.py#L79 However, in plot_utils.py, they are computed using the left boundaries as only those are included in .MSMC_IM.estimates.txt:

time_boundaries,IM_N1s,IM_N2s,m_t_s = read_from_MSMC_IM(args.Input) if args.samegrid: t = time_boundaries CDF_t = cumulative_Symmigproportion(time_boundaries, m_t_s) https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/plot_utils.py#L86 https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/plot_utils.py#L86 As the first left boundary is zero, a consequence of the latter is that integ = 2 m[0] 0 is always zero and the first time period is ignored: https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/plot_utils.py#L44 https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/plot_utils.py#L44 Confusingly, the plotting function in MSMC_IM.py is different from that in plot_utils.py, and erroneous at least if xlog is used and the first x-value (log(0)) is not defined: https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/MSMC_IM.py#L190 https://github.com/wangke16/MSMC-IM/blob/e6c5d3b13d10131a6aec7b5a9fd792a3ce09bec9/MSMC_IM.py#L190 There are now three ways of plotting M(t):

in plot_utils.py for making plots from the .estimates.txt data but recomputing the M values in MSMC_IM.py for plots in .fittingdetails.xlog.pdf user-made plots using the M values in .estimates.txt I personally would plot M(t) with the column 'M' from .MSMC_IM.estimates.txt and the right boundaries. One could add zeros in the vectors to start it from the corner. However, this differs from the plots generated by MSMC-IM.

Can you please tell your opinion on which one of these is the right way to do the plotting?

Best regards,

Ari

— Reply to this email directly, view it on GitHub https://github.com/wangke16/MSMC-IM/issues/13, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGLYZY62PWKK2EFPPY3YKTDY5Y5IPAVCNFSM6AAAAABGK5KP26VHI2DSMVQWIX3LMV43ASLTON2WKOZSGI2DOOBQG42TEMI. You are receiving this because you are subscribed to this thread.