matted-zz / multipool

High-resolution genetic mapping for pooled sequencing.
http://cgs.csail.mit.edu/multipool/
MIT License

[BUGFIX] Use offset bin positions consistently #8

Closed. tw164 closed this 7 years ago

tw164 commented 8 years ago

Offset bin positions were not handled consistently. As a result, they were used in the plotting output but not in the summary text output.

Change function load_table to return an array of bin edges: a numpy array of integer bin start positions followed by the end position of the last bin (all in base pairs). Wherever only the bin starts are needed, they are explicitly taken from the bins array and assigned to a variable bin_starts.
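
To illustrate the bin-edge layout described above, here is a minimal sketch. It is not the code from this pull request: the helper name make_bin_edges and the fixed bin_size argument are assumptions made for the example, and the real load_table may derive the last edge differently.

```python
import numpy as np

def make_bin_edges(bin_starts, bin_size):
    """Hypothetical helper: bin start positions followed by the end of the last bin (bp)."""
    bin_starts = np.asarray(bin_starts, dtype=int)
    # Assumption: every bin has the same width, so the last edge is last start + width.
    last_end = bin_starts[-1] + bin_size
    return np.append(bin_starts, last_end)

# Three offset bins of 100 bp, the first starting at 50 bp.
bins = make_bin_edges([50, 150, 250], bin_size=100)

# Where only the starts are needed, slice them off the edges array explicitly.
bin_starts = bins[:-1]
```

With T bins the edges array has T + 1 entries, which is what lets a right index of T refer to a valid position.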

Change function doComputation to accept bin edge positions, and to index into these positions with the interval edge variables left and right. Bin edges are needed because the right index may be T in some cases where an interval extends beyond the rightmost bin. In such cases, the rightmost bin edge position is used. Ensure left indices of intervals are greater than or equal to zero, and right indices are less than or equal to T.
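
The index handling described above could look roughly like the following sketch. The function name interval_positions is hypothetical, and T is assumed here to be the number of bins (so the edges array has T + 1 entries); the actual doComputation is more involved.

```python
import numpy as np

def interval_positions(bins, left, right):
    """Hypothetical helper: map interval edge indices to base-pair positions."""
    T = len(bins) - 1        # assumption: T is the number of bins
    left = max(left, 0)      # left index must be >= 0
    right = min(right, T)    # right index must be <= T
    # bins[T] is the rightmost bin edge, used when the interval
    # extends beyond the rightmost bin.
    return int(bins[left]), int(bins[right])

bins = np.array([50, 150, 250, 350])
inside = interval_positions(bins, 1, 2)     # interval fully inside the bins
overhang = interval_positions(bins, -2, 5)  # interval overhanging both ends
```

Indexing into bin starts alone would fail for right == T, which is why the edges array carries one extra entry.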

Change function doPlotting to set the horizontal plotting positions in X from the mid-points of the offset bins, to draw the 90% Bayesian credible interval between offset bin positions, and to plot the axis between the first and last bin edges.
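
As a rough sketch of the plotting coordinates described above (the function name plot_coordinates is an assumption, and the actual drawing calls in doPlotting are omitted):

```python
import numpy as np

def plot_coordinates(bins):
    """Hypothetical helper: x positions and axis limits from bin edges (bp)."""
    bins = np.asarray(bins, dtype=float)
    # Mid-point of each offset bin: average of its left and right edge.
    X = (bins[:-1] + bins[1:]) / 2.0
    # Axis spans from the first bin edge to the last bin edge.
    xlim = (bins[0], bins[-1])
    return X, xlim

X, xlim = plot_coordinates([50, 150, 250, 350])
```

The credible interval would then be drawn between the edge positions of its left and right indices rather than between raw bin indices.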

tw164 commented 8 years ago

This change may introduce a potential problem in the calculation of the sublocalized best location (lines 418-421 in this version). Clamping indices to the interval [0,T) helps keep LOD array access within bounds, but can imbalance the range of bins used at chromosome ends, which has the effect of 'pulling' the best location away from the nearest terminal bin. Is there a way to rebalance the calculation of sublocalized best location? Or would it be better to output an 'NA' value in such cases?
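
To make the concern concrete, here is a toy sketch. It assumes the sublocalized location is a LOD-weighted mean of bin positions over a window around the peak, which may not match lines 418-421 exactly; the function weighted_location and the symmetric option are hypothetical, shown as one possible rebalancing and not as a recommendation.

```python
import numpy as np

def weighted_location(positions, lod, peak, half_width, symmetric=False):
    """Hypothetical LOD-weighted mean position in a window around the peak bin."""
    T = len(lod)
    if symmetric:
        # Shrink the window so it reaches equally far on both sides of the peak.
        half_width = min(half_width, peak, T - 1 - peak)
    left = max(peak - half_width, 0)       # clamp to the first bin
    right = min(peak + half_width + 1, T)  # exclusive slice bound, clamp to T
    weights = 10.0 ** lod[left:right]
    return float(np.average(positions[left:right], weights=weights))

positions = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
lod = np.array([3.0, 2.0, 1.0, 0.5, 0.2])  # peak in the terminal bin

# Clamped window covers bins 0..2 only, so all weight lies to the right of the peak.
clamped = weighted_location(positions, lod, peak=0, half_width=2)
# Symmetric window collapses to the peak bin itself at the chromosome end.
balanced = weighted_location(positions, lod, peak=0, half_width=2, symmetric=True)
```

In this toy case the clamped estimate lands to the right of the terminal bin position, while the symmetric window falls back to the peak bin and gives up any sublocalization there.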

matted-zz commented 7 years ago

Thanks! And sorry for the long delay in processing this. This all looks good, but I will have to think more about the point you raised in your comment. The (potential) edge effects have bothered me since the beginning of this project, and I still don't have a great answer.