odelaneau / shapeit4

Segmented HAPlotype Estimation and Imputation Tool
MIT License
90 stars 18 forks source link

Buffer read overflow in pbwt_solver::sweep #47

Open cnettel opened 3 years ago

cnettel commented 3 years ago

When troubleshooting a problem that might be similar to issue #45 (stack trace indicating heap corruption in a free operation of a string towards the end of the first burn-in round), I rebuilt the code using -fsanitize=address.

This picks up a read buffer overflow in pbwt_solver::sweep at the line:

if (hidx1<(n_total_hap-1)) s -= Guess[pbwt_clusters[idx_prev][hidx1+1]] * scoreBit[l - pbwt_divergences[idx_prev][hidx1+1] + 1];

It's not immediately clear to me that pbwt_divergence values are always supposed to be positive for the case where l == n_site - 1, and then this will indeed trigger an overflow. Obviously, that's also what happens in practice with my data.

While I suppose that the effect of a strange value there will be the phase of the guess, the invalid read by itself could theoretically end up at a page boundary triggering a segmentation fault in its own right. I would have made a pull request if it was clear to me what the proper fix is. It seems like a possible off by one error, but I am not sure in what way.