Sequence quality effects

rwdavies / QUILT

QUILT: Low coverage whole genome sequence imputation with large reference panels

GNU General Public License v3.0


Thanks!

I haven't evaluated this myself, and I think it can probably only be evaluated empirically, given the number of factors at play. I've generally found the INFO score to be a very good predictor of imputation accuracy, so you could try running a few samples twice, once without filtering and once with, and compare the mean INFO scores across various classes of SNPs (e.g. common or rare).
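As a rough illustration of that comparison, here is a minimal Python sketch that averages the INFO score per allele-frequency class from the lines of an output VCF. It is a sketch under assumptions, not part of QUILT: it assumes the per-site INFO column carries `INFO_SCORE` and `EAF` keys (check your own VCF header), and the function names and the 1% / 5% frequency cutoffs are my own choices for illustration.

```python
from collections import defaultdict


def parse_info(info_field):
    """Turn a VCF INFO string such as 'EAF=0.1;INFO_SCORE=0.9' into a dict."""
    out = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
    return out


def maf_class(eaf, rare_cutoff=0.01, common_cutoff=0.05):
    """Label a site by minor allele frequency (cutoffs are illustrative)."""
    maf = min(eaf, 1.0 - eaf)
    if maf < rare_cutoff:
        return "rare"
    if maf < common_cutoff:
        return "low_frequency"
    return "common"


def mean_info_by_class(vcf_lines, info_key="INFO_SCORE", freq_key="EAF"):
    """Mean INFO score per frequency class.

    The key names are assumptions about the VCF; sites missing either
    key are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # column 8 of a VCF record is INFO
        if info_key not in info or freq_key not in info:
            continue
        cls = maf_class(float(info[freq_key]))
        totals[cls] += float(info[info_key])
        counts[cls] += 1
    return {cls: totals[cls] / counts[cls] for cls in counts}


# Hypothetical usage, one call per run (file names are placeholders):
#   import gzip
#   with gzip.open("unfiltered.vcf.gz", "rt") as handle:
#       print(mean_info_by_class(handle))
```

Running it once on the unfiltered output and once on the filtered output, over the same sites, gives two small tables that can be compared class by class.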

More generally, I think it depends on whether the Phred scores are calibrated for these parts of the reads, and on whether the errors are random. If errors are random and Phred scores are calibrated, I would definitely expect more data to be better. As these conditions stop being met, particularly the randomness of the errors, I think the extra data would be less useful, and things could potentially get worse.
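To make "calibrated" concrete: a Phred score Q corresponds to an error probability of 10^(-Q/10), so bases reported at Q20 should be wrong about 1% of the time. The sketch below compares reported scores with observed error rates. It assumes you already have per-base `(reported_phred, is_error)` pairs, for example from sites where the true genotype is known; how you obtain those pairs is up to you, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def empirical_phred(observations):
    """Observed Phred score for each reported Phred score.

    `observations` is an iterable of (reported_phred, is_error) pairs
    (an assumed input format). Scores with no observed errors map to
    infinity.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for q, is_error in observations:
        totals[q] += 1
        errors[q] += int(is_error)
    result = {}
    for q in totals:
        rate = errors[q] / totals[q]
        result[q] = -10 * math.log10(rate) if rate > 0 else float("inf")
    return result
```

If the observed values sit well below the reported ones for the low-quality parts of the reads, the scores are overconfident there. Note that this only checks calibration, not whether the errors are random.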

Hope that helps!

Sequence quality effects #39