lulizou / boostme

R package to impute methylation within WGBS using machine learning
MIT License

BoostMe: DNA methylation prediction within whole-genome bisulfite sequencing

BoostMe is a machine learning method for imputing the continuous methylation values of CpGs sequenced at low coverage in whole-genome bisulfite sequencing (WGBS) data. BoostMe relies on XGBoost, an existing gradient boosting algorithm, and on the availability of multiple samples to achieve both higher accuracy and faster runtimes than previously reported methods.

Getting started

Installation (requires R >= 3.4):

devtools::install_github("lulizou/boostme")

BoostMe accepts WGBS data as a BSseq object (the class provided by the Bioconductor bsseq package). The highest accuracy is achieved when multiple samples (at least 3) are used, but imputation can still be done using only neighboring CpG information by setting sampleAvg = FALSE.
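As a rough illustration of the expected input, the sketch below builds a small BSseq object from simulated counts for three samples. The sample names, CpG positions, and count distributions are made up for the example. The final boostme() call is left commented out because its function name and full argument list are an assumption here, not something this README documents; sampleAvg is the only argument it names.

```r
# Requires the Bioconductor package bsseq, e.g. BiocManager::install("bsseq")
library(bsseq)

set.seed(1)
n_cpg   <- 100
samples <- c("s1", "s2", "s3")  # hypothetical sample names

# Simulated total read coverage per CpG (rows) and sample (columns)
cov <- matrix(rpois(n_cpg * length(samples), lambda = 10),
              nrow = n_cpg, dimnames = list(NULL, samples))

# Simulated methylated read counts, never exceeding the coverage
meth <- matrix(rbinom(n_cpg * length(samples), size = cov, prob = 0.7),
               nrow = n_cpg, dimnames = list(NULL, samples))

# One CpG every 50 bp on chr1 (arbitrary toy coordinates)
bs <- BSseq(chr = rep("chr1", n_cpg),
            pos = as.integer(seq(1000, by = 50, length.out = n_cpg)),
            M = meth, Cov = cov, sampleNames = samples)

# Assumed usage, not verified against the package documentation:
# imputed <- boostme(bs)                     # multi-sample mode
# imputed <- boostme(bs, sampleAvg = FALSE)  # neighboring CpGs only
```

With real data, the BSseq object would normally come from a methylation caller's output rather than simulated counts; see the bsseq documentation for the available readers.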

Example

See vignette for an example using dummy data.

More information

To learn more about BoostMe, see the manuscript:

Zou, L.S., Erdos, M.R., Taylor, D.L., Chines, P.S., Varshney, A., The McDonnell Genome Institute, Parker, S.C.J., Collins, F.S., and Didion, J.P. BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues. bioRxiv 207506, 2017. doi:10.1101/207506