Ppurple
Probabilistic purity ploidy estimation
Installation
- Install dependent packages and latest Bioconductor (if you haven't already)
source("https://bioconductor.org/biocLite.R")
biocLite("GenomicRanges")
- Install devtools from CRAN (if you don't have it already)
install.packages('devtools')
- Install mskilab R dependencies (gUtils and bamUtils)
devtools::install_github('mskilab/gUtils)
devtools::install_github('mskilab/bamUtils)
- Install Ppurple
devtools::install_github('mskilab/Ppurple')
Tutorial
Ppurple uses EM to infer purity and ploidy from total coverage and heterozygote
counts, provided as data.frames, data.tables, or GRanges. It returns a
data.table of ranked solutions, associated with a posterior probability.
Load Ppurple
library(Ppurple)
Load coverage, hets, and segs
> cov = fread(system.file("extdata", "coverage.csv", package = "Ppurple"))
> head(cov)
seqnames | start | end | width | strand | y |
1 | 79401 | 79600 | 200
| * | 1.5395624 |
1 | 531201 | 531400 | 200
| * | 1.9518525 |
1 | 555401 | 555600 | 200
| * | 0.8810031 |
1 | 571601 | 571800 | 200
| * | 0.8270474 |
1 | 573601 | 573800 | 200
| * | 1.0702993 |
1 | 617801 | 618000 | 200
| * | 1.3891481 |
> hets = fread(system.file("extdata", "hets.csv", package = "Ppurple"))
> head(hets)
seqnames | start | end | width | strand | ALT | REF | alt | ref |
1 | 779322 | 779322 | 1 | *
| G | A | 36 | 37 |
1 | 998395 | 998395 | 1 | *
| G | A | 27 | 33 |
1 | 998501 | 998501 | 1 | *
| C | G | 21 | 29 |
1 | 1158277 | 1158277 | 1 | *
| A | G | 29 | 23 |
1 | 1160665 | 1160665 | 1 | *
| A | G | 27 | 24 |
1 | 1206619 | 1206619 | 1 | *
| A | C | 1 | 52 |
Run Ppurple without precomputed segs
> pp = ppurple(cov = cov, hets = hets, verbose = TRUE)
Segments not provided so doing internal segmentation via DNAcopy
sending 92654 segments to DNAcopy
... ...
Ppurple EM iteration 3 :
LL diff:58.2742695834022 tol: 1
Ppurple EM iteration 4 :
LL diff:0.912960716173984 tol: 1
output is a data.table of solutions and their probabilities, showing the most likely solution in the first row
> pp[1,]
purity | ploidy | prob |
0.49 | 3.86 | 1 |
Run Ppurple with pre-computed segs
> segs = fread(system.file("extdata", "segs.csv", package = "Ppurple"))
> head(segs)
seqnames | start | end | width | strand | ID | num.mark | seg.mean |
1 | 79401 | 6376801 | 6297401 | * | Sample.1 | 183 | 0.0683 |
1 | 6437801 | 8702401 | 2264601 | * | Sample.1 | 76 | -0.0967 |
1 | 8758201 | 9043001 | 284801 | * | Sample.1 | 9 | 0.1808 |
1 | 9080201 | 20046801 | 10966601 | * | Sample.1 | 347 | -0.1132 |
1 | 20113201 | 23387801 | 3274601 | * | Sample.1 | 102 | 0.0931 |
1 | 23421201 | 48154401 | 24733201 | * | Sample.1 | 795 | -0.1152 |
> pp = ppurple(cov = cov, hets = hets, segs = segs, verbose = TRUE)
Fitting initial grid of 11 purity and 21 ploidy combinations.
Hapseg iteration 1:
LL diff: 1e+100 tol:1
Hapseg iteration 2:
LL diff: 5.17692387802526e-06 tol:1
Running ppemgrid with 11 purities ranging from 0 to 1 and 21 ploidies ranging from 1 to 5 with rho of 1.00978885796089 het rho of 22.283605.
... ...
Ppurple EM iteration 3 :
LL diff:58.6840663431212 tol: 1
Ppurple EM iteration 4 :
LL diff:0.864216329413466 tol: 1
> pp[1,]
purity | ploidy | prob |
0.49 | 3.86 | 1 |
Attributions
Marcin Imielinski - Assistant Professor, Weill Cornell Medicine; Core Member, New York Genome Center
Aditya Desphande - Tri-I CBM PhD candidate, Weill Cornell Medicine