Ppurple

Probabilistic purity ploidy estimation

Installation

Install the latest Bioconductor and the packages Ppurple depends on (if you haven't already)

source("https://bioconductor.org/biocLite.R")
biocLite("GenomicRanges")

Install devtools from CRAN (if you don't have it already)

install.packages('devtools')

Install mskilab R dependencies (gUtils and bamUtils)

devtools::install_github('mskilab/gUtils')
devtools::install_github('mskilab/bamUtils')

Install Ppurple

devtools::install_github('mskilab/Ppurple')

Tutorial

Ppurple uses expectation-maximization (EM) to infer purity and ploidy from total coverage and heterozygote allele counts, which can be provided as data.frames, data.tables, or GRanges. It returns a data.table of ranked solutions, each with an associated posterior probability.
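
To make the two parameters concrete, the sketch below uses the standard mixture relation between purity, ploidy, and expected relative coverage. It illustrates the general idea only and is not necessarily the exact parameterization Ppurple uses internally; `expected_ratio` is a hypothetical helper, not a function in the package, and it assumes diploid normal cells and coverage normalized to a mean of 1.

```r
# Hypothetical helper (not part of Ppurple): expected relative coverage of a
# segment with integer tumor copy number `cn`, given purity and ploidy.
# Assumes normal cells are diploid and coverage is normalized to a mean of 1.
expected_ratio <- function(cn, purity, ploidy) {
  (purity * cn + 2 * (1 - purity)) / (purity * ploidy + 2 * (1 - purity))
}

# Purity and ploidy of the top-ranked solution reported in this tutorial
ratios <- expected_ratio(cn = 0:6, purity = 0.49, ploidy = 3.86)
print(round(ratios, 3))
```

Under this relation, lower purity compresses the spacing between copy-number states, which is why coverage alone can leave purity and ploidy ambiguous and heterozygote counts help to separate candidate solutions.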

Load Ppurple

library(Ppurple)
library(data.table)  # provides fread(), used below

Load coverage, hets, and segs

> cov = fread(system.file("extdata", "coverage.csv", package = "Ppurple"))
> head(cov)

seqnames	start	end	width	strand	y
1	79401	79600	200	*	1.5395624
1	531201	531400	200	*	1.9518525
1	555401	555600	200	*	0.8810031
1	571601	571800	200	*	0.8270474
1	573601	573800	200	*	1.0702993
1	617801	618000	200	*	1.3891481

> hets = fread(system.file("extdata", "hets.csv", package = "Ppurple"))
> head(hets)

seqnames	start	end	width	strand	ALT	REF	alt	ref
1	779322	779322	1	*	G	A	36	37
1	998395	998395	1	*	G	A	27	33
1	998501	998501	1	*	C	G	21	29
1	1158277	1158277	1	*	A	G	29	23
1	1160665	1160665	1	*	A	G	27	24
1	1206619	1206619	1	*	A	C	1	52
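
The `alt` and `ref` columns hold read counts for the two alleles at each heterozygous site. As a quick way to read them, the sketch below recomputes allele fractions for the six rows printed above, re-entered by hand in base R. The column names `af` and `maf` are illustrative and are not produced or required by Ppurple.

```r
# The six het sites printed by head(hets) above, re-entered by hand
hets_head <- data.frame(
  start = c(779322, 998395, 998501, 1158277, 1160665, 1206619),
  alt   = c(36, 27, 21, 29, 27, 1),
  ref   = c(37, 33, 29, 23, 24, 52)
)

# Alternate allele fraction at each site
hets_head$af <- hets_head$alt / (hets_head$alt + hets_head$ref)

# Minor (lower) allele fraction, which does not depend on which allele is ALT
hets_head$maf <- pmin(hets_head$af, 1 - hets_head$af)

print(hets_head)
```

Sites with a fraction near 0.5 are consistent with balanced alleles, while the last site (1 alt read against 52 ref reads) is strongly imbalanced.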

Run Ppurple without precomputed segs

> pp = ppurple(cov = cov, hets = hets, verbose = TRUE)

Segments not provided so doing internal segmentation via DNAcopy
sending  92654  segments to DNAcopy
... ...
Ppurple EM iteration 3 :
        LL diff:58.2742695834022 tol: 1
Ppurple EM iteration 4 :
        LL diff:0.912960716173984 tol: 1
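
The `LL diff` and `tol` values in the log suggest that iteration stops once the log-likelihood improvement between EM rounds falls below the tolerance (here, 0.91 < 1 at iteration 4). The toy example below shows that stopping rule on a two-component Gaussian mixture with fixed unit variances and equal weights. It is a generic EM sketch and not Ppurple's model; all variable names are illustrative.

```r
set.seed(1)
# Toy data: two well-separated clusters
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 4))

mu     <- c(-1, 1)   # initial component means (sd fixed at 1, equal weights)
tol    <- 1e-6       # stop when the log-likelihood gain drops below this
ll_old <- -Inf
iter   <- 0

repeat {
  iter <- iter + 1
  # E-step: responsibility of component 1 for each point
  d1 <- dnorm(x, mean = mu[1])
  d2 <- dnorm(x, mean = mu[2])
  r1 <- d1 / (d1 + d2)
  # Log-likelihood at the current means (those used in the E-step)
  ll <- sum(log(0.5 * d1 + 0.5 * d2))
  # M-step: responsibility-weighted means
  mu <- c(sum(r1 * x) / sum(r1), sum((1 - r1) * x) / sum(1 - r1))
  # Convergence check on the log-likelihood difference
  if (ll - ll_old < tol || iter >= 1000) break
  ll_old <- ll
}

print(list(iterations = iter, means = mu))
```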

The output is a data.table of solutions and their probabilities, with the most likely solution in the first row.

> pp[1,]

purity	ploidy	prob
0.49	3.86	1

Run Ppurple with pre-computed segs

> segs = fread(system.file("extdata", "segs.csv", package = "Ppurple"))
> head(segs)

seqnames	start	end	width	strand	ID	num.mark	seg.mean
1	79401	6376801	6297401	*	Sample.1	183	0.0683
1	6437801	8702401	2264601	*	Sample.1	76	-0.0967
1	8758201	9043001	284801	*	Sample.1	9	0.1808
1	9080201	20046801	10966601	*	Sample.1	347	-0.1132
1	20113201	23387801	3274601	*	Sample.1	102	0.0931
1	23421201	48154401	24733201	*	Sample.1	795	-0.1152
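
If you supply your own segments, it can help to check that they follow the same coordinate convention as the example above, where `width` equals `end - start + 1` (1-based, closed intervals). The sketch below runs that check, plus a sorted and non-overlapping check, on the six rows printed above, re-entered by hand. These are suggested sanity checks, not documented requirements of `ppurple`.

```r
# The six segments printed by head(segs) above, re-entered by hand
segs_head <- data.frame(
  start = c(79401, 6437801, 8758201, 9080201, 20113201, 23421201),
  end   = c(6376801, 8702401, 9043001, 20046801, 23387801, 48154401),
  width = c(6297401, 2264601, 284801, 10966601, 3274601, 24733201)
)

# 1-based closed intervals: width should equal end - start + 1
width_ok <- all(segs_head$width == segs_head$end - segs_head$start + 1)

# Within one chromosome, each segment should end before the next one starts
sorted_ok <- all(head(segs_head$end, -1) < tail(segs_head$start, -1))

print(c(width_ok = width_ok, sorted_ok = sorted_ok))
```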

> pp = ppurple(cov = cov, hets = hets, segs = segs, verbose = TRUE)

Fitting initial grid of 11 purity and 21 ploidy combinations.
Hapseg iteration 1:
        LL diff: 1e+100 tol:1
Hapseg iteration 2:
        LL diff: 5.17692387802526e-06 tol:1
Running ppemgrid with 11 purities ranging from 0 to 1 and 21 ploidies ranging from 1 to 5 with rho of 1.00978885796089 het rho of 22.283605.
... ...
Ppurple EM iteration 3 :
        LL diff:58.6840663431212 tol: 1
Ppurple EM iteration 4 :
        LL diff:0.864216329413466 tol: 1

> pp[1,]

purity	ploidy	prob
0.49	3.86	1

Attributions

Marcin Imielinski - Assistant Professor, Weill Cornell Medicine; Core Member, New York Genome Center

Aditya Deshpande - Tri-I CBM PhD candidate, Weill Cornell Medicine
