uclahs-cds / project-method-AlgorithmEvaluation-BNCH-000082-SRCRNDSeed

GNU General Public License v2.0

Update sd and FDR p.adjust #110

philippaSteinberg closed this 11 months ago

philippaSteinberg commented 11 months ago

Description

Changes for the modified manuscript: add sd calculations and FDR p-value adjustment.

Analysis Results

RandomSeed_stats.txt:

# Numbers generated by supplementary_table_stats.R
# Statistics referenced in the results section

# Average number of subclones across patients and across SNV callers 
pyc.sr.ave  2.307143
pyc.mr.ave  1.942857
dpc.ave     3.728571
wgs.ave     1.95744

# Sd of number of subclones across patients and across SNV callers
pyc.sr.sd   1.29196
pyc.mr.sd   1.092196
dpc.sd      1.894003
wgs.sd      1.109137
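The `ave` and `sd` lines above summarize subclone counts pooled over patients and SNV callers for each SRC. A minimal Python sketch of that calculation, assuming a hypothetical flat list holding one subclone count per successful run; `statistics.stdev` divides by n - 1, which matches R's `sd()`.

```python
import statistics

def summarize_subclones(counts):
    """Return (mean, sample SD) of subclone counts for one SRC.

    counts: hypothetical list with one subclone count per
    patient x SNV caller x seed run (failed runs excluded).
    """
    mean = statistics.mean(counts)
    # statistics.stdev uses the n - 1 denominator, like R's sd()
    sd = statistics.stdev(counts)
    return mean, sd

# Hypothetical counts, for illustration only
example_counts = [1, 2, 2, 3, 4]
ave, sd = summarize_subclones(example_counts)
print(f"ave {ave:.6f}  sd {sd:.6f}")
```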

# Average number of subclones by pipeline
1           Mutect2-Battenberg-DPClust-sr 2.892857
2          Mutect2-Battenberg-PhyloWGS-sr 2.576271
3        Mutect2-Battenberg-PyClone-VI-mr 2.185714
4        Mutect2-Battenberg-PyClone-VI-sr 2.628571
5     SomaticSniper-Battenberg-DPClust-sr 4.821429
6    SomaticSniper-Battenberg-PhyloWGS-sr 1.306569
7  SomaticSniper-Battenberg-PyClone-VI-mr 1.571429
8  SomaticSniper-Battenberg-PyClone-VI-sr 1.642857
9          Strelka2-Battenberg-DPClust-sr 3.471429
10        Strelka2-Battenberg-PhyloWGS-sr 2.090909
11      Strelka2-Battenberg-PyClone-VI-mr 2.071429
12      Strelka2-Battenberg-PyClone-VI-sr 2.650000

# Average IQR per SRC
PyClone-VI-sr   0.2797619
PyClone-VI-mr   0.1666667
DPClust-sr      0.3214286
PhyloWGS-sr     0.3095238

# Average per-patient IQR and average per-patient sd, by pipeline
                                pipeline  mean_IQR   mean_sd
 1:       Mutect2-Battenberg-PyClone-VI-sr 0.3750000 0.4869901
 2:      Strelka2-Battenberg-PyClone-VI-sr 0.3214286 0.4097482
 3: SomaticSniper-Battenberg-PyClone-VI-sr 0.1428571 0.1159737
 4:       Mutect2-Battenberg-PyClone-VI-mr 0.1071429 0.2419277
 5:      Strelka2-Battenberg-PyClone-VI-mr 0.3928571 0.2892069
 6: SomaticSniper-Battenberg-PyClone-VI-mr 0.0000000 0.1204677
 7:          Mutect2-Battenberg-DPClust-sr 0.1428571 0.1949994
 8:         Strelka2-Battenberg-DPClust-sr 0.4285714 0.4028866
 9:    SomaticSniper-Battenberg-DPClust-sr 0.3928571 0.3485116
10:         Mutect2-Battenberg-PhyloWGS-sr 0.3214286 0.3611734
11:        Strelka2-Battenberg-PhyloWGS-sr 0.2678571 0.3358238
12:   SomaticSniper-Battenberg-PhyloWGS-sr 0.3392857 0.3076496
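Both IQR tables above describe seed-to-seed variability: for each patient the IQR (and sd) of the subclone count is taken across seeds, then averaged over patients. A minimal sketch, assuming a hypothetical mapping from (pipeline, patient) to per-seed counts; `statistics.quantiles(..., method="inclusive")` interpolates like R's default `quantile()` type 7, which `IQR()` uses.

```python
import statistics
from collections import defaultdict

def iqr(values):
    # method="inclusive" uses the same linear interpolation as R's type 7
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def mean_iqr_sd_by_pipeline(counts):
    """Average per-patient IQR and SD of subclone counts across seeds.

    counts: hypothetical {(pipeline, patient): [subclone count per seed]}.
    Returns {pipeline: (mean_IQR, mean_sd)}.
    """
    per_patient = defaultdict(list)
    for (pipeline, _patient), seed_counts in counts.items():
        per_patient[pipeline].append((iqr(seed_counts), statistics.stdev(seed_counts)))
    return {
        pipeline: (statistics.mean(i for i, _ in stats),
                   statistics.mean(s for _, s in stats))
        for pipeline, stats in per_patient.items()
    }

# Hypothetical counts for two patients (4 seeds each, for brevity)
example = {
    ("pipeline-A", "PX"): [1, 2, 3, 4],
    ("pipeline-A", "PY"): [2, 2, 2, 2],
}
print(mean_iqr_sd_by_pipeline(example))
```

The per-SRC figures are the plain mean of the three matching pipeline rows; for PyClone-VI-sr, (0.3750000 + 0.3214286 + 0.1428571) / 3 = 0.2797619.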

## Average IQR below and above a high median subclone count (85th percentile); thresholds per SRC:
# PyClone-VI-sr: 3.85
# PyClone-VI-mr: 3
# DPClust-sr: 5
# PhyloWGS-sr: 3
pyc.sr.sub.iqr  0.1642857
pyc.sr.abo.iqr  0.8571429
pyc.mr.sub.iqr  0
pyc.mr.abo.iqr  0.5
dpc.sub.iqr 0.2115385
dpc.abo.iqr 0.5
wgs.sub.iqr 0.2727273
wgs.abo.iqr 0.4444444
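The `sub`/`abo` values split patients at a high quantile of their median subclone count. A minimal sketch, assuming the 0.85 quantile is computed with R's default type-7 interpolation over hypothetical per-patient medians, and assuming patients exactly at the threshold count as "below" (the source does not say how ties are handled).

```python
import statistics

def quantile_type7(values, prob):
    # Linear interpolation between order statistics (R's default type 7)
    xs = sorted(values)
    h = (len(xs) - 1) * prob
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def split_iqr(medians, iqrs, prob=0.85):
    """Mean IQR for patients at/below vs above the `prob` quantile.

    medians, iqrs: hypothetical {patient: value} dicts for one SRC.
    """
    threshold = quantile_type7(medians.values(), prob)
    below = [iqrs[p] for p, m in medians.items() if m <= threshold]
    above = [iqrs[p] for p, m in medians.items() if m > threshold]
    return threshold, statistics.mean(below), statistics.mean(above)

# Hypothetical per-patient medians and IQRs
medians = {"PA": 1, "PB": 2, "PC": 2, "PD": 3, "PE": 5}
iqrs = {"PA": 0.0, "PB": 0.5, "PC": 0.25, "PD": 0.25, "PE": 1.0}
threshold, sub_iqr, abo_iqr = split_iqr(medians, iqrs)
print(threshold, sub_iqr, abo_iqr)
```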

## PhyloWGS failure rates
# PhyloWGS patient failure rates across all pipelines
P01 0.00000000 
P02 0.00000000 
P03 0.06666667 
P04 0.36666667 
P05 0.00000000 
P06 0.03333333 
P07 0.06666667 
P08 0.03333333 
P09 0.50000000 
P10 0.03333333 
P11 0.00000000 
P12 0.03333333 
P13 0.03333333 
P14 0.30000000

# PhyloWGS seed failure rates across all pipelines, with binom.test p-value and FDR-adjusted p-value (n = successful runs; n + n.fail = 42 runs per seed)
     seed  n n.fail  fail.rate binom.p.value fdr.p.adjust
 1:  13142 35      7 0.16666667      0.146309  0.4876967
 2:  50135 33      9 0.21428571      0.027988  0.1399400
 3:  51404 38      4 0.09523810      0.655926  1.0000000
 4:  97782 39      3 0.07142857      0.831559  1.0000000
 5: 253505 38      4 0.09523810      0.655926  1.0000000
 6: 366306 42      0 0.00000000             1  1.0000000
 7: 423647 42      0 0.00000000             1  1.0000000
 8: 628019 32     10 0.23809524      0.010201  0.1020100
 9: 659767 36      6 0.14285714        0.2759  0.6897500
10: 838004 41      1 0.02380952      0.990525  1.0000000
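The `fdr.p.adjust` column matches the Benjamini-Hochberg procedure (R's `p.adjust(p, method = "fdr")`, an alias of `"BH"`) applied to the ten `binom.p.value`s; for instance 0.010201 × 10 / 1 = 0.10201 and 0.146309 × 10 / 3 ≈ 0.4877. Seeds with no failures get p = 1, which fits a one-sided test for an elevated failure rate. The null rate is not stated, so the binomial part below assumes the pooled rate of 44 failed runs out of 420; its p-values are illustrative and need not reproduce the table.

```python
from math import comb

def binom_greater_p(k, n, p):
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p)."""
    return min(1.0, sum(comb(n, i) * p**i * (1 - p) ** (n - i)
                        for i in range(k, n + 1)))

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values in input order (like R 'BH')."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the cumulative minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# binom.p.value column from the table above, in the listed seed order
table_p = [0.146309, 0.027988, 0.655926, 0.831559, 0.655926,
           1.0, 1.0, 0.010201, 0.2759, 0.990525]
adjusted = bh_adjust(table_p)
print([round(q, 7) for q in adjusted])

# Hypothetical null: pooled failure rate across all seeds (assumption)
pooled = 44 / 420
print(binom_greater_p(10, 42, pooled))
```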

# Patient-seed pairs that succeed for only 1 pipeline
13142, P04
13142, P09
13142, P14
50135, P04
50135, P09
51404, P09
97782, P09
253505, P09
628019, P09
628019, P14
659767, P04

# Patient-seed pairs that succeed for only 2 pipelines
13142, P12
50135, P03
50135, P07
51404, P04
51404, P13
97782, P04
253505, P04
253505, P14
628019, P06
628019, P07
628019, P08
628019, P10
659767, P14
838004, P03
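The two lists above flag patient-seed pairs where most PhyloWGS pipelines failed. A minimal sketch of how such pairs could be collected, assuming hypothetical (patient, pipeline, seed, succeeded) run records; the patient IDs, seed, and outcomes in the example are made up.

```python
from collections import defaultdict

def pairs_with_successes(runs, k):
    """Sorted (seed, patient) pairs with exactly k successful pipelines.

    runs: hypothetical (patient, pipeline, seed, succeeded) tuples.
    """
    successes = defaultdict(int)
    for patient, _pipeline, seed, succeeded in runs:
        successes[(seed, patient)] += int(succeeded)
    return sorted(pair for pair, n in successes.items() if n == k)

# Hypothetical runs for two patients and one seed
runs = [
    ("PA", "pipe1", 1, False), ("PA", "pipe2", 1, False), ("PA", "pipe3", 1, True),
    ("PB", "pipe1", 1, True), ("PB", "pipe2", 1, False), ("PB", "pipe3", 1, True),
]
print(pairs_with_successes(runs, 1))
print(pairs_with_successes(runs, 2))
```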

## Figure 3 evaluation: quantifying whether each seed recovers the mode
# Number of patient + seed combinations that recovered the patient's mode subclone count
pyclone.vi                  261/420 62.1%
pyclone.vi.sr                   276/420 65.7%
pyclone.vi.mr                   157/210 74.8%
mutect2_battenberg_pyclone_vi_sr        107/140 76.4%
mutect2_battenberg_pyclone_vi_mr        62/70   88.6%
strelka2_battenberg_pyclone_vi_sr       111/140 79.3%
strelka2_battenberg_pyclone_vi_mr       58/70   82.9%
somaticsniper_battenberg_pyclone_vi_sr      128/140 91.4%
somaticsniper_battenberg_pyclone_vi_mr      66/70   94.3%

dpclust                     230/420 54.8%
mutect2_battenberg_dpclust_sr           125/140 89.3%
strelka2_battenberg_dpclust_sr          106/140 75.7%
somaticsniper_battenberg_dpclust_sr     111/140 79.3%

phylowgs                    198/376 52.7%
mutect2_battenberg_phylowgs_sr          90/118  76.3%
strelka2_battenberg_phylowgs_sr         96/121  79.3%
somaticsniper_battenberg_phylowgs_sr        113/137 82.5%
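Each count above is the number of patient + seed combinations whose subclone count equals the patient's mode; a single -sr pipeline has 14 patients × 10 seeds = 140 combinations, and the three PhyloWGS denominators sum to 376 = 420 - 44, the number of failed PhyloWGS runs in the seed table. A minimal sketch for one pipeline, assuming a hypothetical {patient: {seed: count}} layout with failed runs left out; how ties for the mode were handled is not stated, so counting any tied value as a hit is an assumption.

```python
import statistics

def mode_hits(calls):
    """Count seeds whose subclone count equals the patient's mode.

    calls: hypothetical {patient: {seed: subclone count}}, failed runs omitted.
    Any value tied for the mode counts as a hit (assumption).
    """
    hits = total = 0
    for seed_counts in calls.values():
        modes = set(statistics.multimode(seed_counts.values()))
        hits += sum(count in modes for count in seed_counts.values())
        total += len(seed_counts)
    return hits, total

# Hypothetical calls for two patients
calls = {"PA": {1: 2, 2: 2, 3: 3}, "PB": {1: 1, 2: 4}}
hits, total = mode_hits(calls)
print(f"{hits}/{total} {100 * hits / total:.1f}%")
```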

# Pipelines with seeds that recover the mode subclone count for all samples
mutect2.battenberg.pyclone.vi.sr        838004
mutect2.battenberg.pyclone.vi.mr        13142, 97782, 366306, 659767
strelka2.battenberg.pyclone.vi.mr       13142, 423647
somaticsniper.battenberg.pyclone.vi.sr      51404, 366306, 423647
somaticsniper.battenberg.pyclone.vi.mr      51404, 253505, 366306, 423647, 628019, 838004
mutect2.battenberg.dpclust.sr           50135
somaticsniper.battenberg.dpclust.sr     50135
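A seed is listed for a pipeline when it recovers the mode for every sample. A minimal sketch using the same hypothetical {patient: {seed: count}} layout; treating a seed that is missing for a patient (a failed run) as inconsistent, and counting tied modes as hits, are both assumptions.

```python
import statistics

def consistent_seeds(calls, seeds):
    """Seeds whose count equals the patient's mode for every patient.

    calls: hypothetical {patient: {seed: subclone count}}.
    A seed absent for a patient is treated as inconsistent (assumption).
    """
    consistent = set(seeds)
    for seed_counts in calls.values():
        modes = set(statistics.multimode(seed_counts.values()))
        consistent &= {s for s, c in seed_counts.items() if c in modes}
    return sorted(consistent)

# Hypothetical calls: seeds 1 and 2 hit the mode for both patients
calls = {"PA": {1: 2, 2: 2, 3: 3}, "PB": {1: 1, 2: 1, 3: 1}}
print(consistent_seeds(calls, [1, 2, 3]))
```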

## Average rate at which a seed recovers the mode number of subclones (averaged across seeds), by pipeline
                                 pipeline seed_getmode_ratio(%)
 1:       Mutect2-Battenberg-PyClone-VI-sr                 76.44
 2:      Strelka2-Battenberg-PyClone-VI-sr                 79.28
 3: SomaticSniper-Battenberg-PyClone-VI-sr                 91.44
 4:       Mutect2-Battenberg-PyClone-VI-mr                 88.56
 5:      Strelka2-Battenberg-PyClone-VI-mr                 82.84
 6: SomaticSniper-Battenberg-PyClone-VI-mr                 94.28
 7:          Mutect2-Battenberg-DPClust-sr                 89.30
 8:         Strelka2-Battenberg-DPClust-sr                 75.73
 9:    SomaticSniper-Battenberg-DPClust-sr                 79.29
10:         Mutect2-Battenberg-PhyloWGS-sr                 64.30
11:        Strelka2-Battenberg-PhyloWGS-sr                 68.56
12:   SomaticSniper-Battenberg-PhyloWGS-sr                 80.73
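The PhyloWGS rates here (64.30, 68.56, 80.73) are lower than the per-pipeline percentages in the Figure 3 block (76.3%, 79.3%, 82.5%) and close to 90/140, 96/140 and 113/140, which is consistent with failed runs being counted as misses over all 14 patients. A minimal sketch of that averaging, assuming the hypothetical {patient: {seed: count}} layout and that reading of failures.

```python
import statistics

def seed_getmode_ratio(calls, seeds):
    """Mean over seeds of the % of patients for which the seed hits the mode.

    calls: hypothetical {patient: {seed: count}}; a seed absent for a
    patient (failed run) counts as a miss (assumption).
    """
    per_seed = []
    for seed in seeds:
        hits = 0
        for seed_counts in calls.values():
            modes = set(statistics.multimode(seed_counts.values()))
            if seed in seed_counts and seed_counts[seed] in modes:
                hits += 1
        per_seed.append(100 * hits / len(calls))
    return statistics.mean(per_seed)

# Hypothetical calls: seed 3 failed for patient PB
calls = {"PA": {1: 2, 2: 2, 3: 3}, "PB": {1: 1, 2: 1}}
print(round(seed_getmode_ratio(calls, [1, 2, 3]), 2))
```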

## Average rate at which a seed recovers the mode, across patients and sSNV callers
# PyClone-VI-sr (mean ratio(%): 82.387)
     seed ratio(%)
1   13142 78.56667
2   50135 78.56667
3   51404 88.10000
4   97782 85.73333
5  253505 69.03333
6  366306 88.10000
7  423647 80.96667
8  628019 90.50000
9  659767 76.20000
10 838004 88.10000

# PyClone-VI-mr (mean ratio(%): 88.56)
     seed ratio(%)
1   13142 95.23333
2   50135 85.70000
3   51404 80.93333
4   97782 85.70000
5  253505 90.46667
6  366306 90.46667
7  423647 95.23333
8  628019 90.46667
9  659767 85.70000
10 838004 85.70000

# DPClust (mean ratio(%): 81.44)
     seed ratio(%)
1   13142 78.56667
2   50135 92.86667
3   51404 73.83333
4   97782 92.90000
5  253505 73.83333
6  366306 80.96667
7  423647 78.56667
8  628019 78.56667
9  659767 85.73333
10 838004 78.56667

# PhyloWGS (mean ratio(%): 71.2)
     seed ratio(%)
1   13142 61.93333
2   50135 45.26667
3   51404 78.56667
4   97782 73.80000
5  253505 69.03333
6  366306 85.73333
7  423647 83.36667
8  628019 69.06667
9  659767 66.63333
10 838004 78.56667

# Across all 12 pipelines
     seed ratio(%)
1   13142 78.56667
2   50135 78.56667
3   51404 88.10000
4   97782 85.73333
5  253505 69.03333
6  366306 88.10000
7  423647 80.96667
8  628019 90.50000
9  659767 76.20000
10 838004 88.10000
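Each header mean is the average of its ten seed values, and it equals the mean of the SRC's three per-pipeline rates in the previous table (for PyClone-VI-sr, 76.44, 79.28 and 91.44 also average to 82.387). Each seed value therefore appears to be an average of that seed's hit rate over the SRC's pipelines. A minimal sketch under that assumption, with a hypothetical {pipeline: {seed: ratio}} input, plus a check of the published PyClone-VI-sr header mean.

```python
import statistics

def per_seed_ratio(ratios_by_pipeline):
    """Average each seed's hit rate (%) over pipelines.

    ratios_by_pipeline: hypothetical {pipeline: {seed: ratio_percent}}.
    """
    seeds = sorted(next(iter(ratios_by_pipeline.values())))
    return {
        seed: statistics.mean(r[seed] for r in ratios_by_pipeline.values())
        for seed in seeds
    }

# Hypothetical per-pipeline rates for two seeds
example = {
    "pipeA": {1: 71.4, 2: 100.0},
    "pipeB": {1: 78.6, 2: 50.0},
    "pipeC": {1: 85.7, 2: 75.0},
}
print(per_seed_ratio(example))

# Seed values published above for PyClone-VI-sr
pyclone_vi_sr = [78.56667, 78.56667, 88.10000, 85.73333, 69.03333,
                 88.10000, 80.96667, 90.50000, 76.20000, 88.10000]
print(round(statistics.mean(pyclone_vi_sr), 3))  # header reports 82.387
```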

# Ranking of seeds by the number of pipelines for which they consistently call the mode
366306  (3x)
423647  (3x)
838004  (2x)
13142   (2x)
51404   (2x)
50135   (2x)
253505  (1x)
628019  (1x)
97782   (1x)
659767  (1x)

# Ranking of pipelines by the number of seeds that consistently call the mode across all patients
somaticsniper.battenberg.pyclone.vi.mr  (6x)
mutect2.battenberg.pyclone.vi.mr        (4x)
somaticsniper.battenberg.pyclone.vi.sr  (3x)
strelka2.battenberg.pyclone.vi.mr       (2x)
mutect2.battenberg.pyclone.vi.sr        (1x)
mutect2.battenberg.dpclust.sr           (1x)
somaticsniper.battenberg.dpclust.sr     (1x)
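Both rankings above are tallies of the "consistent for all samples" list. A short sketch that recomputes them from that list with `collections.Counter`; the dictionary is copied from the list, and the order among tied entries may differ from the listing.

```python
from collections import Counter

# Seeds that recover the mode for all samples, copied from the list above
consistent = {
    "mutect2.battenberg.pyclone.vi.sr": [838004],
    "mutect2.battenberg.pyclone.vi.mr": [13142, 97782, 366306, 659767],
    "strelka2.battenberg.pyclone.vi.mr": [13142, 423647],
    "somaticsniper.battenberg.pyclone.vi.sr": [51404, 366306, 423647],
    "somaticsniper.battenberg.pyclone.vi.mr": [51404, 253505, 366306, 423647, 628019, 838004],
    "mutect2.battenberg.dpclust.sr": [50135],
    "somaticsniper.battenberg.dpclust.sr": [50135],
}

# Number of pipelines for which each seed is consistent
seed_rank = Counter(seed for seeds in consistent.values() for seed in seeds)
# Number of consistent seeds per pipeline
pipeline_rank = Counter({p: len(seeds) for p, seeds in consistent.items()})

for seed, n in seed_rank.most_common():
    print(f"{seed}  ({n}x)")
for pipeline, n in pipeline_rank.most_common():
    print(f"{pipeline}  ({n}x)")
```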
