waldronlab / TCGAutils

Toolbox package for organizing and working with TCGA data
https://bioconductor.org/packages/TCGAutils

Apparent Incompleteness of Subtypes #33

Open DarioS opened 1 year ago

DarioS commented 1 year ago

I find that the subtype information can be substantially incomplete. For example, "Genomic Classification of Cutaneous Melanoma" (Cell, 2015) states:

BRAF Subtype: The largest genomic subtype is defined by the presence of BRAF hot-spot mutations (n = 166).

RAS Subtype: The second major subtype is defined by the presence of RAS hot-spot mutations (n = 95), including known amino acid changes with functional consequences, in all three RAS family members (N-, K- and H-RAS).

NF1 Subtype: The third most frequently observed SMG in the MAPK pathway was NF1, which was mutated in 14% (n = 28) of samples.

Triple Wild-Type Subtype: We defined the Triple-WT subtype (n = 46) as a heterogeneous subgroup characterized by a lack of hot-spot BRAF, N/H/K-RAS, or NF1 mutations.

but the R package reports:

> table(colData(cutaneousMelanoma)[, "MUTATIONSUBTYPES"])
BRAF_Hotspot_Mutants      NF1_Any_Mutants  RAS_Hotspot_Mutants            Triple_WT 
                  32                    5                   11                    8

The R package has 343 patients and the journal article has 331, so it is unclear why only 56 patients (32 + 5 + 11 + 8) in the package are assigned to a subtype.
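To make the gap explicit, here is a minimal base R sketch. It does not load the real data; it rebuilds a stand-in for the `MUTATIONSUBTYPES` column from the counts printed above, on the assumption that every patient without a subtype is stored as `NA` (which `table()` drops by default). The names `assigned`, `n_patients`, and `subtypes` are illustrative only.

```r
# Counts as printed by table() above.
assigned <- c(BRAF_Hotspot_Mutants = 32, NF1_Any_Mutants = 5,
              RAS_Hotspot_Mutants = 11, Triple_WT = 8)
n_patients <- 343  # patients in the R package, as stated above

# Stand-in for the MUTATIONSUBTYPES column.
# Assumption: unassigned patients are NA in the real colData.
subtypes <- c(rep(names(assigned), times = assigned),
              rep(NA_character_, n_patients - sum(assigned)))

# useNA = "ifany" adds a column for the missing values that table() hides by default.
counts <- table(subtypes, useNA = "ifany")
print(counts)

n_missing <- sum(is.na(subtypes))
cat(sprintf("%d of %d patients (%.1f%%) have no subtype\n",
            n_missing, n_patients, 100 * n_missing / n_patients))
```

On the real object, the equivalent check would presumably be `table(colData(cutaneousMelanoma)[, "MUTATIONSUBTYPES"], useNA = "ifany")`.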

LiNk-NY commented 1 year ago

Hi @DarioS, sorry I missed your issue. I am looking into it and will look for a solution. Best, Marcel