mskilab-org / case-report

Genomic case report
BSD 3-Clause "New" or "Revised" License

Switch to OncoKB tiering for filtered events table #12

Open shihabdider opened 1 day ago

shihabdider commented 1 day ago

This feature involves switching our tiering for the filtered events table (currently based on the CGC dataset) to the more informative OncoKB API.

Responsibilities:

Example curated gene information:

OncoKB Curated Gene Information for ARID1B

kevinmhadi commented 5 hours ago

Hi all - I pushed a number of preliminary changes to oncotable in the Skilift branch "pact_skilift" (I kept them there so as not to multiply the number of branches right now).

kevinmhadi commented 5 hours ago

The relevant lines to parse and tier OncoKB small mutations are below, in oncotable in branch "pact_skilift". This should replace the small mutations that make it into the filtered_events table. For now these will be additional rows, as I didn't remove the logic that parses annotated_bcf as input.

As for the additional and revised info from the oncokb columns that should make it into filtered_events:

The tier column will have levels 1-3, and tier_description maps them as 1 = "Clinically Actionable", 2 = "Clinically Significant", 3 = "VUS".
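A minimal sketch of that mapping in base R, in case it helps when wiring up the table. The vector name `tier_labels` and the example tiers are made up for illustration; only the three labels come from the description above.

```r
# Hypothetical lookup vector: position i holds the description for tier i.
tier_labels <- c("Clinically Actionable", "Clinically Significant", "VUS")

# Illustrative tiers, not real output.
tier <- c(2L, 1L, 3L, 1L)

# Integer indexing picks the label; fixing the levels keeps the factor
# ordered by tier rather than alphabetically.
tier_description <- factor(tier_labels[tier], levels = tier_labels)

print(data.frame(tier, tier_description))
```

Keeping the levels in tier order means sorting on tier_description gives the same order as sorting on tier.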

Tier 1 variants will have at least one of the therapeutics, resistances, diagnoses, and prognoses columns that is not NA. All other variants will be NA for these 4 columns. I think it's most important to add the tier_description info and the four fields to the filtered events table, either as columns or tooltips.
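The rule above could be checked with something like the sketch below. It uses base R and toy rows; the helper name `check_tier_evidence` and the gene and drug names are hypothetical, not part of oncotable.

```r
# Toy rows (illustrative only): one tier 1 event with a therapy,
# one tier 2 event, one tier 3 event.
events <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  tier = c(1L, 2L, 3L),
  therapeutics = c("DrugX", NA, NA),
  resistances = c(NA_character_, NA, NA),
  diagnoses = c(NA_character_, NA, NA),
  prognoses = c(NA_character_, NA, NA),
  stringsAsFactors = FALSE
)

evidence_cols <- c("therapeutics", "resistances", "diagnoses", "prognoses")

# Hypothetical helper: TRUE for each row that follows the rule
# (tier 1 has at least one non-NA evidence column, other tiers have none).
check_tier_evidence <- function(df) {
  has_evidence <- rowSums(!is.na(df[, evidence_cols, drop = FALSE])) > 0
  ifelse(df$tier == 1L, has_evidence, !has_evidence)
}

ok <- check_tier_evidence(events)
stopifnot(all(ok))
```

A check like this could also serve as a cheap assertion in a test once real test data exists.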

We're still working out how to align the dosage (snv_multiplicity) with the oncokb output.

I think it makes sense to have tier + tier_description as separate columns (or concatenated as a string into 1 column, although this could affect sorting), and also to have dosage (snv_multiplicity) in the table for small mutations.

Then have the following info from the oncotable columns as tooltips that get exposed over the tier in the UI, but please let me know what you all think about the tooltip vs column idea for the below:

concat_out = oncokb[, .(
    id = x,
    gene = Hugo_Symbol,
    variant.g = paste("g.", Start_Position, "-", End_Position, sep = ""),
    variant.c = HGVSc,
    variant.p = HGVSp,
    annotation = Consequence,
    type = snpeff_ontology,
    tier = tier,
    tier_description = tier_factor,
    therapeutics = tx_string, # comes from parse_oncokb_tier
    resistances = rx_string,
    diagnoses = dx_string,
    prognoses = px_string,
    distance = NA_integer_,
    track = "variants"
)]

shihabdider commented 5 hours ago

Looks good. I'll start integrating this with the changes I've made to the oncotable method tomorrow.

I would advise against using a tooltip to hold event-specific information (therapeutics, resistances, etc.). Instead, we can make the tier value a link that opens a modal with this information. We will possibly want to iterate on this modal with additional tabs for integrating API hooks (e.g., gene information, clinicaltrials, pubmed) and notes (including AI-generated ones), etc.
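On the data side, a modal would probably want each field as a list of items rather than one long string. A rough sketch in base R, assuming the four fields arrive as comma-separated strings or NA; the helper name `modal_payload` and the drug names are hypothetical.

```r
# Hypothetical helper: turn one event's evidence strings into a named list
# of character vectors, dropping fields that are NA.
modal_payload <- function(therapeutics, resistances, diagnoses, prognoses) {
  fields <- list(
    therapeutics = therapeutics,
    resistances = resistances,
    diagnoses = diagnoses,
    prognoses = prognoses
  )
  # Keep only the fields that carry a value.
  fields <- fields[!vapply(fields, is.na, logical(1))]
  # Split each remaining string on commas (assumed separator).
  lapply(fields, function(s) strsplit(s, ",", fixed = TRUE)[[1]])
}

payload <- modal_payload(
  therapeutics = "DrugX+DrugY,DrugZ",
  resistances = "DrugW",
  diagnoses = NA_character_,
  prognoses = NA_character_
)
str(payload)
```

Combination entries joined with "+" are left intact here, since splitting them is a display decision for the front end.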

For the tier description, I think we can just relegate it to help text that shows up when hovering on the tier column name (or ⓘ symbol next to the column name). @xanthopoulakis can use whatever appropriate symbol is provided by the antd library. Also VUS should probably be rendered as "Variant of Unknown Significance"; best not to have any abbreviations in help text if we can help it.

Dosage should definitely go in its own column.

Also, two things that would tremendously help me, @kevinmhadi:

  1. Can you provide a test file and a function call for the new methods you've added? You can create a new test file in the tests directory and add any new test data to inst/extdata/test_data. Preferably something small that can run in under a second, but a real file is fine too.
  2. Can you post a valid output in this thread (a head is fine, no need for the full output), just so I know what to expect the final output to look like?

kevinmhadi commented 4 hours ago

Yeah a modal would be good - maybe for now if these 4 fields can be incorporated into a modal that's "work in progress" - that'd be excellent for Thursday.

  1. I definitely can do that - I can write tests for this and close it out by end of week, unless you need them for Thursday.

  2. Here's an expected (partial) output from oncotable (VIP case 397089) that you can copy and paste into R (the structure(...) statement). That is, you can assign oncokb_df <- as.data.table(structure(...)) to get the data.table object to inspect. Side note: base::dput() just gives you an R language equivalent of the object (messy, but good for copying over small data structures rather than saving to RDS, and possibly useful for creating tests).

> base::dput(as.data.frame(concat_out[tier %in% 1:2]))
structure(list(id = c("397089", "397089", "397089", "397089"), 
    gene = c("ALK", "PRKDC", "KRAS", "TP53"), variant.g = c("g.29432664-29432664", 
    "g.48812928-48812928", "g.25398285-25398285", "g.7579420-7579420"
    ), variant.c = c("c.3824G>A", "c.3364+5G>A", "c.34G>T", "c.267delC"
    ), variant.p = c("p.Arg1275Gln", "", "p.Gly12Cys", "p.Ser90fs"
    ), annotation = c("missense_variant", "splice_region_variant,intron_variant", 
    "missense_variant", "frameshift_variant"), type = structure(c(8L, 
    NA, 8L, 1L), .Label = c("trunc", "cnadel", "cnadup", "complexsv", 
    "splice", "inframe_indel", "fusion", "missense", "promoter", 
    "mir", "regulatory", "noncoding", "inv", "synonymous", ""
    ), class = "factor"), tier = c(2L, 2L, 1L, 1L), tier_description = structure(c(2L, 
    2L, 1L, 1L), .Label = c("Clinically Actionable", "Clinically Significant", 
    "VUS"), class = "factor"), therapeutics = c(NA, NA, "Adagrasib+Cetuximab,Sotorasib,Adagrasib,Adagrasib+Panitumumab,Cobimetinib,Sotorasib+Cetuximab,Trametinib,Sotorasib+Panitumumab", 
    NA), resistances = c(NA, NA, "Cetuximab,Panitumumab,Tucatinib+Trastuzumab", 
    NA), diagnoses = c(NA_character_, NA_character_, NA_character_, 
    NA_character_), prognoses = c(NA, NA, NA, "AMLMRC,AML,TMN,CLLSLL,MDS,ET,MPN,PMF"
    ), distance = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_
    ), track = c("variants", "variants", "variants", "variants"
    )), row.names = c(NA, -4L), class = "data.frame")
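
For anyone unfamiliar with the dput() round trip mentioned above, here is a small base R sketch on a made-up data frame (the gene names and tiers are placeholders, not real output). It captures the dput() text and evaluates it back into an object.

```r
# Made-up data frame, just to demonstrate the round trip.
small <- data.frame(
  gene = c("GENE_A", "GENE_B"),
  tier = c(1L, 2L),
  stringsAsFactors = FALSE
)

# dput() writes R code that rebuilds the object; capture it as text.
code <- capture.output(dput(small))

# Evaluating that text recreates the object.
rebuilt <- eval(parse(text = code))

stopifnot(isTRUE(all.equal(small, rebuilt)))
```

The same pattern could be used to keep a small expected output inline in a test file instead of shipping an RDS.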