Open shihabdider opened 1 day ago
Hi all - I pushed a number of preliminary changes to oncotable in the Skilift branch "pact_skilift" (I kept them there so as not to explode the number of branches right now).
The relevant lines to parse and tier OncoKB small mutations in oncotable, in branch "pact_skilift", are below. These should replace the small mutations that make it into the `filtered_events` table. For now they will show up as additional rows, as I didn't remove the logic that parses `annotated_bcf` as input.
As for the additional and revised info from the oncokb columns that should make it into `filtered_events`:
The `tier` column will have levels 1-3, and `tier_description` corresponds to:
1 = "Clinically Actionable"
2 = "Clinically Significant"
3 = "VUS"
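A minimal sketch of that mapping in base R, assuming the description is stored as a factor over the integer tiers (which is what the `dput` output further down suggests). `describe_tier` and `tier_labels` are hypothetical names, not something in the Skilift code.

```r
# Hypothetical helper (not from the Skilift codebase): map integer tiers to labels.
tier_labels <- c("Clinically Actionable", "Clinically Significant", "VUS")

describe_tier <- function(tier) {
  # Values outside 1-3 (or NA) become NA instead of raising an error.
  factor(tier, levels = 1:3, labels = tier_labels)
}

tier <- c(2L, 2L, 1L, 1L)
tier_description <- describe_tier(tier)
print(as.character(tier_description))
```

Keeping `tier` as an integer next to the factor means sorting can still be done on the numeric column.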
Tier 1 variants will have at least one of the `therapeutics`, `resistances`, `diagnoses`, and `prognoses` columns that is not NA. All other variants will be NA for these 4 columns. I think it's most important to add the `tier_description` info and the four fields to the filtered events table, either as columns or tooltips.
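The rule above could be checked with something like the following sketch. It assumes a plain data frame with the column names from this post; `check_tier_fields` and the toy rows are made up for illustration.

```r
# Hypothetical consistency check; column names follow the post above.
clinical_cols <- c("therapeutics", "resistances", "diagnoses", "prognoses")

check_tier_fields <- function(df) {
  # TRUE for rows where at least one of the four fields is filled in.
  has_any <- rowSums(!is.na(df[, clinical_cols, drop = FALSE])) > 0
  # Tier 1 rows need at least one annotation; all other rows need none.
  ok <- ifelse(df$tier == 1L, has_any, !has_any)
  which(!ok)  # indices of rows that violate the rule
}

# Toy rows, not real output.
toy <- data.frame(
  gene = c("KRAS", "TP53", "ALK"),
  tier = c(1L, 1L, 2L),
  therapeutics = c("Sotorasib", NA, NA),
  resistances = c("Cetuximab", NA, NA),
  diagnoses = NA_character_,
  prognoses = c(NA, "AML", NA),
  stringsAsFactors = FALSE
)
violations <- check_tier_fields(toy)
print(violations)
```

An empty result means every row is consistent with the tiering rule.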
We're still working out how to align the dosage (snv_multiplicity) with the oncokb output.
I think it makes sense to have `tier` + `tier_description` as separate columns (or concatenated as a string into 1 column, although this could affect sorting), and also `dosage` (SNV multiplicity) in the table for small mutations.
Then have the following info from the oncotable columns as tooltips that get exposed over the tier in the UI, but please let me know what you all think about the tooltip vs column idea for the below:
concat_out = oncokb[, .(
    id = x,
    gene = Hugo_Symbol,
    variant.g = paste("g.", Start_Position, "-", End_Position, sep = ""),
    variant.c = HGVSc,
    variant.p = HGVSp,
    annotation = Consequence,
    type = snpeff_ontology,
    tier = tier,
    tier_description = tier_factor,
    therapeutics = tx_string, # comes from parse_oncokb_tier
    resistances = rx_string,
    diagnoses = dx_string,
    prognoses = px_string,
    distance = NA_integer_,
    track = "variants"
)]
Looks good. I'll start integrating this with the changes I've made to the `oncotable` method tomorrow.
I would advise against using a tooltip for event-specific information (therapeutics, resistances, etc.). Instead, we can make the tier value a link that opens a modal with this information. We will possibly want to iterate on this modal with additional tabs for integrating API hooks (e.g., gene information, clinicaltrials, pubmed) and notes (including AI generated), etc.
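As a rough illustration of what the R side could hand to such a modal, here is a sketch that splits the four comma-separated strings into vectors. `split_field` and `modal_payload` are hypothetical names, and the comma delimiter is an assumption based on the example output in this thread; combination regimens joined with "+" are left intact.

```r
# Hypothetical helper: split one comma-separated field, treating NA/"" as empty.
split_field <- function(x) {
  if (is.na(x) || !nzchar(x)) return(character(0))
  strsplit(x, ",", fixed = TRUE)[[1]]
}

# Hypothetical helper: build a named list of the four fields for one event.
modal_payload <- function(row) {
  fields <- c("therapeutics", "resistances", "diagnoses", "prognoses")
  lapply(stats::setNames(fields, fields), function(f) split_field(row[[f]]))
}

# Toy event, not real output.
kras <- list(
  therapeutics = "Sotorasib,Adagrasib+Cetuximab",
  resistances = "Cetuximab,Panitumumab",
  diagnoses = NA_character_,
  prognoses = NA_character_
)
str(modal_payload(kras))
```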
For the tier description, I think we can just relegate it to help text that shows up when hovering on the tier column name (or ⓘ symbol next to the column name). @xanthopoulakis can use whatever appropriate symbol is provided by the antd library. Also VUS should probably be rendered as "Variant of Unknown Significance"; best not to have any abbreviations in help text if we can help it.
`dosage` should definitely go in its own column.
Also, two things that would tremendously help me, @kevinmhadi:
[...] `tests` directory and add any new test data to `inst/extdata/test_data`). Preferably something small that can run in under a second, but a real file is fine too.
[...] `head` is fine, no need for full output), just so I know what I should expect the final output to look like.
Yeah, a modal would be good - maybe for now if these 4 fields can be incorporated into a modal that's "work in progress" - that'd be excellent for Thursday.
I definitely can do that - I can write tests on this to close this out end of week, unless you need that for Thursday.
Here's an expected (partial) output from oncotable (VIP case 397089) that you can copy and paste into R (the `structure(...)` statement). That is, you can assign `oncokb_df <- as.data.table(structure(...))` to get the `data.table` object to inspect. Side note: `base::dput()` just gives you an R language equivalent of the object (messy, but good for copying over small data structures rather than saving to RDS, and possibly useful for creating tests).
> base::dput(as.data.frame(concat_out[tier %in% 1:2]))
structure(list(id = c("397089", "397089", "397089", "397089"),
gene = c("ALK", "PRKDC", "KRAS", "TP53"), variant.g = c("g.29432664-29432664",
"g.48812928-48812928", "g.25398285-25398285", "g.7579420-7579420"
), variant.c = c("c.3824G>A", "c.3364+5G>A", "c.34G>T", "c.267delC"
), variant.p = c("p.Arg1275Gln", "", "p.Gly12Cys", "p.Ser90fs"
), annotation = c("missense_variant", "splice_region_variant,intron_variant",
"missense_variant", "frameshift_variant"), type = structure(c(8L,
NA, 8L, 1L), .Label = c("trunc", "cnadel", "cnadup", "complexsv",
"splice", "inframe_indel", "fusion", "missense", "promoter",
"mir", "regulatory", "noncoding", "inv", "synonymous", ""
), class = "factor"), tier = c(2L, 2L, 1L, 1L), tier_description = structure(c(2L,
2L, 1L, 1L), .Label = c("Clinically Actionable", "Clinically Significant",
"VUS"), class = "factor"), therapeutics = c(NA, NA, "Adagrasib+Cetuximab,Sotorasib,Adagrasib,Adagrasib+Panitumumab,Cobimetinib,Sotorasib+Cetuximab,Trametinib,Sotorasib+Panitumumab",
NA), resistances = c(NA, NA, "Cetuximab,Panitumumab,Tucatinib+Trastuzumab",
NA), diagnoses = c(NA_character_, NA_character_, NA_character_,
NA_character_), prognoses = c(NA, NA, NA, "AMLMRC,AML,TMN,CLLSLL,MDS,ET,MPN,PMF"
), distance = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_
), track = c("variants", "variants", "variants", "variants"
)), row.names = c(NA, -4L), class = "data.frame")
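For anyone unfamiliar with the `dput` round trip, a small self-contained sketch follows. It uses a toy data frame rather than the real output above, and shows that evaluating the printed `structure(...)` code rebuilds an identical object, which is why it can serve as an inline test fixture.

```r
# Toy stand-in for a slice of oncotable output (values are illustrative).
small <- data.frame(
  gene = c("KRAS", "TP53"),
  tier = c(1L, 1L),
  stringsAsFactors = FALSE
)

# dput() prints R code for the object; capture it as text instead of printing.
code <- paste(utils::capture.output(dput(small)), collapse = "\n")

# Evaluating that code rebuilds the object, as pasting it into a console would.
rebuilt <- eval(parse(text = code))
stopifnot(identical(rebuilt, small))
```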
This feature involves switching our tiering for the filtered events table (currently based on the CGC dataset) to the more informative OncoKB API.
Responsibilities:
Column Set Definition:
@kevinmhadi and @asd1289 will describe the final column set for the filtered events table.
SNV/Indels Calls:
@kevinmhadi will work on SNV/indels calls from OncoKB.
CN Events Tiering:
@asd1289 will handle CN events tiering.
SNVplicity Integration:
@jrafailov will work on SNVplicity integration.
Integration into skilift Methods:
@shihabdider will integrate CN events and SNV/indels calls into the `skilift::oncotable` method and the `skilift::filtered_events_json` generator.
Frontend Integration:
@xanthopoulakis will incorporate new columns in the filtered events table into the frontend (case-reports).
API Hooks for Gene and Drug Lookup:
@shihabdider (with @xanthopoulakis) will integrate API hooks for gene and drug lookup.
@kevinmhadi and @asd1289 will indicate which websites/APIs to use (tentatively GeneCards or NCBI for genes -- we can also just use OncoKB's curated gene list information, see below for an example; what should we use for drugs?).
Note: NCBI is kind of slow when fetching; GeneCards doesn't have an API, but we can do a batch query on all OncoKB genes to build our own database file or (my preference) just create hyperlinks that point to the GeneCards webpage. @xanthopoulakis will develop the modals for displaying this information.
Example curated gene information:
OncoKB Curated Gene Information for ARID1B
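For the hyperlink option mentioned in the note above, a sketch of generating per-gene links on the R side. The URL pattern is an assumption about how GeneCards addresses gene pages and should be verified against the live site; `genecards_url` is a hypothetical helper.

```r
# Assumed GeneCards URL pattern; verify against the live site before relying on it.
genecards_url <- function(symbol) {
  paste0(
    "https://www.genecards.org/cgi-bin/carddisp.pl?gene=",
    # Percent-encode each symbol so unusual characters cannot break the link.
    vapply(symbol, utils::URLencode, character(1), reserved = TRUE, USE.NAMES = FALSE)
  )
}

links <- genecards_url(c("ARID1B", "KRAS"))
print(links)
```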