karissawhiting / oncokbR

Annotate mutation, copy number alteration, and structural variant data in R using the OncoKB Annotation API
http://www.karissawhiting.com/oncokbR/

Variant Classifications in MSK IMPACT Not Found In Consequence Map #28

Open david-nemirovsky opened 3 months ago

david-nemirovsky commented 3 months ago

A handful of variant classifications (variable mutationType) in the MSK IMPACT samples are not recognized by the consequence map:

mutationType | n (%)
frameshift_deletion | 61 (13%)
frameshift_insertion | 21 (4.4%)
na | 5 (1.1%)
nonframeshift_deletion | 18 (3.8%)
nonframeshift_insertion | 1 (0.2%)
nonsynonymous_snv | 313 (66%)
stopgain_snv | 51 (11%)
stoploss_snv | 1 (0.2%)
upstream | 1 (0.2%)
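Until the map is updated, one possible user-side workaround is sketched below, under assumptions. These values look like ANNOVAR-style exonic function terms, so they could be recoded to MAF-style Variant_Classification values before annotation. The lookup `annovar_to_maf` and the helper `recode_mutation_type()` are hypothetical, and the sketch assumes the MAF-style names (e.g. Missense_Mutation) are among the values accepted by consequence_map. The literal string "na" is treated as missing.

```r
# Hypothetical lookup: ANNOVAR-style terms (as seen in mutationType) to
# MAF-style Variant_Classification values.
# Assumption: consequence_map accepts these MAF-style names.
annovar_to_maf <- c(
  frameshift_deletion     = "Frame_Shift_Del",
  frameshift_insertion    = "Frame_Shift_Ins",
  nonframeshift_deletion  = "In_Frame_Del",
  nonframeshift_insertion = "In_Frame_Ins",
  nonsynonymous_snv       = "Missense_Mutation",
  stopgain_snv            = "Nonsense_Mutation",
  stoploss_snv            = "Nonstop_Mutation",
  upstream                = "5'Flank"
)

# Hypothetical helper: recode known ANNOVAR terms, leave everything else as-is
recode_mutation_type <- function(x) {
  key <- tolower(x)
  out <- unname(annovar_to_maf[key])
  # Values not in the lookup (e.g. already MAF-style) are kept unchanged
  out[is.na(out)] <- x[is.na(out)]
  # The literal string "na" becomes a true NA
  out[!is.na(key) & key == "na"] <- NA_character_
  out
}
```

For example, `recode_mutation_type(c("nonsynonymous_snv", "na"))` returns `c("Missense_Mutation", NA)`. Whether upstream variants should be kept at all is a separate question, since OncoKB may not annotate them.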

karissawhiting commented 3 weeks ago

@david-nemirovsky, here is the package file where we create the internal data frame used to check and recode consequences before the user's data is sent to the annotator: https://github.com/karissawhiting/oncokbR/blob/main/data-raw/consequence_map.R.

Here is where the OncoKB Python annotator does the equivalent mapping: https://github.com/oncokb/oncokb-annotator/blob/4427d91f93c86d7024cd61bc78ca015fc59ce841/AnnotatorCore.py#L127

Let's check these against each other, also check the classifications you found above, and update our internal consequence_map.R file as needed.

Also, here is a draft of a helper function (based on part of the code currently in annotate_mutations()) that checks variant classifications against the consequence map:

# Takes a character vector of variant classifications as input
.check_consequence <- function(variant_classification) {

  # * Check Variant Consequence  -----------

  # All values accepted by the internal consequence map, lowercased
  variant_options <- tolower(unique(stats::na.omit(unlist(oncokbR::consequence_map))))

  # Unique non-missing values in the user's data, lowercased
  variant_in_data <- tolower(unique(stats::na.omit(variant_classification)))

  # Values in the data that the map does not recognize
  not_allowed <- setdiff(variant_in_data, variant_options)

  # Maybe turn into warning
  if (length(not_allowed) > 0) {
    cli::cli_abort(c(
      "The following variant classification levels are not recognized: {.code {not_allowed}}.",
      "i" = "Please remove or recode these to continue (see {.code oncokbR::consequence_map} for allowed values)."
    ))
  }

  # Return the input unchanged so the check can be used inside a pipeline
  invisible(variant_classification)
}

Can we please add this to utils (after double-checking that the code works) and remove that part from annotate_mutations()?
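For the double-check, a testable variant is sketched below under assumptions. `check_consequence_sketch()` and `toy_map` are hypothetical names. The map is passed as an argument whose default is `oncokbR::consequence_map`; R evaluates defaults lazily, so the package map is only loaded when no map is supplied. Base `stop()` stands in for `cli::cli_abort()` so the example runs without cli, and the toy map's column names are illustrative, not necessarily those of the real consequence_map.

```r
# Sketch: same logic as the draft, with the map injectable for testing
check_consequence_sketch <- function(variant_classification,
                                     consequence_map = oncokbR::consequence_map) {
  # Every value in any column of the map is accepted (case-insensitive)
  variant_options <- tolower(unique(stats::na.omit(unlist(consequence_map))))
  variant_in_data <- tolower(unique(stats::na.omit(variant_classification)))
  not_allowed <- setdiff(variant_in_data, variant_options)

  if (length(not_allowed) > 0) {
    stop("The following variant classification levels are not recognized: ",
         paste(not_allowed, collapse = ", "),
         ". Please remove or recode these to continue.", call. = FALSE)
  }
  invisible(variant_classification)
}

# Hypothetical two-column map, for illustration only
toy_map <- data.frame(
  variant_classification   = c("Missense_Mutation", "Nonsense_Mutation"),
  consequence_final_coding = c("missense_variant", "stop_gained"),
  stringsAsFactors = FALSE  # keep character columns so unlist() returns strings
)
```

For example, `check_consequence_sketch("nonsynonymous_snv", toy_map)` stops with an error naming nonsynonymous_snv, while values found in either column of the map (and NAs) pass and are returned invisibly.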