Open david-nemirovsky opened 3 months ago
@david-nemirovsky, here is the package file where we create an internal data frame used to check and recode consequence before the user's data gets sent to the annotator: https://github.com/karissawhiting/oncokbR/blob/main/data-raw/consequence_map.R.
Here is where they do it in the Python annotator: https://github.com/oncokb/oncokb-annotator/blob/4427d91f93c86d7024cd61bc78ca015fc59ce841/AnnotatorCore.py#L127
Let's check these against each other, as well as the ones you found above ^, and update our internal consequence_map.R file as needed.
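A rough sketch of how that comparison could be done once both sets of terms are in R. The two vectors below are made-up placeholders, not the real contents of consequence_map.R or AnnotatorCore.py, and `compare_terms` is a hypothetical helper name:

```r
# Placeholder values only: in practice `r_terms` would come from the package's
# consequence map and `py_terms` would be transcribed from AnnotatorCore.py.
r_terms  <- c("Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del")
py_terms <- c("missense_mutation", "nonsense_mutation", "splice_site")

# Hypothetical helper: case-insensitive comparison of two sets of terms
compare_terms <- function(a, b) {
  a <- tolower(unique(stats::na.omit(a)))
  b <- tolower(unique(stats::na.omit(b)))
  list(
    only_in_a = setdiff(a, b),   # terms we have that the other side lacks
    only_in_b = setdiff(b, a),   # terms the other side has that we lack
    shared    = intersect(a, b)
  )
}

diffs <- compare_terms(r_terms, py_terms)
diffs$only_in_b
```

Anything that ends up in `only_in_b` would be a candidate to add to the map.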
Also, here is a draft of a function (based on part of the code currently in annotate_mutations) that is a helper to recode consequence:
# Takes a vector as input
.check_consequence <- function(variant_classification) {

  # * Check Variant Consequence -----------
  variant_options <- tolower(unique(stats::na.omit(unlist(oncokbR::consequence_map))))
  variant_in_data <- tolower(unique(variant_classification))

  # not_allowed <- stats::na.omit(variant_in_data[!(variant_in_data %in% variant_options)])
  not_allowed <- stats::na.omit(setdiff(variant_in_data, variant_options))

  # Maybe turn into warning
  if (length(not_allowed) > 0) {
    cli::cli_abort("The following variant classification levels are not recognized: {.code {not_allowed}}.
                   Please remove or recode these to continue (see {.code oncokbR::consequence_map} for allowed values)")
  }
}
Can we please add this to utils (may want to double-check that the code works) and remove that part from annotate_mutations?
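One way to double-check the logic without the package installed is a standalone variant of the draft. This is only a sketch: `check_consequence_sketch` and `toy_map` are hypothetical, the map is passed in as an argument instead of read from `oncokbR::consequence_map`, and base `stop()` stands in for `cli::cli_abort()`:

```r
# Hypothetical standalone variant of the draft helper (not the package code)
check_consequence_sketch <- function(variant_classification, consequence_map) {
  variant_options <- tolower(unique(stats::na.omit(unlist(consequence_map))))
  variant_in_data <- tolower(unique(variant_classification))
  not_allowed <- stats::na.omit(setdiff(variant_in_data, variant_options))

  if (length(not_allowed) > 0) {
    stop("Unrecognized variant classification levels: ",
         paste(not_allowed, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Made-up map with made-up column names, not the real consequence_map
toy_map <- data.frame(
  variant_classification = c("Missense_Mutation", "Nonsense_Mutation"),
  consequence            = c("missense_variant", "stop_gained"),
  stringsAsFactors = FALSE
)

# Recognized values (and a true NA) pass silently
ok <- check_consequence_sketch(c("Missense_Mutation", NA), toy_map)

# An unrecognized value raises an error; capture its message
err <- tryCatch(
  check_consequence_sketch(c("Missense_Mutation", "nonsynonymous_SNV"), toy_map),
  error = function(e) conditionMessage(e)
)
```

Note that a true `NA` is dropped by `na.omit()` and so never triggers the error, while a literal string such as "na" would.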
There are a handful of variant classifications (variable mutationType) in the MSK IMPACT samples that are not recognized by the consequence map:
frameshift_deletion | 61 (13%)
frameshift_insertion | 21 (4.4%)
na | 5 (1.1%)
nonframeshift_deletion | 18 (3.8%)
nonframeshift_insertion | 1 (0.2%)
nonsynonymous_snv | 313 (66%)
stopgain_snv | 51 (11%)
stoploss_snv | 1 (0.2%)
upstream | 1 (0.2%)
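These values appear to be ANNOVAR-style labels, so one option is to recode them before the check. A minimal sketch follows, assuming the MAF-style equivalents in `recode_lookup`. That mapping is my assumption and should be verified against consequence_map.R and AnnotatorCore.py. `recode_mutation_type` is a hypothetical helper, and `upstream` and `na` are deliberately left unmapped:

```r
# Assumed equivalents only; confirm each pair before relying on it
recode_lookup <- c(
  frameshift_deletion     = "Frame_Shift_Del",
  frameshift_insertion    = "Frame_Shift_Ins",
  nonframeshift_deletion  = "In_Frame_Del",
  nonframeshift_insertion = "In_Frame_Ins",
  nonsynonymous_snv       = "Missense_Mutation",
  stopgain_snv            = "Nonsense_Mutation",
  stoploss_snv            = "Nonstop_Mutation"
)

# Hypothetical helper: recode known labels (case-insensitive), leave the rest as-is
recode_mutation_type <- function(x, lookup = recode_lookup) {
  key <- tolower(x)
  hit <- key %in% names(lookup)       # NA never matches, so it stays NA
  out <- x
  out[hit] <- unname(lookup[key[hit]])
  out
}

recoded <- recode_mutation_type(c("nonsynonymous_SNV", "stopgain_SNV", "upstream", NA))
```

Values without an entry in the lookup pass through unchanged, so they would still be flagged by the consequence check and can be handled case by case.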