morinlab / GAMBLR.results

A collection of functions to access results of the Genomic Analysis of Mature B-cell Lymphomas
MIT License

annotate_ssm_motif_context needs improvements #24

Closed: Kdreval closed this issue 7 months ago

Kdreval commented 9 months ago

This function is currently a direct translation of the Python implementation. Using it, we identified a few areas for improvement:

  1. The output is currently one of the strings TRUE/FALSE/NO. We could add a new argument, such as return_logical, that defaults to TRUE so that the output column is a logical TRUE/FALSE rather than a string. This keeps an option to return output that matches the original script, while making the default output more sensible and easier to interpret and use downstream.
  2. The documentation could be improved to explain how TRUE, FALSE and NO differ.
  3. Because of the DNA repair context, we want to know whether any mutation falls within a WRCY motif. We can check for WRCY overlap first, and then assign TRUE/FALSE based on whether the mutated base is the C. This could be controlled by a new argument such as prioritize_motif, which would replace the default behaviour.
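Item 3 could look roughly like the following minimal sketch. `classify_wrcy` is a hypothetical helper, not part of GAMBLR.results; it assumes the reference sequence around the mutation is already available as a string, and it only scans the forward strand (the reverse-complement motif, RGYW, would need the same treatment). It reports motif overlap and site mutation as two separate logicals, in the spirit of the proposed return_logical default.

```r
# Hypothetical helper (not part of GAMBLR.results): classify a mutated
# position within a reference sequence window against the WRCY motif.
#   seq: reference sequence string (forward strand only)
#   pos: 1-based position of the mutated base within `seq`
# Returns a list of two logicals:
#   motif: the mutated base falls anywhere inside a WRCY match
#   site:  the mutated base is the C of a WRCY match
classify_wrcy <- function(seq, pos) {
  seq <- toupper(seq)
  # W = A/T, R = A/G, Y = C/T. The zero-width lookahead makes gregexpr()
  # report every start position, including overlapping matches.
  hits <- gregexpr("(?=[AT][AG]C[CT])", seq, perl = TRUE)[[1]]
  starts <- if (hits[1] == -1) integer(0) else as.integer(hits)
  # Check for any motif overlap first, then whether the C itself is hit.
  motif <- any(pos >= starts & pos <= starts + 3)
  site <- any(pos == starts + 2)
  list(motif = motif, site = site)
}

# classify_wrcy("GAGCTA", 4)  # list(motif = TRUE, site = TRUE)
# classify_wrcy("GAGCTA", 5)  # list(motif = TRUE, site = FALSE)
```

In GAGCTA the only WRCY match is AGCT starting at position 2, so position 4 (the C) is both in the motif and at the site, while position 5 overlaps the motif without hitting the C. Keeping the two answers as separate logical columns avoids packing them into one string label.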
lkhilton commented 9 months ago

I compared the output of the original CheckMotifMutBias.py script with this GAMBLR function. The Mutation_Overlap_WRCY column comes from the Python script and the WRCY column from the function:

> count(irf4_wrcy, Mutation_Overlap_WRCY, WRCY)
# A tibble: 7 × 3
  Mutation_Overlap_WRCY WRCY      n
  <chr>                 <chr> <int>
1 FALSE                 FALSE    60
2 FALSE                 NO       19
3 MOTIF                 FALSE    21
4 MOTIF                 NO       10
5 SITE                  FALSE     3
6 SITE                  NO        2
7 SITE                  TRUE     39

Clearly, some mutations that occur in the motif are being assigned FALSE by the GAMBLR function. I think we want output that's consistent with the Python implementation: ideally we'd identify any mutation overlapping the specified motif and also annotate when the expected site is mutated.
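The size of the discrepancy can be read straight off the count() output. The minimal base-R sketch below re-enters those counts and tallies them, assuming (from the labels) that MOTIF and SITE are the Python script's two ways of marking a mutation inside the motif.

```r
# Counts copied from the count() output: Python label vs. GAMBLR label.
cmp <- data.frame(
  python = c("FALSE", "FALSE", "MOTIF", "MOTIF", "SITE", "SITE", "SITE"),
  gamblr = c("FALSE", "NO",    "FALSE", "NO",    "FALSE", "NO",   "TRUE"),
  n      = c(60L, 19L, 21L, 10L, 3L, 2L, 39L),
  stringsAsFactors = FALSE
)

# Rebuild the two-way contingency table from the long counts.
xtabs(n ~ python + gamblr, data = cmp)

# Mutations the script places in the motif (MOTIF or SITE) ...
in_motif <- cmp$python %in% c("MOTIF", "SITE")
in_motif_total <- sum(cmp$n[in_motif])                              # 75
# ... and how many of those the function labels FALSE.
in_motif_called_false <- sum(cmp$n[in_motif & cmp$gamblr == "FALSE"])  # 24
```

With these numbers, 24 of the 75 mutations the script places in the motif come back as FALSE from the function, and a further 12 come back as NO.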

The script I'm using is here: /projects/rmorin/software/lab_scripts/CheckMotifMutBias/CheckMotifMutBias.py

The mini MAF file I'm testing on is here: /projects/rmorin/projects/gambl-repos/gambl-lhilton/experiments/2023-11-22-IRF4/IRF4_ssm.maf

I tested in R with this line of code:

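library(readr)           # provides read_tsv()
library(GAMBLR.results)  # provides annotate_ssm_motif_context()
irf4_wrcy <- annotate_ssm_motif_context(
    maf = read_tsv("experiments/2023-11-22-IRF4/IRF4_ssm.wrcy.maf")
)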
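To see which mutations are affected, the discordant rows could be pulled out of the annotated table for inspection. This is a minimal sketch: `discordant_wrcy` is a hypothetical helper, and it assumes both label columns are character vectors, as the count() output shows. It keeps only the FALSE calls, since the meaning of NO is not yet documented.

```r
# Hypothetical helper: keep rows that the Python script places in the motif
# (MOTIF or SITE) but that annotate_ssm_motif_context() labels "FALSE".
# %in% is used instead of == so that NA labels are dropped rather than
# producing all-NA rows.
discordant_wrcy <- function(annotated) {
  keep <- annotated$Mutation_Overlap_WRCY %in% c("MOTIF", "SITE") &
    annotated$WRCY %in% "FALSE"
  annotated[keep, , drop = FALSE]
}

# Usage, after running the call above:
# discordant_wrcy(irf4_wrcy)
```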