Closed Kdreval closed 7 months ago
I compared the output of the original CheckMotifMutBias.py script vs this GAMBLR function. The column Mutation_Overlap_WRCY is from the Python script, and WRCY is from the function:
> count(irf4_wrcy, Mutation_Overlap_WRCY, WRCY)
# A tibble: 7 × 3
  Mutation_Overlap_WRCY WRCY      n
  <chr>                 <chr> <int>
1 FALSE                 FALSE    60
2 FALSE                 NO       19
3 MOTIF                 FALSE    21
4 MOTIF                 NO       10
5 SITE                  FALSE     3
6 SITE                  NO        2
7 SITE                  TRUE     39
Clearly, there are mutations within the motif that the GAMBLR function assigns as FALSE (for example, the 21 MOTIF/FALSE rows and the 3 SITE/FALSE rows). I think we really want output that's consistent with the Python implementation. Ideally we'd identify any mutation overlapping the specified motif and also annotate when the expected site is mutated.
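To make the desired behaviour concrete, here is a minimal base R sketch of the three-way classification. It is not the code from CheckMotifMutBias.py or from GAMBLR: the helper name classify_motif_overlap, its arguments, and the forward-strand-only matching are all assumptions made for illustration. It assumes WRCY expands to [AT][AG]C[CT] and that the expected site is the third base of the motif.

```r
# Hypothetical helper (illustration only): classify one mutated position
# against every forward-strand occurrence of a motif in a reference sequence.
# Assumption: "SITE" means the expected base is hit, "MOTIF" means any other
# base of a motif occurrence is hit, "FALSE" means no overlap at all.
classify_motif_overlap <- function(ref_seq, mut_pos,
                                   motif_regex = "[AT][AG]C[CT]", # WRCY
                                   motif_len = 4,
                                   site_index = 3) {
  n_windows <- max(0, nchar(ref_seq) - motif_len + 1)
  starts <- seq_len(n_windows)
  # Slide a window of motif_len bases along the sequence
  windows <- substring(ref_seq, starts, starts + motif_len - 1)
  hit_starts <- starts[grepl(paste0("^", motif_regex, "$"), windows)]

  # Check the expected site first so it takes priority over plain overlap
  if (any(hit_starts + site_index - 1 == mut_pos)) {
    return("SITE")
  }
  if (any(mut_pos >= hit_starts & mut_pos <= hit_starts + motif_len - 1)) {
    return("MOTIF")
  }
  "FALSE"
}

# "GGAGCTAA" contains one WRCY occurrence, AGCT, at positions 3 to 6
classify_motif_overlap("GGAGCTAA", 5)
classify_motif_overlap("GGAGCTAA", 3)
classify_motif_overlap("GGAGCTAA", 1)
```

A real implementation would also need to handle the reverse-complement motif and multi-base variants, which this sketch leaves out.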
The script I'm using is here: /projects/rmorin/software/lab_scripts/CheckMotifMutBias/CheckMotifMutBias.py
The mini maf file I'm testing on is here: /projects/rmorin/projects/gambl-repos/gambl-lhilton/experiments/2023-11-22-IRF4/IRF4_ssm.maf
I tested in R with this line of code:
irf4_wrcy <- annotate_ssm_motif_context(
  maf = read_tsv("experiments/2023-11-22-IRF4/IRF4_ssm.wrcy.maf")
)
This function is currently a direct translation of the Python implementation. When it was used, a few areas for improvement were identified:
return_logical
which by default will be TRUE, so that the output column will be one of TRUE/FALSE (logical, not string). This way, there is still an option to return output that matches the original script, but the default output will be more sensible and easier to interpret and use downstream.
prioritize_morif
and replace the default behaviour.
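As a rough illustration of the return_logical idea, the sketch below converts script-style string labels into a logical vector. The helper name as_motif_annotation is hypothetical and this is not the GAMBLR implementation. In particular, collapsing every label other than "TRUE" (including "NO") to FALSE is an assumption made here, not something the discussion above specifies.

```r
# Hypothetical post-processing step (illustration only).
# labels: character vector such as c("TRUE", "FALSE", "NO")
as_motif_annotation <- function(labels, return_logical = TRUE) {
  if (!return_logical) {
    # Keep the original string labels so output matches the script
    return(labels)
  }
  # Assumption: only the string "TRUE" maps to logical TRUE
  labels == "TRUE"
}

as_motif_annotation(c("TRUE", "FALSE", "NO"))
as_motif_annotation(c("TRUE", "NO"), return_logical = FALSE)
```

Keeping the string output behind a flag preserves comparability with the original script, while the logical default can be used directly in filters and summaries.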