morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

Separate annotation of lymphoma drivers that aren't exactly hot spots #57

Closed: rdmorin closed this issue 2 years ago

rdmorin commented 2 years ago

Several genes are recurrently mutated, but in some contexts only a subset of their mutations should be considered, because the functional effect is firmly established only for that subset. This includes NOTCH1 and NOTCH2 (PEST domain mutations). GAMBLR should be enhanced to handle these in addition to CREBBP KAT domain mutations. Any other gene for which LymphGen uses a prescribed subset of mutations like this should also be included. In theory this applies to CD79B, but possibly not in practice, based on my recollection of how LymphGen works.

Kdreval commented 2 years ago

I have added support for these genes to the review_hotspots function. The LymphGen source code available on Zenodo applies special handling only to NOTCH1, NOTCH2, and EZH2 mutations, not to CD79B. In contrast, the documentation says:

For NOTCH1 and NOTCH2 we only consider truncation mutations that affect the C-terminal PEST
domain (mutation base pair position less than 139391455 for NOTCH1, and mutation base pair
position less than 120459150 for NOTCH2). For CD79B, we considered only mutations that would
selectively alter their C-terminal ITAM regions. Specifically, we choose truncating mutations with
mutation base pair position less than 62007172 or non-truncating mutations with mutation base
pair position less than 62006800. EZH2 mutations are restricted to those that targeted the catalytic
domain (mutation base pair position between 148508764 and 148506238).

These are all hg19 coordinates; I converted them to hg38 so that other genome builds are supported as well. This issue is addressed in PR #59.
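
For illustration only, the position rules quoted from the LymphGen documentation could be expressed roughly as below. This is a minimal Python sketch, not the review_hotspots implementation in GAMBLR (which is an R package) and not the LymphGen source code. The function name `passes_lymphgen_filter`, the set of variant classes treated as truncating, and the inclusive bounds for EZH2 are assumptions made for the example. Coordinates are the hg19 values from the quoted text.

```python
# Hypothetical sketch of the documented LymphGen position rules (hg19).
# Not taken from GAMBLR or LymphGen; names and conventions are illustrative.

# Assumption: MAF-style Variant_Classification values counted as truncating.
TRUNCATING = {"Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins"}


def passes_lymphgen_filter(gene, position, variant_class):
    """Return True if a mutation falls in the subset described in the docs."""
    truncating = variant_class in TRUNCATING
    if gene == "NOTCH1":
        # Truncating mutations affecting the C-terminal PEST domain.
        return truncating and position < 139391455
    if gene == "NOTCH2":
        return truncating and position < 120459150
    if gene == "CD79B":
        # Different cutoffs for truncating and non-truncating mutations.
        limit = 62007172 if truncating else 62006800
        return position < limit
    if gene == "EZH2":
        # Catalytic domain; treating both bounds as inclusive is an assumption.
        return 148506238 <= position <= 148508764
    # Genes without a prescribed subset are left unfiltered.
    return True
```

A call such as `passes_lymphgen_filter("NOTCH1", 139391000, "Frame_Shift_Del")` checks a single mutation. For hg38 input, the thresholds would need to be replaced with the converted coordinates, which are not listed in this thread.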