morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

get_gene_cn_and_expression does not return CN states #89

Closed Kdreval closed 2 years ago

Kdreval commented 2 years ago

The function get_gene_cn_and_expression does not return CN states. All values are NA, and the column name is incorrectly assigned as ._CN instead of <gene>_CN.

I think the issue is in this line here and, correspondingly, here for the case where the ENSEMBL ID is specified instead of the gene symbol. The lookup returns no gene coordinates, and it does so without any error or warning. The object grch37_all_gene_coordinates referred to in these two lines should instead be grch37_gene_coordinates. I can confirm that this modification returns the gene coordinates and CN states:

this_row = grch37_gene_coordinates %>%
  dplyr::filter(hugo_symbol == "KMT2D")
this_region = paste0(this_row$chromosome, ":", this_row$start, "-", this_row$end)
gene_name = "KMT2D"
gene_cn = get_cn_states(regions_list = c(this_region), region_names = c(gene_name)) %>%
  as.data.frame()
table(gene_cn)
gene_cn
   0    1    2    3    4    5    6    7   10   26   75 
   4   24 1133  181  102   11   15    2    1    1    1 

Compared to what is currently on master:

this_row = grch37_all_gene_coordinates %>%
  dplyr::filter(hugo_symbol == "KMT2D")
this_region = paste0(this_row$chromosome, ":", this_row$start, "-", this_row$end)
gene_name = "KMT2D"
gene_cn = get_cn_states(regions_list = c(this_region), region_names = c(gene_name)) %>%
  as.data.frame()
table(gene_cn)
gene_cn
   2 
1475 
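
Since the lookup above fails silently, one possible safeguard is to check that the filter matched exactly one row before building the region string. The following is only a minimal sketch in base R under stated assumptions: `toy_gene_coordinates` is a made-up stand-in table (only its column names are taken from the snippets above), and `gene_to_region` is a hypothetical helper, not a GAMBLR function. Whether an empty match is the exact cause of the behaviour reported here is also an assumption.

```r
# Toy stand-in for a gene coordinate table. The column names follow the
# snippets above; the rows are invented for illustration only.
toy_gene_coordinates <- data.frame(
  hugo_symbol = c("GENE_A", "GENE_B"),
  chromosome  = c("1", "2"),
  start       = c(100L, 5000L),
  end         = c(200L, 6000L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of GAMBLR): build a "chr:start-end" region
# string for one gene, and stop with an error unless exactly one row matches.
gene_to_region <- function(coordinates, gene_symbol) {
  this_row <- coordinates[coordinates$hugo_symbol == gene_symbol, , drop = FALSE]
  if (nrow(this_row) != 1) {
    stop("Expected exactly 1 row for ", gene_symbol, ", found ", nrow(this_row))
  }
  paste0(this_row$chromosome, ":", this_row$start, "-", this_row$end)
}

# A gene that is present yields a well-formed region.
region_ok <- gene_to_region(toy_gene_coordinates, "GENE_A")

# Without the check, an empty match does not raise anything: paste0() treats
# the zero-length columns as empty strings and returns a malformed region.
empty_row <- toy_gene_coordinates[toy_gene_coordinates$hugo_symbol == "MISSING", ]
region_silent <- paste0(empty_row$chromosome, ":", empty_row$start, "-", empty_row$end)
```

With a check like this, a gene missing from the coordinate table would stop the function with a message instead of passing a malformed region on to the CN lookup.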

mattssca commented 2 years ago

This was addressed in this PR