morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

get_ssm_by_regions() is duplicating variants, and assigning them to the incorrect regions #106

Closed ckrushton closed 2 years ago

ckrushton commented 2 years ago

Running get_ssm_by_regions(grch37_ashm_regions) yields the following dataframe:

image

For some reason, this function only returns variants upstream of MYC and assigns them to (seemingly) ALL of the regions provided.

image
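
One way to confirm the symptom is to check, for each returned variant, whether its position actually falls inside the region it was assigned to, and whether the same variant shows up under more than one region. Below is a minimal sketch on toy data. The column names (`chrom`, `start`, `end`, `pos`, `region_name`), the region names and the coordinates are all made up for illustration and are not necessarily what get_ssm_by_regions() or grch37_ashm_regions actually contain.

```r
# Toy regions table (hypothetical names and coordinates, NOT the real grch37_ashm_regions)
regions <- data.frame(
  region_name = c("MYC-TSS", "BCL6-TSS"),
  chrom = c("8", "3"),
  start = c(128748000, 187458000),
  end   = c(128749000, 187464000),
  stringsAsFactors = FALSE
)

# Toy result mimicking the reported bug: one MYC-upstream variant
# is reported under both regions
ssm <- data.frame(
  sample_id = c("S1", "S1"),
  chrom = c("8", "8"),
  pos = c(128748500, 128748500),
  region_name = c("MYC-TSS", "BCL6-TSS"),
  stringsAsFactors = FALSE
)

# Look up the region each variant was assigned to and test that the
# variant's chromosome and position really lie inside that region
idx <- match(ssm$region_name, regions$region_name)
ssm$in_region <- ssm$chrom == regions$chrom[idx] &
  ssm$pos >= regions$start[idx] &
  ssm$pos <= regions$end[idx]

# Flag variants (sample + chromosome + position) that occur more than once
key <- paste(ssm$sample_id, ssm$chrom, ssm$pos)
ssm$duplicated_variant <- key %in% key[duplicated(key)]

print(ssm)
```

Rows with `in_region == FALSE` are variants assigned to a region they do not overlap, and rows with `duplicated_variant == TRUE` are variants reported more than once. With real output, the column names would need to be adjusted to match.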

ckrushton commented 2 years ago

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /gsc/software/linux-x86_64-centos7/R-4.1.3/lib64/R/lib/libRblas.so
LAPACK: /gsc/software/linux-x86_64-centos7/R-4.1.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GAMBLR_0.0.0.9500 forcats_0.5.1     stringr_1.4.0     dplyr_1.0.9       purrr_0.3.4       readr_2.1.2       tidyr_1.2.0       tibble_3.1.7     
 [9] ggplot2_3.3.6     tidyverse_1.3.1  

loaded via a namespace (and not attached):
  [1] readxl_1.4.0                backports_1.4.1             circlize_0.4.15             workflowr_1.7.0             RCircos_1.2.2              
  [6] plyr_1.8.7                  splines_4.1.3               BiocParallel_1.28.3         usethis_2.1.5               GenomeInfoDb_1.30.1        
 [11] digest_0.6.29               foreach_1.5.2               htmltools_0.5.2             fansi_1.0.3                 magrittr_2.0.3             
 [16] metaviz_0.3.1               memoise_2.0.1               cgdsr_1.3.0                 cluster_2.1.3               config_0.3.1               
 [21] doParallel_1.0.17           tzdb_0.3.0                  limma_3.50.3                remotes_2.4.2               ComplexHeatmap_2.10.0      
 [26] Biostrings_2.62.0           modelr_0.1.8                matrixStats_0.62.0          vroom_1.5.7                 prettyunits_1.1.1          
 [31] colorspace_2.0-3            blob_1.2.3                  rvest_1.0.2                 ggrepel_0.9.1               xfun_0.31                  
 [36] haven_2.5.0                 callr_3.7.0                 crayon_1.5.1                RCurl_1.98-1.7              jsonlite_1.8.0             
 [41] GEOquery_2.62.2             survival_3.3-1              iterators_1.0.14            glue_1.6.2                  SRAdb_1.56.0               
 [46] gtable_0.3.0                zlibbioc_1.40.0             XVector_0.34.0              GetoptLong_1.0.5            DelayedArray_0.20.0        
 [51] car_3.1-0                   pkgbuild_1.3.1              shape_1.4.6                 BiocGenerics_0.40.0         abind_1.4-5                
 [56] scales_1.2.0                DBI_1.1.3                   rstatix_0.7.0               ggthemes_4.2.4              Rcpp_1.0.8.3               
 [61] clue_0.3-61                 bit_4.0.4                   stats4_4.1.3                httr_1.4.3                  htmlwidgets_1.5.4          
 [66] RColorBrewer_1.1-3          ellipsis_0.3.2              pkgconfig_2.0.3             XML_3.99-0.10               R.methodsS3_1.8.2          
 [71] dbplyr_2.2.0                utf8_1.2.2                  RMariaDB_1.2.2              later_1.3.0                 tidyselect_1.1.2           
 [76] rlang_1.0.2                 reshape2_1.4.4              cellranger_1.1.0            munsell_0.5.0               tools_4.1.3                
 [81] cachem_1.0.6                cli_3.3.0                   generics_0.1.2              devtools_2.4.3              broom_0.8.0                
 [86] evaluate_0.15               fastmap_1.1.0               yaml_2.3.5                  knitr_1.39                  processx_3.6.1             
 [91] bit64_4.0.5                 fs_1.5.2                    R.oo_1.25.0                 xml2_1.3.3                  brio_1.1.3                 
 [96] compiler_4.1.3              rstudioapi_0.13             curl_4.3.2                  png_0.1-7                   testthat_3.1.4             
[101] maftools_2.10.05            ggsignif_0.6.3              reprex_2.0.1                stringi_1.7.6               ps_1.7.1                   
[106] desc_1.4.1                  lattice_0.20-45             Matrix_1.4-1                ggsci_2.9                   vctrs_0.4.1                
[111] pillar_1.7.0                lifecycle_1.0.1             g3viz_1.1.4                 GlobalOptions_0.1.2         data.table_1.14.2          
[116] cowplot_1.1.1               bitops_1.0-7                httpuv_1.6.5                rtracklayer_1.54.0          GenomicRanges_1.46.1       
[121] R6_2.5.1                    BiocIO_1.4.0                promises_1.2.0.1            IRanges_2.28.0              sessioninfo_1.2.2          
[126] codetools_0.2-18            assertthat_0.2.1            pkgload_1.2.4               SummarizedExperiment_1.24.0 rprojroot_2.0.3            
[131] rjson_0.2.21                withr_2.5.0                 GenomicAlignments_1.30.0    Rsamtools_2.10.0            S4Vectors_0.32.4           
[136] GenomeInfoDbData_1.2.7      parallel_4.1.3              hms_1.1.1                   grid_4.1.3                  rmarkdown_2.14             
[141] MatrixGenerics_1.6.0        carData_3.0-5               ggpubr_0.4.0                lubridate_1.8.0             Biobase_2.54.0             
[146] restfulr_0.0.15

rdmorin commented 2 years ago

@mattssca already fixed this issue. Here's how I run it and it works on my branch. Waiting for that PR to be merged, unfortunately.

all_dlbcl_genome_ashm = get_ssm_by_regions(regions_bed = regions_bed, seq_type = "genome", streamlined = FALSE)

mattssca commented 2 years ago

The PR with a fix for this has now been merged into master.