stianlagstad / chimeraviz

chimeraviz is an R package that automates the creation of chimeric RNA visualizations.

fusion@gene[AB]@ensemblId not filled in by importStarfusion #16

Closed plijnzaad closed 6 years ago

plijnzaad commented 6 years ago

Hi,

In the latest version, the importStarfusion function does not fill in the ensemblId slot of the fusion partners. It would be nice to have. This is using output from STAR-Fusion 1.2.0 (run on CentOS 7, 3.10.0-693.11.6.el7.x86_64), analyzed on Mac OS X (Darwin PMC-GEN003 15.6.0 Darwin Kernel Version 15.6.0: Tue Jan 9 20:12:05 PST 2018; root:xnu-3248.73.5~1/RELEASE_X86_64 x86_64 i386 MacBookPro12).

The LeftGene and RightGene columns of the star-fusion.fusion_predictions.abridged.tsv file look like MT-ATP6^ENSG00000198899.2
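As a minimal sketch of what an importer would need to do with such a field, the value can be split on the caret into a gene symbol and a versioned Ensembl ID. The helper name `split_gene_field` is hypothetical and not part of chimeraviz, and dropping the version suffix is an assumption about what the ensemblId slot should hold.

```r
# Hypothetical helper (not part of chimeraviz): split STAR-Fusion gene fields
# of the form SYMBOL^ENSEMBL_ID.VERSION into their parts.
split_gene_field <- function(x) {
  parts <- strsplit(x, "^", fixed = TRUE)
  symbol <- vapply(parts, function(p) p[1], character(1))
  # Fields without a caret yield NA for the Ensembl ID.
  ens_versioned <- vapply(parts, function(p) p[2], character(1))
  data.frame(
    symbol = symbol,
    # Assumption: the version suffix (".2") is not wanted in the slot.
    ensembl_id = sub("\\.[0-9]+$", "", ens_versioned),
    stringsAsFactors = FALSE
  )
}

split_gene_field("MT-ATP6^ENSG00000198899.2")
```

Using `fixed = TRUE` avoids having to escape the caret, which otherwise is a regular-expression anchor.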

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /opt/local/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.dylib
LAPACK: /opt/local/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
 [1] grid      stats4    parallel  stats     datasets  graphics  grDevices
 [8] utils     methods   base     

other attached packages:
 [1] chimeraviz_1.4.1       ensembldb_2.2.0        AnnotationFilter_1.3.1
 [4] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0   Biobase_2.38.0        
 [7] Gviz_1.22.2            GenomicRanges_1.30.1   GenomeInfoDb_1.14.0   
[10] Biostrings_2.46.0      XVector_0.18.0         IRanges_2.12.0        
[13] S4Vectors_0.16.0       BiocGenerics_0.24.0    uuutils_1.48          
[16] gplots_3.0.1          

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.10.0           bitops_1.0-6                 
 [3] matrixStats_0.53.1            devtools_1.13.4              
 [5] bit64_0.9-7                   RColorBrewer_1.1-2           
 [7] progress_1.1.2                httr_1.3.1                   
 [9] rprojroot_1.3-2               tools_3.4.3                  
[11] backports_1.1.2               DT_0.4                       
[13] R6_2.2.2                      rpart_4.1-11                 
[15] KernSmooth_2.23-15            Hmisc_4.1-1                  
[17] DBI_0.7-15                    lazyeval_0.2.1               
[19] colorspace_1.3-2              nnet_7.3-12                  
[21] withr_2.1.1                   gridExtra_2.3                
[23] prettyunits_1.0.2             RMySQL_0.10.13               
[25] bit_1.1-12                    curl_3.1                     
[27] compiler_3.4.3                git2r_0.21.0                 
[29] htmlTable_1.11.2              DelayedArray_0.4.1           
[31] rtracklayer_1.38.3            caTools_1.17.1               
[33] scales_0.5.0                  checkmate_1.8.5              
[35] readr_1.1.1                   RCircos_1.2.0                
[37] stringr_1.2.0                 digest_0.6.15                
[39] Rsamtools_1.30.0              foreign_0.8-69               
[41] rmarkdown_1.8                 pkgconfig_2.0.1              
[43] base64enc_0.1-3               dichromat_2.0-0              
[45] htmltools_0.3.6               BSgenome_1.46.0              
[47] htmlwidgets_1.0               rlang_0.1.6                  
[49] rstudioapi_0.7                RSQLite_2.0                  
[51] BiocInstaller_1.28.0          shiny_1.0.5                  
[53] BiocParallel_1.12.0           gtools_3.5.0                 
[55] acepack_1.4.1                 VariantAnnotation_1.24.5     
[57] RCurl_1.95-4.10               magrittr_1.5                 
[59] GenomeInfoDbData_1.0.0        Formula_1.2-2                
[61] Matrix_1.2-12                 Rcpp_0.12.15                 
[63] munsell_0.4.3                 stringi_1.1.6                
[65] yaml_2.1.16                   SummarizedExperiment_1.8.1   
[67] zlibbioc_1.24.0               org.Hs.eg.db_3.5.0           
[69] plyr_1.8.4                    AnnotationHub_2.10.1         
[71] blob_1.1.0                    gdata_2.18.0                 
[73] lattice_0.20-35               splines_3.4.3                
[75] hms_0.4.1                     knitr_1.19                   
[77] pillar_1.1.0                  biomaRt_2.34.2               
[79] XML_3.98-1.9                  evaluate_0.10.1              
[81] biovizBase_1.26.0             latticeExtra_0.6-28          
[83] data.table_1.10.4-3           httpuv_1.3.5                 
[85] gtable_0.2.0                  assertthat_0.2.0             
[87] ggplot2_2.2.1                 mime_0.5                     
[89] xtable_1.8-2                  ArgumentCheck_0.10.2         
[91] survival_2.41-3               tibble_1.4.2                 
[93] GenomicAlignments_1.14.1      memoise_1.1.0                
[95] cluster_2.0.6                 interactiveDisplayBase_1.16.0
[97] BiocStyle_2.6.1              
plijnzaad commented 6 years ago

I quickly concocted a workaround; maybe this is of use to someone (too much in a hurry to do this as a proper pull request, sorry :-)


## Extract the unversioned Ensembl gene ID from STAR-Fusion gene fields of
## the form SYMBOL^ENSEMBL_ID.VERSION, e.g. MT-ATP6^ENSG00000198899.2
.ensid <- function(gene) {
    ids <- vapply(strsplit(gene, "^", fixed = TRUE),
                  function(p) p[2], character(1))
    sub("\\.[0-9]+$", "", ids)
}

addEnsemblIds <- function(file, fusions) {
    ## Specific to STAR-Fusion output: the import misses the Ensembl IDs,
    ## so add them here.
    ## Usage: fusions <- addEnsemblIds(file, fusions)
    predictions <- read.table(file = file,
                              sep = "\t", as.is = TRUE, quote = "",
                              header = TRUE, comment.char = "",
                              row.names = NULL)
    if (nrow(predictions) != length(fusions))
        stop("Number of fusions found in ", file,
             " unequal to that in fusions argument")
    if (!all(c("LeftGene", "RightGene") %in% names(predictions)))
        stop("Missing columns LeftGene and/or RightGene in ", file)
    ensA <- .ensid(predictions$LeftGene)
    ensB <- .ensid(predictions$RightGene)

    ## lapply always returns a list: one updated fusion object per input,
    ## assuming the rows of the file are in the same order as fusions
    lapply(seq_along(fusions), function(i) {
        f <- fusions[[i]]
        f@geneA@ensemblId <- ensA[i]
        f@geneB@ensemblId <- ensB[i]
        f
    })
}                                       #addEnsemblIds
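
To check whether the slots actually got filled, something along these lines might help. This is only a sketch: `MockGene` and `MockFusion` are stand-in classes defined here so the snippet runs without chimeraviz, and the gene names are placeholders. With real data, `fusions` would be the list returned by the workaround, and only the slot names geneA, geneB and ensemblId are taken from this issue.

```r
library(methods)

# Stand-in classes for illustration only; the real classes come from chimeraviz.
setClass("MockGene", representation(name = "character", ensemblId = "character"))
setClass("MockFusion", representation(geneA = "MockGene", geneB = "MockGene"))

fusions <- list(
  new("MockFusion",
      geneA = new("MockGene", name = "MT-ATP6", ensemblId = "ENSG00000198899"),
      geneB = new("MockGene", name = "PLACEHOLDER", ensemblId = NA_character_))
)

# TRUE where an ID is NA or an empty string.
is_blank <- function(x) is.na(x) | !nzchar(x)

# Collect the ensemblId slot of both partners for every fusion.
ids <- data.frame(
  geneA = vapply(fusions, function(f) f@geneA@ensemblId, character(1)),
  geneB = vapply(fusions, function(f) f@geneB@ensemblId, character(1)),
  stringsAsFactors = FALSE
)

# Indices of fusions where either partner still lacks an Ensembl ID.
incomplete <- which(is_blank(ids$geneA) | is_blank(ids$geneB))
```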
stianlagstad commented 6 years ago

Thank you! :) I've pushed a fix for this, which will be available in chimeraviz version 1.4.2 in the release version of Bioconductor and in chimeraviz version 1.5.4 in the devel version.