stianlagstad / chimeraviz

chimeraviz is an R package that automates the creation of chimeric RNA visualizations.

plot_fusion function not finding "SFPQ" or "CADM2" transcripts in an edb object that has them. #71

Closed ziadbakouny18 closed 4 years ago

ziadbakouny18 commented 4 years ago

Hey Stian,

Below is a description of the issue I am getting. I am using your suggested numbered list of items to include but starting with the code (4):

4) This is the code and data that are causing the error (attached file: tcga_annot_soapfuse_format.txt):

library(chimeraviz)
library(EnsDb.Hsapiens.v86)

fusions <- import_soapfuse("https://github.com/stianlagstad/chimeraviz/files/4112483/tcga_annot_soapfuse_format.txt", "hg38")

edb <- EnsDb.Hsapiens.v86
plot_fusion(fusion = get_fusion_by_id(fusions, 15), edb = edb)

With "fusions" being a fusion object with ID=15 being:

[1] "Fusion object"
[1] "id: 15"
[1] "Fusion tool: soapfuse"
[1] "Genome version: hg38"
[1] "Gene names: SFPQ-TFE3"
[1] "Chromosomes: chr1-chrX"
[1] "Strands: -,-"
[1] "In-frame?: NA"

"fusions" has 36 other fusions and most of them run fine. The ones that cause an error are those that have "SFPQ" or "CADM2" as one of the genes. I am getting the following error:

Fetching transcripts for gene partners..
'select()' returned 1:many mapping between keys and columns
..transcripts fetched.
Fusion is interchromosomal. Plot separate!
Fetching transcripts for gene partners..
..transcripts fetched.
Error in select_transcript(fusion@gene_upstream, which_transcripts) :
  genePartner has no transcripts. See get_transcripts_ensembl_db()
In addition: There were 14 warnings (use warnings() to see them)

warnings()
Warning messages:
1: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
2: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
3: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
4: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
5: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
6: In get_transcripts_ensembl_db(fusion, edb) : No transcripts available for the upstream gene SFPQ available.
7: In get_transcripts_ensembl_db(fusion, edb) : No transcripts available for the downstream gene TFE3 available.
8: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
9: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
10: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
11: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
12: In if (S4Vectors::mcols(gr)$gene_id[[1]] == fusion@gene_upstream@ensembl_id) { ... : the condition has length > 1 and only the first element will be used
13: In get_transcripts_ensembl_db(fusion, edb) : No transcripts available for the upstream gene SFPQ available.
14: In get_transcripts_ensembl_db(fusion, edb) : No transcripts available for the downstream gene TFE3 available.

I also tried filtering the edb object and supplying the transcripts directly, but neither works. I also checked the edb object and it does have information for "SFPQ" and "CADM2", so I can't tell why the function fails for only these 2 genes.

1) Session Info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages: [1] grid stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] AnnotationHub_2.18.0 BiocFileCache_1.10.2 dbplyr_1.4.2 EnsDb.Hsapiens.v86_2.99.0 chimeraviz_1.12.0
[6] ensembldb_2.10.2 AnnotationFilter_1.10.0 GenomicFeatures_1.38.0 Gviz_1.30.0 biomaRt_2.42.0
[11] dendsort_0.3.3 metaseqR_1.26.0 qvalue_2.18.0 limma_3.42.0 DESeq_1.38.0
[16] locfit_1.5-9.1 EDASeq_2.20.0 ShortRead_1.44.1 GenomicAlignments_1.22.1 SummarizedExperiment_1.16.1
[21] DelayedArray_0.12.1 matrixStats_0.55.0 Rsamtools_2.2.1 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[26] Biostrings_2.54.0 XVector_0.26.0 BiocParallel_1.20.1 reshape2_1.4.3 Hmisc_4.3-0
[31] Formula_1.2-3 lattice_0.20-38 viridis_0.5.1 viridisLite_0.3.0 RColorBrewer_1.1-2
[36] pheatmap_1.0.12 psych_1.9.12 survminer_0.4.6 ggpubr_0.2.4 magrittr_1.5
[41] survival_3.1-8 table1_1.1 msigdbr_7.0.1 GSVA_1.34.0 GSEABase_1.48.0
[46] graph_1.64.0 annotate_1.64.0 XML_3.98-1.20 AnnotationDbi_1.48.0 IRanges_2.20.1
[51] S4Vectors_0.24.1 Biobase_2.46.0 BiocGenerics_0.32.0 broom_0.5.3 ggrepel_0.8.1
[56] gmodels_2.18.1 BH_1.72.0-2 data.table_1.12.8 forcats_0.4.0 stringr_1.4.0
[61] purrr_0.3.3 readr_1.3.1 tidyr_1.0.0 tibble_2.1.3 ggplot2_3.2.1
[66] tidyverse_1.3.0 dplyr_0.8.3

loaded via a namespace (and not attached): [1] rappdirs_0.3.1 rtracklayer_1.46.0 R.methodsS3_1.7.1 acepack_1.4.1
[5] bit64_0.9-7 knitr_1.26 aroma.light_3.16.0 R.utils_2.9.2
[9] rpart_4.1-15 hwriter_1.3.2 RCurl_1.95-4.12 generics_0.0.2
[13] org.Mm.eg.db_3.10.0 preprocessCore_1.48.0 RSQLite_2.1.5 bit_1.1-14
[17] BiocStyle_2.14.2 xml2_1.2.2 lubridate_1.7.4 httpuv_1.5.2
[21] assertthat_0.2.1 xfun_0.11 hms_0.5.2 evaluate_0.14
[25] promises_1.1.0 fansi_0.4.0 progress_1.2.2 caTools_1.17.1.3
[29] readxl_1.3.1 km.ci_0.5-2 DBI_1.1.0 geneplotter_1.64.0
[33] htmlwidgets_1.5.1 corrplot_0.84 backports_1.1.5 vctrs_0.2.1
[37] abind_1.4-5 log4r_0.3.1 withr_2.1.2 BSgenome_1.54.0
[41] checkmate_1.9.4 prettyunits_1.0.2 mnormt_1.5-5 cluster_2.1.0
[45] NBPSeq_0.3.0 lazyeval_0.2.2 crayon_1.3.4 genefilter_1.68.0
[49] edgeR_3.28.0 pkgconfig_2.0.3 nlme_3.1-143 ProtGenerics_1.18.0
[53] nnet_7.3-12 rlang_0.4.2 lifecycle_0.1.0 affyio_1.56.0
[57] modelr_0.1.5 dichromat_2.0-0 cellranger_1.1.0 Matrix_1.2-18
[61] KMsurv_0.1-5 zoo_1.8-6 reprex_0.3.0 base64enc_0.1-3
[65] png_0.1-7 rjson_0.2.20 bitops_1.0-6 NOISeq_2.30.0
[69] R.oo_1.23.0 KernSmooth_2.23-16 blob_1.2.0 brew_1.0-6
[73] jpeg_0.1-8.1 ggsignif_0.6.0 scales_1.1.0 memoise_1.1.0
[77] plyr_1.8.5 gplots_3.0.1.1 gdata_2.18.0 zlibbioc_1.32.0
[81] compiler_3.6.1 ArgumentCheck_0.10.2 cli_2.0.0 affy_1.64.0
[85] htmlTable_1.13.3 MASS_7.3-51.5 tidyselect_0.2.5 vsn_3.54.0
[89] stringi_1.4.3 yaml_2.2.0 askpass_1.1 latticeExtra_0.6-29
[93] survMisc_0.5.5 VariantAnnotation_1.32.0 tools_3.6.1 rstudioapi_0.10
[97] foreign_0.8-74 gridExtra_2.3 digest_0.6.23 BiocManager_1.30.10
[101] shiny_1.4.0 Rcpp_1.0.3 BiocVersion_3.10.1 later_1.0.0
[105] org.Hs.eg.db_3.10.0 httr_1.4.1 RCircos_1.2.1 biovizBase_1.34.1
[109] colorspace_1.4-1 rvest_0.3.5 fs_1.3.1 splines_3.6.1
[113] shinythemes_1.1.2 xtable_1.8-4 jsonlite_1.6 baySeq_2.20.0
[117] zeallot_0.1.0 R6_2.4.1 pillar_1.4.3 htmltools_0.4.0
[121] mime_0.8 glue_1.3.1 fastmap_1.0.1 DT_0.11
[125] interactiveDisplayBase_1.24.0 utf8_1.1.4 curl_4.3 gtools_3.8.1
[129] openssl_1.4.1 rmarkdown_2.0 munsell_0.5.0 GenomeInfoDbData_1.2.2
[133] haven_2.2.0 gtable_0.3.0

2) Fusion caller: I am using some custom fusions derived from a multi-caller that I have reformatted to a soapfuse format.

3) OS: Windows 10

stianlagstad commented 4 years ago

Hi @ziadbakouny18! Thank you for the report. This uncovered a bug in the chimeraviz code, which caused only some Ensembl gene identifiers to be used. It has been fixed: on master the fix is in chimeraviz version 1.13.1 (https://github.com/stianlagstad/chimeraviz/commit/2d3985d28bebd9089cccbcd371f32709db155295), and on release 3.10 it is in chimeraviz version 1.12.1 (https://github.com/stianlagstad/chimeraviz/commit/0b677afe55a88f8377bc7adaaf1b5b86d79f0e02). When the version on the page https://bioconductor.org/packages/release/bioc/html/chimeraviz.html has changed from 1.12.0 to 1.12.1, you should be able to get the new version. I'll leave this issue open until you've confirmed that it works on your end.
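
For readers wondering how a bug of this kind can arise, here is a minimal base R sketch. It uses made-up identifiers, not the actual chimeraviz code or the data from this issue. It assumes that the right-hand side of the comparison named in the warnings held several identifiers for one gene symbol, which is what the "1:many mapping" message and the "condition has length > 1" warnings in the report suggest. Under R 3.6 an "if" on such a comparison used only the first element; R 4.2.0 and later raise an error instead, so the sketch inspects the comparison without passing it to "if".

```r
# Hypothetical identifiers, for illustration only (not from the issue's data).
upstream_ids <- c("ENSG_EXAMPLE_1", "ENSG_EXAMPLE_2")  # one symbol mapped to two IDs
annotated_id <- "ENSG_EXAMPLE_2"                        # ID found in the annotation

# Comparing a single ID against several IDs gives one result per candidate,
# so the result is not a valid scalar condition for if().
comparison <- annotated_id == upstream_ids
n_results <- length(comparison)

# R 3.6 used only this first element, which is how a real match can be missed.
first_only <- comparison[[1]]

# Scalar alternatives that consider every candidate ID.
any_match <- any(comparison)
in_match <- annotated_id %in% upstream_ids
```

In this sketch the first element is FALSE even though the second candidate matches, so a check based on the first element alone would conclude that the gene has no transcripts.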

stianlagstad commented 4 years ago

Closing this as I'm confident the specific issue was fixed. Please comment again if you get the chance to test the fix.
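
Before testing the fix, it may help to confirm that the installed chimeraviz is at least the fixed release version. The following is a minimal sketch under stated assumptions: "has_fix" is a hypothetical helper written for this illustration, not part of chimeraviz, and the check only makes sense on the release 3.10 branch, where the fix is in 1.12.1 (on master the fixed version is 1.13.1).

```r
# Version on the release 3.10 branch that contains the fix, per the comment above.
fixed_version <- package_version("1.12.1")

# Hypothetical helper: TRUE when a version string is at or above the fixed version.
has_fix <- function(installed_version) {
  package_version(installed_version) >= fixed_version
}

before_fix <- has_fix("1.12.0")  # the version from the session info in the report
after_fix <- has_fix("1.12.1")

# With chimeraviz installed, the installed version can be checked like this:
# has_fix(as.character(utils::packageVersion("chimeraviz")))
```

Comparing "package_version" objects rather than plain strings avoids lexical pitfalls such as "1.9.0" sorting after "1.12.1".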