stianlagstad / chimeraviz

chimeraviz is an R package that automates the creation of chimeric RNA visualizations.
37 stars 14 forks source link

Empty reference when importing STAR-Fusion output #24

Closed komalsrathi closed 4 years ago

komalsrathi commented 6 years ago

Hi,

I am trying to import the results of STAR-Fusion (STAR version=STAR_2.5.3a) and facing some issues:

library(chimeraviz)

# Get reference to results file from deFuse
sf <- importStarfusion(filename = 'star-fusion.fusion_predictions.tsv', "hg19", 10)

# Load the results file into a list of fusion objects
fusion <- getFusionByGeneName(sf, geneName = 'KANSL1')
fusion <- getFusionById(fusion, id = '3')

# read fastq files with reads for that fusion
fastq1 <- 'SRR1559088_fusion_R1.fq'
fastq2 <- 'SRR1559088_fusion_R2.fq'

# extract the fusion junction sequence
referenceFilename <- "reference.fa"
writeFusionReference(fusion = fusion, filename = referenceFilename)

But the reference file is empty and there are no errors associated with it. Can you have a look at this when you get a chance?

Here is my sessionInfo:

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets 
 [9] methods   base     

other attached packages:
 [1] chimeraviz_1.5.4       ensembldb_2.2.2        AnnotationFilter_1.2.0
 [4] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0   Biobase_2.38.0        
 [7] Gviz_1.22.3            GenomicRanges_1.30.3   GenomeInfoDb_1.14.0   
[10] Biostrings_2.46.0      XVector_0.18.0         IRanges_2.12.0        
[13] S4Vectors_0.16.0       BiocGenerics_0.24.0    BiocInstaller_1.28.0  

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.10.0           bitops_1.0-6                 
 [3] matrixStats_0.53.0            devtools_1.13.4              
 [5] bit64_0.9-7                   RColorBrewer_1.1-2           
 [7] progress_1.1.2                httr_1.3.1                   
 [9] rprojroot_1.3-2               tools_3.4.2                  
[11] backports_1.1.2               utf8_1.1.3                   
[13] DT_0.4.4                      R6_2.2.2                     
[15] rpart_4.1-12                  Hmisc_4.1-1                  
[17] DBI_0.7                       lazyeval_0.2.1               
[19] colorspace_1.3-2              nnet_7.3-12                  
[21] withr_2.1.1.9000              gridExtra_2.3                
[23] prettyunits_1.0.2             RMySQL_0.10.13               
[25] git2r_0.21.0                  bit_1.1-12                   
[27] curl_3.1                      compiler_3.4.2               
[29] cli_1.0.0                     htmlTable_1.11.2             
[31] DelayedArray_0.4.1            rtracklayer_1.38.3           
[33] scales_0.5.0.9000             checkmate_1.8.5              
[35] readr_1.1.1                   RCircos_1.2.0                
[37] stringr_1.2.0                 digest_0.6.15                
[39] Rsamtools_1.30.0              foreign_0.8-69               
[41] rmarkdown_1.8                 Rsubread_1.28.1              
[43] pkgconfig_2.0.1               base64enc_0.1-3              
[45] dichromat_2.0-0               htmltools_0.3.6              
[47] highr_0.6                     BSgenome_1.46.0              
[49] htmlwidgets_1.0               rlang_0.1.6.9003             
[51] rstudioapi_0.7                RSQLite_2.0                  
[53] shiny_1.0.5                   BiocParallel_1.12.0          
[55] acepack_1.4.1                 VariantAnnotation_1.24.5     
[57] RCurl_1.95-4.10               magrittr_1.5                 
[59] GenomeInfoDbData_1.0.0        Formula_1.2-2                
[61] Matrix_1.2-12                 Rcpp_0.12.15                 
[63] munsell_0.4.3                 stringi_1.1.6                
[65] yaml_2.1.16                   SummarizedExperiment_1.8.1   
[67] zlibbioc_1.24.0               org.Hs.eg.db_3.5.0           
[69] plyr_1.8.4                    AnnotationHub_2.10.1         
[71] blob_1.1.0                    crayon_1.3.4                 
[73] lattice_0.20-35               splines_3.4.2                
[75] hms_0.4.1                     knitr_1.19                   
[77] pillar_1.1.0                  biomaRt_2.34.2               
[79] XML_3.98-1.9                  evaluate_0.10.1              
[81] biovizBase_1.26.0             latticeExtra_0.6-28          
[83] data.table_1.10.4-3           httpuv_1.3.5                 
[85] org.Mm.eg.db_3.5.0            gtable_0.2.0                 
[87] assertthat_0.2.0              ggplot2_2.2.1.9000           
[89] mime_0.5                      xtable_1.8-2                 
[91] ArgumentCheck_0.10.2          survival_2.41-3              
[93] tibble_1.4.2                  GenomicAlignments_1.14.1     
[95] memoise_1.1.0                 cluster_2.0.6                
[97] interactiveDisplayBase_1.16.0 BiocStyle_2.6.1          

STAR output files (there are four .tsv files from STAR output, you can check all four), fastq files and reference output (empty) are attached here: https://drive.google.com/drive/u/0/folders/1NTk0PZRMlH1KFD3xHg8_UXu5m1lmGyKj

Thanks!

stianlagstad commented 6 years ago

Thank you again for a great bug report, @komalsrathi! Code + data = reproducible example, which makes my job a lot easier :) I will look into this today.

stianlagstad commented 6 years ago

This happens because the fusion junction sequence isn't imported from the STAR-Fusion file. When importing data from a defuse results file, you can see that the fusion junction sequence is present:

> defuse833ke <- system.file(
>   "extdata",
>   "defuse_833ke_results.filtered.tsv",
>   package = "chimeraviz")
> fusions_from_chimeraviz_example_data <- importDefuse(defuse833ke, "hg19", 1)
> fusion_from_chimeraviz_example_data <- getFusionById(fusions_from_chimeraviz_example_data, 5267)
> fusion_from_chimeraviz_example_data@geneA@junctionSequence
#   103-letter "DNAString" instance
# seq: TAAAAAGGATGCTTCGCGTTTTCTCTCTCCTTTTTGGAGACAGATTCGCAGTGGTCGCTTCTTCTCCTTGGATTTGTTAAGGATTCCAAGTAACTCTTATTTG

But when I import data from the star-fusion.fusion_predictions.tsv file you provided:

> sf <- importStarfusion(filename = 'star-fusion.fusion_predictions.tsv', "hg19", 10)
> fusion <- getFusionByGeneName(sf, geneName = 'KANSL1')
> fusion <- getFusionById(fusion, id = '3')
> fusion@geneA@junctionSequence
#   0-letter "DNAString" instance
# seq: 

This is simply because the fusion junction sequence isn't provided in the star-fusion.fusion_predictions.tsv. Luckily for us, Phil Lijnzaad has created a pull request that helps us in this regard: https://github.com/stianlagstad/chimeraviz/pull/23:

STAR-fusion >= 1.2.0, when run with --denovo_reconstruct has a column FUSION_CDS in output file star-fusion.fusion_predictions.abridged.annotated.coding_effect.tsv which contains the fusion sequence. This PR extracts it and adds it to the fusion object (in addition to setting attribute inframe ad also adding FFPM and annots to fusionToolSpecificData )

When that pull request is merged and a new version of chimeraviz has been released, this will work for you providing that you are using STAR-Fusion >= 1.2.0 and run it with the --denovo_reconstruct option.

chimeraviz should have given you an error message when calling writeFusionReference with a fusion object that doesn't have the fusion junction sequence. I'll make sure that is added.

stianlagstad commented 6 years ago

Version 1.4.3 of the release version of chimeraviz, and version 1.5.5 of the development version will now issue a warning when calling writeFusionReference with a Fusion object without the fusion junction sequence. :)

komalsrathi commented 6 years ago

Thanks Stian for looking into this so quickly - I will do a few tests and close this asap.

komalsrathi commented 6 years ago

Hi Stian,

Is it possible for you to have the function import the other TSV file star-fusion.fusion_predictions.abridged.annotated.coding_effect.tsv that I have in the drive instead of star-fusion.fusion_predictions.tsv? The star-fusion.fusion_predictions.tsv will never have a junction sequence or the type of fusion so it is really not very helpful.

The annotated file is the last file to be generated with the new star version and has more information of whether the fusion is INFRAME or not and also has the corresponding sequence (which it obtains from running TRINITY I suppose).

stianlagstad commented 6 years ago

I think pull request https://github.com/stianlagstad/chimeraviz/pull/23 will do what you want, so let's wait for that to be merged in. :)

komalsrathi commented 5 years ago

Hi Stian - any updates on this?

stianlagstad commented 5 years ago

Hey @komalsrathi , no updates from my end. Do you have an example output file from STAR-fusion where the --denovo_reconstruct parameter was used? Then I can create a new PR where I add it.

komalsrathi commented 5 years ago

Hi Stian - Here are the drive links for the output file from STAR-fusion:

https://drive.google.com/open?id=1XSLJV4EscC77P1z_Pt7f84IG2Dyj2wWv

I am actually interested in looking at the read first fusion: RP11-9L18.2--FLI1.

stianlagstad commented 5 years ago

@komalsrathi Please take a look at https://github.com/stianlagstad/chimeraviz/pull/56