stianlagstad / chimeraviz

chimeraviz is an R package that automates the creation of chimeric RNA visualizations.

How to input STAR-SEQR output? #92

Open DadongZ opened 3 years ago

DadongZ commented 3 years ago

We are using STAR-SEQR for fusion detection, and I am trying to visualize the results with chimeraviz. Is there a good way to import STAR-SEQR output? Below are the first rows of the data and the output of sessionInfo():

  samples  NAME
1 002-002 CCDC91--CD47
2 002-002 PGD--GK
3 002-002 DSTN--PCSK2
4 002-002 MANBAL--RRBP1
5 002-002 IGKV4-1--IGKJ4
6 002-002 KDM4C--HERC3
  NREAD_SPANS NREAD_JXNLEFT NREAD_JXNRIGHT           FUSION_CLASS
1           0             0              2          TRANSLOCATION
2           0             0              3          TRANSLOCATION
3           1             0              2           READ_THROUGH
4           0             2              0 INTERCHROM_INTERSTRAND
5           9             3              3           READ_THROUGH
6           0             0              2          TRANSLOCATION
             SPLICE_TYPE       BRKPT_LEFT      BRKPT_RIGHT LEFT_SYMBOL
1     CANONICAL_SPLICING chr12:28515447:+ chr3:107779698:-      CCDC91
2     CANONICAL_SPLICING  chr1:10460628:+  chrX:30742217:+         PGD
3     CANONICAL_SPLICING chr20:17550855:+ chr20:17389907:+        DSTN
4     CANONICAL_SPLICING chr20:35929815:+ chr20:17641172:-      MANBAL
5 NON-CANONICAL_SPLICING  chr2:89185666:+  chr2:89160432:-     IGKV4-1
6     CANONICAL_SPLICING   chr9:6893231:+  chr4:89607886:+       KDM4C
  RIGHT_SYMBOL                                                   ANNOT_FORMAT
1         CD47 Symbol:Transcript:Strand:Exon_No:Dist_to_Exon:Frame:CDS_Length
2           GK Symbol:Transcript:Strand:Exon_No:Dist_to_Exon:Frame:CDS_Length
3        PCSK2 Symbol:Transcript:Strand:Exon_No:Dist_to_Exon:Frame:CDS_Length
4        RRBP1 Symbol:Transcript:Strand:Exon_No:Dist_to_Exon:Frame:CDS_Length
5        IGKJ4 Symbol:Transcript:Strand:Exon_No:Dist_to_Exon:Frame:CDS_Length
6        HERC3 Symbol:Transcript:Strand:Exon_No:Dist_to_Exon:Frame:CDS_Length
                                                                                                                                                                                                                                                                                                                                                                                                                                                                   LEFT_ANNOT
1 CCDC91:ENST00000381259.5_1:+:6:0:0:291958,CCDC91:ENST00000539107.5_2:+:7:0:0:291958,CCDC91:ENST00000545336.5_1:+:10:0:0:291958,CCDC91:ENST00000545737.5_1:+:6:0:0:195425,CCDC91:ENST00000536442.5_1:+:7:0:0:195421,CCDC91:ENST00000543809.5_1:+:7:0:0:155580,CCDC91:ENST00000535520.5_1:+:9:0:-1:47469,CCDC91:ENST00000539904.1_1:+:6:0:-1:0,CCDC91:ENST00000540401.5_1:+:6:0:-1:0,CCDC91:ENST00000540794.5_1:+:NA:NA:NA:268730,CCDC91:ENST00000536154.5_1:+:NA:NA:NA:33932
2                                                                                                                                                                                                                                       PGD:ENST00000270776.13_2:+:3:0:0:20632,PGD:ENST00000460189.1_1:+:2:0:0:13620,PGD:ENST00000491493.5_3:+:3:0:0:11893,PGD:ENST00000465632.5_1:+:2:0:0:8481,PGD:ENST00000477958.5_1:+:3:0:0:1366,PGD:ENST00000483936.5_1:+:NA:NA:NA:18392
3                                                                                                                                                                                                                                                                                                                                                       DSTN:ENST00000246069.12_2:+:1:0:0:36938,DSTN:ENST00000449141.2_1:+:1:0:0:34921,DSTN:ENST00000474024.5_1:+:1:0:-1:6361
4                                                                                                                                                                                                                                                                 MANBAL:ENST00000373605.7_1:+:3:0:0:15152,MANBAL:ENST00000373606.7_1:+:2:0:0:15152,MANBAL:ENST00000397151.1_1:+:3:0:0:15152,MANBAL:ENST00000397152.7_1:+:4:0:0:15152,MANBAL:ENST00000397150.5_1:+:2:0:0:1037
5                                                                                                                                                                                                                                                                                                                                                                                                                                     IGKV4-1:ENST00000390243.2_2:+:2:2:1:582
6                                                                                                                                                                                    KDM4C:ENST00000381309.7_3:+:8:0:0:381741,KDM4C:ENST00000381306.7_3:+:8:0:0:377052,KDM4C:ENST00000536108.5_2:+:8:0:0:355503,KDM4C:ENST00000543771.5_2:+:8:0:0:283463,KDM4C:ENST00000438023.5_2:+:8:0:0:188860,KDM4C:ENST00000496464.1_1:+:1:0:-1:0,KDM4C:ENST00000489243.5_1:+:8:400:-1:0
                                                                                                                                                                                                                            RIGHT_ANNOT
1                                                                                                                 CD47:ENST00000355354.13_3:-:4:0:1:43625,CD47:ENST00000361309.5_2:-:4:0:1:43621,CD47:ENST00000644850.1_1:-:4:0:1:20799
2 GK:ENST00000378943.7_2:+:18:0:1:75205,GK:ENST00000378945.7_2:+:18:0:1:75205,GK:ENST00000378946.7_1:+:19:0:1:75205,GK:ENST00000427190.5_2:+:19:0:1:75205,GK:ENST00000481024.5_1:+:20:0:-1:20814,GK-AS1:ENST00000464659.1_1:-:1:74:-1:0
3                                                                       PCSK2:ENST00000262545.7_2:+:6:0:0:254765,PCSK2:ENST00000536609.1_1:+:5:0:0:254765,PCSK2:ENST00000377899.5_1:+:7:0:0:254708,PCSK2:ENST00000470007.1_1:+:6:0:-1:0
4                              RRBP1:ENST00000360807.8_3:-:2:0:0:46326,RRBP1:ENST00000377807.6_3:-:3:0:0:46326,RRBP1:ENST00000377813.5_3:-:3:0:0:46326,RRBP1:ENST00000398782.2_5:-:2:0:0:902,RRBP1:ENST00000455029.3_1:-:NA:NA:NA:28881
5                                                                                                                                                      IGKJ4:ENST00000390239.2_3:-:1:1:0:37,AC244205.1:ENST00000624935.3_2:-:NA:NA:NA:0
6                                                                                                           HERC3:ENST00000264345.7_1:+:20:0:2:101137,HERC3:ENST00000402738.6_3:+:22:0:2:101137,HERC3:ENST00000512194.1_1:+:6:0:2:17161
  DISTANCE
1       NA
2       NA
3   160950
4 18288643
5    25234
6       NA
                                                                                                                                                                                                        ASSEMBLED_CONTIGS
1                                                                                                                                                                     GGATCTATATTTAAGTGCTTATATTCATCCACAATAATGCTGAGGGCTTCG
2                                                                                                                 GGGCAAGCTGTGGATGATTTCATCGAGAAATTGAAAGTGAAATTCGTTATT,AAGTGGTATTCCATAAAACCTACCAACTCATGGATTCCCAAGATGTGAGCT
3                                                                                                                                           CCTGCGACCGCCGCGGCGAAGATGAATGCCGAAGCAAGTTACGACTTCAGCAGCAACGACCCCTATCCTTACCCTCG
4                                                                                                                                                                     GGACTCTTCCTGGGAGCCATCTTCCAGCTCATCTGTGTGCTGGCCATCATC
5 GGCCTCTCTGGGATAGAAGTTATTCAGCAGGCACACAACAGAGGCAGTTCCAGATTTCAACTGCTCATCAGATGGCGGGAAGATGAAGACAGATGGTGCAGCCACAGTTCGTTTGATCTCCACCTTGGTCCCTCCGCCGAAAGTGAGAGTATTATAATATTGCTGACAGTAATAAACTGCCACATCTTCAGCCTGCAGGCTGCTGATGGTGAGAG
6                                                                                                                  GATTGACTATGGAAAAGTTGCCAAATTGGAGTCTCCAAGAGCTTTTAGAT,ATCTGCCGAGAAAGCTATGGAGTGATTGAACAGAAGAAGCTGATACCTGGG
  ASSEMBLY_CROSS_JXN                                    PRIMERS
1               TRUE  AGGAAAGCTGGTCACGAAGC,CCTGGGACGAAAAGAATGGC
2               TRUE TGGGCAAGCTGTGGATGATT,AGCTCACATCTTGGGAATCCA
3               TRUE  GAGGACGGTCTGCATACTCG,TCGTTGCTGCTGAAGTCGTA
4              FALSE  ACTCTTCCTGGGAGCCATCT,AAAGACCACAACCCCCAAGG
5               TRUE  TCACTCTCACCATCAGCAGC,TGATCTCCACCTTGGTCCCT
6               TRUE CAAGATAACCCAGGAGGCTGG,AGTCTCCTCCACATCCTCCC
                                     ID SPAN_CROSSHOM_SCORE JXN_CROSSHOM_SCORE
1 chr12:28515449:+:chr3:107779700:-:4:0                   0                  0
2   chr1:10460630:+:chrX:30742217:+:1:0                   0                  0
3 chr20:17550857:+:chr20:17389907:+:1:0                   0                  0
4 chr20:35929817:+:chr20:17641174:-:1:4                   0                  0
5   chr2:89185668:+:chr2:89160434:-:1:0                   0                  0
6    chr9:6893233:+:chr4:89607886:+:1:1                   0                  0
  OVERHANG_DIVERSITY MINFRAG20 MINFRAG35 OVERHANG_MEANBQ SPAN_MEANBQ JXN_MEANBQ
1                  1         1         0        38.50000          NA   38.75000
2                  2         2         0        37.33333          NA   38.00000
3                  2         2         0        37.50000    38.50000   36.75000
4                  2         2         0        39.00000          NA   39.25000
5                  3         3         1        31.66667    37.22222   29.83333
6                  1         1         0        36.00000          NA   38.25000
  OVERHANG_BQ15 SPAN_BQ15 JXN_BQ15 OVERHANG_MM   SPAN_MM    JXN_MM
1             2         0        4           0        NA 0.0000000
2             3         0        6           0        NA 0.3333333
3             2         2        4           0 0.5000000 0.0000000
4             2         0        4           0        NA 0.0000000
5             6        18       12           1 0.4444444 0.0000000
6             2         0        4           0        NA 0.0000000
  OVERHANG_MEANLEN SPAN_MEANLEN JXN_MEANLEN TPM_FUSION   TPM_LEFT  TPM_RIGHT
1         34.00000           NA    34.00000   7.333199   6.400550  58.733329
2         32.66667           NA    33.83333   4.518474  14.731858  14.385095
3         23.50000     41.50000    39.25000   8.130358 248.322240  20.247429
4         33.00000           NA    34.00000 111.009156  31.226601   6.077102
5         32.33333     49.33333    30.33333  21.386369  13.404606 110.877316
6         28.00000           NA    36.75000  28.077132   8.035374  12.548261
                                 MAX_TRX_FUSION DISPOSITION
1  ENST00000381259.5_1--ENST00000644850.1_1|670        PASS
2 ENST00000270776.13_2--ENST00000378945.7_2|318        PASS
3 ENST00000246069.12_2--ENST00000470007.1_1|137        PASS
4  ENST00000397150.5_1--ENST00000398782.2_5|244        PASS
5  ENST00000390243.2_2--ENST00000390239.2_3|536        PASS
6  ENST00000543771.5_2--ENST00000512194.1_1|999        PASS
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

Matrix products: default
BLAS/LAPACK: /mounts/isilon/data/eahome/u1072932/anaconda3/envs/r-4.0.3/lib/libopenblasp-r0.3.15.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] writexl_1.3.1   readxl_1.3.1    forcats_0.5.0   stringr_1.4.0
 [5] dplyr_1.0.7     purrr_0.3.4     readr_1.4.0     tidyr_1.1.2
 [9] tibble_3.1.2    ggplot2_3.3.5   tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.6.1      compiler_4.0.3
 [5] dbplyr_2.1.1      tools_4.0.3       jsonlite_1.7.2    lubridate_1.7.9.2
 [9] lifecycle_1.0.0   gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.11
[13] reprex_0.3.0      cli_3.0.0         rstudioapi_0.13   DBI_1.1.1
[17] haven_2.3.1       withr_2.4.2       xml2_1.3.2        httr_1.4.2
[21] fs_1.5.0          generics_0.1.0    vctrs_0.3.8       hms_1.1.0
[25] grid_4.0.3        tidyselect_1.1.1  glue_1.4.2        R6_2.5.0
[29] fansi_0.5.0       modelr_0.1.8      magrittr_2.0.1    backports_1.2.1
[33] scales_1.1.1      ellipsis_0.3.2    rvest_0.3.6       assertthat_0.2.1
[37] colorspace_2.0-2  utf8_1.2.1        stringi_1.6.2     munsell_0.5.0
[41] broom_0.7.5       crayon_1.4.1

stianlagstad commented 2 years ago

Hi @DadongZ! Thank you for posting. chimeraviz doesn't support STAR-SEQR (https://github.com/ExpressionAnalysis/STAR-SEQR) out of the box, and I'm unlikely to have time to implement support for it in the near future. If you're somewhat familiar with R, you should be able to write your own import function that reads the STAR-SEQR output file (see https://github.com/stianlagstad/chimeraviz/blob/master/R/import_starfusion.R for an example). If you do write such a function, please open a pull request to chimeraviz so that it can be added here :) I'll leave this issue open for now.
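
As a starting point for such an import function, here is a minimal base R sketch of the parsing step only. It is not part of chimeraviz. The function names `parse_breakpoint` and `read_starseqr` are hypothetical, and it assumes the STAR-SEQR results are in a tab-separated file with the column names shown in the question. Treating `NREAD_JXNLEFT + NREAD_JXNRIGHT` as the split-read count and `NREAD_SPANS` as the spanning-read count is also an assumption that should be checked against the STAR-SEQR documentation. Turning the resulting rows into chimeraviz fusion objects would still need to follow the pattern in import_starfusion.R.

```r
# Hypothetical helper: split breakpoint strings such as "chr12:28515447:+"
# into chromosome, position and strand.
parse_breakpoint <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  bad <- lengths(parts) != 3
  if (any(bad)) {
    stop("Unexpected breakpoint format: ", paste(x[bad], collapse = ", "))
  }
  data.frame(
    chromosome = vapply(parts, `[`, character(1), 1),
    position = as.integer(vapply(parts, `[`, character(1), 2)),
    strand = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# Hypothetical reader: assumes a tab-separated file with the STAR-SEQR
# column names shown in the question.
read_starseqr <- function(path) {
  raw <- read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)

  required <- c(
    "NAME", "NREAD_SPANS", "NREAD_JXNLEFT", "NREAD_JXNRIGHT",
    "BRKPT_LEFT", "BRKPT_RIGHT", "LEFT_SYMBOL", "RIGHT_SYMBOL"
  )
  missing_cols <- setdiff(required, names(raw))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }

  left <- parse_breakpoint(raw$BRKPT_LEFT)
  right <- parse_breakpoint(raw$BRKPT_RIGHT)

  data.frame(
    id = raw$NAME,
    gene_upstream = as.character(raw$LEFT_SYMBOL),
    chromosome_upstream = left$chromosome,
    breakpoint_upstream = left$position,
    strand_upstream = left$strand,
    gene_downstream = as.character(raw$RIGHT_SYMBOL),
    chromosome_downstream = right$chromosome,
    breakpoint_downstream = right$position,
    strand_downstream = right$strand,
    # Assumption: junction reads on either side count as split reads.
    split_reads_count = raw$NREAD_JXNLEFT + raw$NREAD_JXNRIGHT,
    # Assumption: NREAD_SPANS corresponds to spanning read pairs.
    spanning_reads_count = raw$NREAD_SPANS,
    stringsAsFactors = FALSE
  )
}
```

You would call it as `fusions <- read_starseqr("path/to/starseqr_output.txt")`, where the path is a placeholder for your own file. The result has one row per fusion, which can then be looped over to build the objects chimeraviz expects.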