stianlagstad / chimeraviz

chimeraviz is an R package that automates the creation of chimeric RNA visualizations.
37 stars 14 forks source link

seqlevels(param) not in BAM header #5

Closed komalsrathi closed 6 years ago

komalsrathi commented 6 years ago

Hi,

I have successfully been able to follow all steps from the chimeraviz manual until plotFusionTranscript. When I run it, it gives me the following error:

> plotFusionTranscript(fusion = fusion, 
+                      edb = edb, bamfile = bamfile)

Fetching transcripts for gene partners..
..transcripts fetched.
Selecting transcripts for RCC1..
..found transcripts of type exonBoundary
Selecting transcripts for HENMT1..
..found transcripts of type exonBoundary
Error in .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file),  : 
  seqlevels(param) not in BAM header:
    seqlevels: '11'
    file: fusionAlignment.bam
    index: fusionAlignment.bam.bai

These are the steps I followed:

  1. Generate the reference fasta:
> fusion
[1] "Fusion object"
[1] "id: 22"
[1] "Fusion tool: fusioncatcher"
[1] "Genome version: hg19"
[1] "Gene names: RCC1-HENMT1"
[1] "Chromosomes: chr11-chr11"
[1] "Strands: -,-"
[1] "In-frame?: FALSE"

referenceFilename <- "reference.fa"
writeFusionReference(fusion = fusion, filename = referenceFilename)
rsubreadIndex(referenceFasta = referenceFilename)

This is how the reference fasta referenceFilename looks like:

# not sure why is it called chrNA
>chrNA
TGTATCACCTCGACTGCTTCGCCTGCCAGCTCTGCAACCAGAGGAAAATTGGGCCGATTTCCACCTATGATGCATCATCA
CCAGGC

These are first few lines of the fastq files:

$ head sample_R1.fq 

@NB501069:24:HY2VNBGXX:1:11101:20482:1153 1:N:0:CGATGT
CTGCAATCCTGACAGGGTCCTGCCACTTACCTGTCCCCACCACCCTCCCAACTTCTCTCAGGCTTGAGTGAGGCCTTCTGAAGTTGAAGGGCTTTCTGC
+
A/AAAEEEAE6EEEEEEEEAEEEAEEEEEEAEEEEEAA/A<EAEEEA<EEEAEEEE<EEEEEEEAEE/EEAA<6EEAAE<EEEA<EEAA<EAE<E/EAA
@NB501069:24:HY2VNBGXX:1:11101:24825:1900 1:N:0:CGATGT
GCCACATCGTCCTAACCTGGTAGAGTCAGCCCCCAGGTGATGCCCTAAACCTCCAGACATGGAGGCCCCTTCTAGGTCCTCAAGGGGTGAATCCCCAGGA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEE
@NB501069:24:HY2VNBGXX:1:11101:5270:2388 1:N:0:CGATGT
CCCAAATCCTACGGCAAGCTTTGACAATGTAACATCTTTATCTTGTGGTTAGGAAAATGGACCTAGAGAGATTATGTGGTTTGCTCAAAATCACATAGCT

$ head sample_R2.fq

@NB501069:24:HY2VNBGXX:1:11101:20482:1153 2:N:0:CGATGT
CCTGGCCCTTAAACAACTGCAGAAAGCCCTTCAACTTCAGAAGGCCTCACTCAAGCCTGAGAGAAGTTGGGAGGGTGGTGGGGACAGGTAAGTGGCAGGAC
+
AAAAAEEEEEEEEE6EEEEEAEEE6/EA/EEEE/EEEEEEEEEEEEEEAEEE//EEEEEEE<E</AEEAE<EEEEEEEEEEEEE/AEE/EAEAE<//EE/<
@NB501069:24:HY2VNBGXX:1:11101:24825:1900 2:N:0:CGATGT
GCTGCAGTGAGCTGTGATTGCGCCACTGTACTCCAGCTTGGGTGCCAGAGCAAGACCCTGTCTCAAAAAAGAAAAGAATGTTCCTGGGGATTCACCCCTTG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAAEEEEEEEEEEEAEEEEEEEEEEEEE<<EEE<EEEEEEE6EEEE<
@NB501069:24:HY2VNBGXX:1:11101:5270:2388 2:N:0:CGATGT
GTTGGTCTTATAGAAAGCACTACTGCACTTAGTAGCTATGTGATTTTGAGCAAACCACATAATCTCTCTAGGTCCATTTTCCTAACCACAAGATAAAGATG
  1. Create bam file bamfile using rsubreadAlign:
rsubreadAlign(
  referenceName = referenceFilename,
  fastq1 = fastq1,
  fastq2 = fastq2,
  outputBamFilename = "fusionAlignment")

And the bam file looks like this - again not sure what is chrNA doing in the header or in the alignments:

@HD     VN:1.0  SO:coordinate
@SQ     SN:chrNA        LN:86
@PG     ID:subread      PN:subread      VN:Rsubread 1.26.1
NB501069:24:HY2VNBGXX:1:13107:9045:19192        163     chrNA   1       40      43M42S  =       1       -85     TGTATCACCTCGACTGCTTCGCCTGCCAGCTCTGCAACCAGAGATTTTGTGTGGGAGACAAATTCTTCCTGAAGAACAACATGAT   AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEAEEE   HI:i:1  NH:i:1  NM:i:0
NB501069:24:HY2VNBGXX:1:13107:9045:19192        83      chrNA   1       40      43M42S  =       1       85      TGTATCACCTCGACTGCTTCGCCTGCCAGCTCTGCAACCAGAGATTTTGTGTGGGAGACAAATTCTTCCTGAAGAACAACATGAT   EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA   HI:i:1  NH:i:1  NM:i:0
NB501069:24:HY2VNBGXX:2:13105:4497:1842 163     chrNA   2       40      42M58S  =       26      125     GTATCACCTCGACTGCTTCGCCTGCCAGCTCTGCAACCAGAGATTTTGTGTGGAGACAAATTCTTCCTGAAGAACAACATGATCTTGTGTCAGATGGACT    AAAAAEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEAEEE<E    HI:i:1  NH:i:1  NM:i:0

Any help would be much appreciated. Thanks!!

stianlagstad commented 6 years ago

Thank you for your interest in chimeraviz! Could you by any chance share your input files? At first glance I think it might have to do with different chromosome naming ("chr11" vs. just "11"), though I can't investigate it further right now. I'll take a closer look as soon as I can.

Sidenote: The reason "chrNA" is used (chimeraviz sets this in the writeFusionReference function) is to make Gviz (the library used for most visualizations) happy. And since we're looking at the fusion sequence, which might be partly from different chromosomes, "chrNA" sort of makes sense :)

komalsrathi commented 6 years ago

Hi Stian,

Thanks for the response. Here is the google drive link with all the data - input fastq files, reference fasta generated by writeFusionReference, Ensembl GTF modified with "chr", bam files generated using writeFusionReference as well as fusion catcher output folder. Also attaching my R script in the drive so you can reproduce the output.

https://drive.google.com/open?id=0B-8gQV1WZcYdQXUyNFF6UEF1bnM

I am not sure what happened within the last couple of hours but now I am stuck at an upstream step which had worked for me before.

  1. For plotFusion, I got a plot like this before:
    plotFusion(
    fusion = fusion,
    bamfile = bamfile,
    edb = edb,
    nonUCSC = TRUE,
    reduceTranscripts = TRUE)
But now I am getting a plot like this which looks like some graphics issue: 2. The basic `plotTranscripts` function still looks ok: ``` plotTranscripts( fusion = fusion, edb = edb) ``` 3. But `plotTranscripts` with bam file input does not show any coverage: ``` plotTranscripts( fusion = fusion, edb = edb, bamfile = bamfile, nonUCSC = TRUE, reduceTranscripts = T) ``` 4. And `plotFusionTranscript` is giving me an error: ``` plotFusionTranscript(fusion = fusion, edb = edb, bamfile = bamfile) Error in .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file), : seqlevels(param) not in BAM header: seqlevels: 'chr11' file: fusionAlignment.bam index: fusionAlignment.bam.bai ``` This is my sessionInfo: ``` > sessionInfo() R version 3.4.2 (2017-09-28) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: OS X El Capitan 10.11.6 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] grid stats4 parallel stats graphics grDevices utils [8] datasets methods base other attached packages: [1] chimeraviz_1.2.2 ensembldb_2.0.4 AnnotationFilter_1.0.0 [4] GenomicFeatures_1.28.5 AnnotationDbi_1.38.2 Biobase_2.36.2 [7] Gviz_1.20.0 GenomicRanges_1.28.6 GenomeInfoDb_1.12.3 [10] Biostrings_2.44.2 XVector_0.16.0 IRanges_2.10.5 [13] S4Vectors_0.14.7 BiocGenerics_0.22.1 BiocInstaller_1.26.1 loaded via a namespace (and not attached): [1] ProtGenerics_1.8.0 bitops_1.0-6 [3] matrixStats_0.52.2 bit64_0.9-7 [5] RColorBrewer_1.1-2 httr_1.3.1 [7] rprojroot_1.2 tools_3.4.2 [9] backports_1.1.1 R6_2.2.2 [11] DT_0.2 rpart_4.1-11 [13] Hmisc_4.0-3 DBI_0.7 [15] lazyeval_0.2.0 colorspace_1.3-2 [17] nnet_7.3-12 gridExtra_2.3 [19] bit_1.1-12 curl_3.0 [21] compiler_3.4.2 htmlTable_1.9 [23] DelayedArray_0.2.7 rtracklayer_1.36.5 [25] scales_0.5.0 checkmate_1.8.4 [27] readr_1.1.1 RCircos_1.2.0 [29] stringr_1.2.0 digest_0.6.12 [31] Rsamtools_1.28.0 foreign_0.8-69 [33] rmarkdown_1.6 base64enc_0.1-3 [35] dichromat_2.0-0 pkgconfig_2.0.1 [37] htmltools_0.3.6 BSgenome_1.44.2 [39] htmlwidgets_0.9 rlang_0.1.2 [41] RSQLite_2.0 shiny_1.0.5 [43] BiocParallel_1.10.1 acepack_1.4.1 [45] VariantAnnotation_1.22.3 RCurl_1.95-4.8 [47] magrittr_1.5 GenomeInfoDbData_0.99.0 [49] Formula_1.2-2 Matrix_1.2-11 [51] Rcpp_0.12.13 munsell_0.4.3 [53] stringi_1.1.5 yaml_2.1.14 [55] SummarizedExperiment_1.6.5 zlibbioc_1.22.0 [57] org.Hs.eg.db_3.4.1 plyr_1.8.4 [59] AnnotationHub_2.8.2 blob_1.1.0 [61] lattice_0.20-35 splines_3.4.2 [63] hms_0.3 knitr_1.17 [65] biomaRt_2.32.1 XML_3.98-1.9 [67] evaluate_0.10.1 biovizBase_1.24.0 [69] latticeExtra_0.6-28 data.table_1.10.4-1 [71] httpuv_1.3.5 gtable_0.2.0 [73] ggplot2_2.2.1 mime_0.5 [75] xtable_1.8-2 survival_2.41-3 [77] tibble_1.3.4 GenomicAlignments_1.12.2 [79] memoise_1.1.0 cluster_2.0.6 [81] interactiveDisplayBase_1.14.0 BiocStyle_2.4.1 ```
stianlagstad commented 6 years ago

Thank you for providing everything!:) I will look at this over the weekend.

stianlagstad commented 6 years ago

I think the problem is the bamfile you're using: bamfile <- 'fusionAlignment.bam'. This is the bamfile containing reads aligned to the sequence around the fusion breakpoint. This is the correct bamfile to use for the plotFusionReads plot. But the bamfile you need for the plotFusion, plotTranscripts, and plotFusionTranscript is the original bamfile used with the fusion finder (i.e. the one with all sequencing reads aligned to the whole genome). Does that make any sense?

And BTW: This is a perfect timing for any suggestions on how I can make chimeraviz better or more intuitive, since a new Bioconductor release is just around the corner. Any feedback is welcome!:)

stianlagstad commented 6 years ago

Can this too be closed, @komalsrathi?

komalsrathi commented 6 years ago

I am testing this today and will close this asap. Thanks!!

On Wed, Oct 25, 2017 at 11:25 AM Stian Lågstad notifications@github.com wrote:

Can this too be closed, @komalsrathi https://github.com/komalsrathi?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/stianlagstad/chimeraviz/issues/5#issuecomment-339368165, or mute the thread https://github.com/notifications/unsubscribe-auth/AGrSJ-dgGHAffF7pGkT_VnqpRHO0xNO0ks5sv1L5gaJpZM4P2bV5 .

-- Komal S Rathi | Bioinformatics Scientist II, DBHi, The Children's Hospital of Philadelphia | rathik@email.chop.edu

komalsrathi commented 6 years ago

I am closing this as I did not get this error with the latest test I did on a different sample. Thanks for all the help!