jorainer / ensembldb

This is the ensembldb development repository.
https://jorainer.github.io/ensembldb

Error when converting from genomic to transcriptomic coordinates #93

Closed marcasriv closed 5 years ago

marcasriv commented 5 years ago

Hi,

I am trying to convert from genomic coordinates to transcriptomic coordinates for Cricetulus griseus, Ensembl genome v95. A GRanges object for Fut8 and Mdm2 was created as follows:

library(EnsDb.Cgriseus.v95)
edb <- EnsDb.Cgriseus.v95
mdm2_genome_position <- GRanges(
    IRanges(start = c(570169, 2972208), end = c(731500, 2991413)),
    seqnames = c("JH000281.1", "JH000063.1"))

And then the conversion was called:

mdm2_transcript_position <- genomeToTranscript(mdm2_genome_position, edb)

However, I keep getting these warnings:

Warning messages:
1: In if (width(ints) == width(genome)) { :
  the condition has length > 1 and only the first element will be used
2: In if (width(ints) == width(genome)) { :
  the condition has length > 1 and only the first element will be used
3: 2 genomic region(s) could not be mapped to a transcript; hint: see ?seqlevelsStyle if you used UCSC chromosome names

However, if I try to convert from transcriptomic to genomic coordinates, it works.

Here is some info about the R session:

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] EnsDb.Cgriseus.v95_1.0 ensembldb_2.6.3        AnnotationFilter_1.6.0
 [4] GenomicFeatures_1.34.1 AnnotationDbi_1.44.0   Biobase_2.42.0
 [7] GenomicRanges_1.34.0   GenomeInfoDb_1.18.1    IRanges_2.16.0
[10] S4Vectors_0.20.1       BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0                  compiler_3.5.2
 [3] XVector_0.22.0              ProtGenerics_1.14.0
 [5] prettyunits_1.0.2           bitops_1.0-6
 [7] tools_3.5.2                 progress_1.2.0
 [9] zlibbioc_1.28.0             biomaRt_2.38.0
[11] digest_0.6.18               bit_1.1-14
[13] lattice_0.20-38             RSQLite_2.1.1
[15] memoise_1.1.0               pkgconfig_2.0.2
[17] rlang_0.3.1                 Matrix_1.2-15
[19] DelayedArray_0.8.0          DBI_1.0.0
[21] curl_3.3                    GenomeInfoDbData_1.2.0
[23] rtracklayer_1.42.1          stringr_1.3.1
[25] httr_1.4.0                  Biostrings_2.50.2
[27] hms_0.4.2                   grid_3.5.2
[29] bit64_0.9-7                 R6_2.3.0
[31] XML_3.98-1.16               BiocParallel_1.16.5
[33] blob_1.1.1                  magrittr_1.5
[35] matrixStats_0.54.0          Rsamtools_1.34.0
[37] GenomicAlignments_1.18.1    SummarizedExperiment_1.12.0
[39] assertthat_0.2.0            stringi_1.2.4
[41] lazyeval_0.2.1              RCurl_1.95-4.11
[43] crayon_1.3.4

Thank you!

Marina

jorainer commented 5 years ago

Thanks for reporting @marcasriv . I'll have a look at it.

jorainer commented 5 years ago

Some observations from your example:

1) The genomic regions you define are pretty large (see below). genomeToTranscript only maps a genomic range to transcript-relative positions if the range lies completely within an exon of a transcript. A range that extends beyond the exon cannot be mapped.

mdm2_genome_position <- GRanges( IRanges(start = c(570169, 2972208), end = c(731500, 2991413)), seqnames = c("JH000281.1","JH000063.1")) 
width(mdm2_genome_position)
[1] 161332  19206
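
To make the "completely within an exon" condition concrete, here is a minimal sketch using only GenomicRanges. The exon coordinates are made up for illustration (they are not taken from EnsDb.Cgriseus.v95), and this is not how genomeToTranscript is implemented internally; it only shows the containment test that the explanation above describes.

```r
library(GenomicRanges)

# Hypothetical exon: these coordinates are an assumption, not real annotation.
exon <- GRanges("JH000281.1", IRanges(start = 570169, end = 570400))

# First query: the large region from the question (width 161332).
# Second query: a short region starting at the same position.
query <- GRanges("JH000281.1",
                 IRanges(start = c(570169, 570169), end = c(731500, 570180)))

# type = "within" is TRUE only if the query lies entirely inside the subject.
inside_exon <- overlapsAny(query, exon, type = "within")
inside_exon
# FALSE for the large region, TRUE for the short one
```

Under this assumption the large region overlaps the exon but is not contained in it, which is the situation that leads to the "could not be mapped" warning.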

With smaller ranges it works:

mdm_rgns <- GRanges( IRanges(start = c(570169, 2972208), end = c(570180, 2972220)), seqnames = c("JH000281.1","JH000063.1"))
genomeToTranscript(mdm_rgns, edb)
IRangesList of length 2
[[1]]
IRanges object with 1 range and 6 metadata columns:
                         start       end     width |            exon_id
                     <integer> <integer> <integer> |        <character>
  ENSCGRT00000018020         1        12        12 | ENSCGRE00000137723
                     exon_rank seq_start   seq_end    seq_name  seq_strand
                     <integer> <integer> <integer> <character> <character>
  ENSCGRT00000018020         1    570169    570180  JH000281.1           *

[[2]]
IRanges object with 1 range and 6 metadata columns:
                         start       end     width |            exon_id
                     <integer> <integer> <integer> |        <character>
  ENSCGRT00000025022         1        13        13 | ENSCGRE00000194789
                     exon_rank seq_start   seq_end    seq_name  seq_strand
                     <integer> <integer> <integer> <character> <character>
  ENSCGRT00000025022         1   2972208   2972220  JH000063.1           *
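
If a large region is all you have, one possible workaround is to clip it to the exons it overlaps before mapping. The sketch below assumes hypothetical exon coordinates (`exons_hyp` is invented for illustration; real exons would have to come from the EnsDb) and uses only GenomicRanges functions. Whether each clipped piece then maps with genomeToTranscript is not shown here.

```r
library(GenomicRanges)

# Hypothetical exons on the scaffold; coordinates are assumptions.
exons_hyp <- GRanges("JH000281.1",
                     IRanges(start = c(570169, 600000),
                             end = c(570400, 600200)))

# The large region from the question.
region <- GRanges("JH000281.1", IRanges(start = 570169, end = 731500))

# Pair the region with every exon it overlaps, then clip it to each exon.
hits <- findOverlaps(region, exons_hyp)
pieces <- pintersect(region[queryHits(hits)], exons_hyp[subjectHits(hits)])

# Each clipped piece now lies completely within one of the exons.
all_within <- all(overlapsAny(pieces, exons_hyp, type = "within"))
```

Clipping per exon (rather than against a merged set of exons) keeps every piece inside a single exon, which is the condition described in point 1.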


2) Regarding the warnings, I already have a fix for them in the current development version. I just have to push it to the release version as well.

jorainer commented 5 years ago

I've pushed the fix for the warnings above to Bioconductor. ensembldb version 2.6.4, which includes the fix, should be available in the next couple of days. Alternatively, you can install that version directly from GitHub:

devtools::install_github("jotsetung/ensembldb", ref = "RELEASE_3_8")
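
To check afterwards whether the installed ensembldb already contains the fix, a minimal sketch with base R only could look like this. The 2.6.4 threshold is the version named in this comment; `has_fix` is just an illustrative variable name.

```r
# TRUE if ensembldb is installed and its version is at least 2.6.4.
if (requireNamespace("ensembldb", quietly = TRUE)) {
  has_fix <- packageVersion("ensembldb") >= "2.6.4"
} else {
  has_fix <- FALSE  # package not installed at all
}
has_fix
```

R has to be restarted (or the package reloaded) after installing for a running session to pick up the new version.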

jorainer commented 5 years ago

Closing as it seems to be solved.