Closed cjagla closed 10 months ago
The example code works because the full genomic region is within the transcript (i.e. one of its exons): see also the respective region in the Ensembl genome browser here.
Your region defines a large genomic region that contains a full transcript and also downstream sequence (see here). The functions to map between genomic and transcript or protein coordinates (genomeToTranscript
genomeToProtein
) can only map genomic regions that are within an exon of a transcript to the respective coordinates within that transcript (i.e. convert between genomic coordinates and transcript-relative coordinates).
So, these functions can only be used to convert between these coordinate systems. Genomic coordinates that do not code for an exon of a transcript can not be converted to transcript coordinates.
If instead you were looking for a way to identify transcripts that are located in a certain genomic region:
> gnm <- GRanges("6:88109864-88166347")
> transcripts(edb, filter = GRangesFilter(gnm))
GRanges object with 7 ranges and 11 metadata columns:
seqnames ranges strand | tx_id
<Rle> <IRanges> <Rle> | <character>
ENST00000362094 6 88139864-88145274 - | ENST00000362094
ENST00000468898 6 88139864-88145274 - | ENST00000468898
... ... ... ... . ...
ENST00000369501 6 88139864-88166347 - | ENST00000369501
ENST00000551417 6 88139864-88166347 - | ENST00000551417
tx_biotype tx_cds_seq_start tx_cds_seq_end
<character> <integer> <integer>
ENST00000362094 protein_coding 88144997 88145274
ENST00000468898 protein_coding 88143856 88145274
... ... ... ...
ENST00000369501 protein_coding 88143856 88145274
ENST00000551417 protein_coding 88143856 88145274
gene_id tx_support_level tx_id_version gc_content
<character> <integer> <character> <numeric>
ENST00000362094 ENSG00000118432 1 ENST00000362094.6 40.5797
ENST00000468898 ENSG00000118432 1 ENST00000468898.2 40.7568
... ... ... ... ...
ENST00000369501 ENSG00000118432 1 ENST00000369501.3 43.8611
ENST00000551417 ENSG00000118432 5 ENST00000551417.2 44.0806
tx_external_name tx_is_canonical tx_name
<character> <integer> <character>
ENST00000362094 CNR1-201 0 ENST00000362094
ENST00000468898 CNR1-205 0 ENST00000468898
... ... ... ...
ENST00000369501 CNR1-203 1 ENST00000369501
ENST00000551417 CNR1-207 0 ENST00000551417
-------
seqinfo: 1 sequence from GRCh38 genome
Okay, thanks for the explanation!
Hi, I'm trying to use
genomeToTranscript()
to convert exon coordinates from genomic to transcript-based for specific human genes. I can't get any examples of genomic coordinates to work other than the one you've provided in the function manual, including a very large chunk of chromosome 6 that should definitely contain many genes or the entirety of chromosome 19. Can you please assist?`> a[["9606"]] EnsDb for Ensembl: |Backend: SQLite |Db type: EnsDb |Type of Gene ID: Ensembl Gene ID |Supporting package: ensembldb |Db created by: ensembldb package from Bioconductor |script_version: 0.3.10 |Creation time: Thu Feb 16 12:36:05 2023 |ensembl_version: 109 |ensembl_host: localhost |Organism: Homo sapiens |taxonomy_id: 9606 |genome_build: GRCh38 |DBSCHEMAVERSION: 2.2 |common_name: human |species: homo_sapiens | No. of genes: 70623. | No. of transcripts: 276218. |Protein data available.
Matrix products: default BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.21.so
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] ensembldb_2.22.0 AnnotationFilter_1.22.0 GenomicFeatures_1.50.4 AnnotationDbi_1.60.2 Biobase_2.58.0 GenomicRanges_1.50.2
[7] GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.2 AnnotationHub_3.6.0 BiocFileCache_2.6.1 dbplyr_2.3.3
[13] BiocGenerics_0.44.0 stringr_1.5.0 dplyr_1.1.2 jsonlite_1.8.7 readr_2.1.4 tidyr_1.3.0
[19] xml2_1.3.5 biomaRt_2.54.1
loaded via a namespace (and not attached): [1] ProtGenerics_1.30.0 bitops_1.0-7 matrixStats_1.0.0 bit64_4.0.5 filelock_1.0.2
[6] progress_1.2.2 httr_1.4.7 tools_4.2.2 utf8_1.2.3 R6_2.5.1
[11] DBI_1.1.3 lazyeval_0.2.2 withr_2.5.0 tidyselect_1.2.0 prettyunits_1.1.1
[16] bit_4.0.5 curl_5.0.2 compiler_4.2.2 cli_3.6.1 DelayedArray_0.24.0
[21] rtracklayer_1.58.0 rappdirs_0.3.3 digest_0.6.33 Rsamtools_2.14.0 XVector_0.38.0
[26] pkgconfig_2.0.3 htmltools_0.5.6 MatrixGenerics_1.10.0 fastmap_1.1.1 rlang_1.1.1
[31] RSQLite_2.3.1 shiny_1.7.5 BiocIO_1.8.0 generics_0.1.3 BiocParallel_1.32.6
[36] RCurl_1.98-1.12 magrittr_2.0.3 GenomeInfoDbData_1.2.9 Matrix_1.5-1 Rcpp_1.0.11
[41] fansi_1.0.4 lifecycle_1.0.3 stringi_1.7.12 yaml_2.3.7 SummarizedExperiment_1.28.0
[46] zlibbioc_1.44.0 grid_4.2.2 blob_1.2.4 parallel_4.2.2 promises_1.2.1
[51] crayon_1.5.2 lattice_0.20-45 Biostrings_2.66.0 hms_1.1.3 KEGGREST_1.38.0
[56] pillar_1.9.0 rjson_0.2.21 codetools_0.2-18 XML_3.99-0.14 glue_1.6.2
[61] BiocVersion_3.16.0 renv_1.0.2 BiocManager_1.30.22 png_0.1-8 vctrs_0.6.3
[66] tzdb_0.4.0 httpuv_1.6.11 purrr_1.0.2 cachem_1.0.8 mime_0.12
[71] xtable_1.8-4 restfulr_0.0.15 later_1.3.1 tibble_3.2.1 GenomicAlignments_1.34.1
[76] memoise_2.0.1 ellipsis_0.3.2 interactiveDisplayBase_1.36.0`