Closed zyh4482 closed 1 month ago
I tried with current Bioconductor package versions (3.19) and Ensembl release 112 and it seems to work:
library(ensembldb)
library(AnnotationHub)
edb <- ah[["AH116860"]]
rng <- IRanges(start = 1, end = 28)
names(rng) <- "ENSP00000518892"
proteinToGenome(rng, edb)
Fetching CDS for 1 proteins ... 1 found
Checking CDS and protein sequence lengths ... 1/1 OK
$ENSP00000518892
GRanges object with 1 range and 7 metadata columns:
seqnames ranges strand | protein_id tx_id
<Rle> <IRanges> <Rle> | <character> <character>
[1] 5 179644937-179645020 - | ENSP00000518892 ENST00000511063
exon_id exon_rank cds_ok protein_start protein_end
<character> <integer> <logical> <integer> <integer>
[1] ENSE00003874488 1 TRUE 1 28
-------
seqinfo: 1 sequence from GRCh38 genome
maybe you used a different Ensembl release?
I stumbled over this yesterday too: The reason was that it wasn't obvious to me that the ranges need to have protein ID names.
The names are set in the Examples section of ?proteinToGenome
, but it would have saved me a bit of time if either:
IRanges
object with positive length and no names would raise an errorthanks for the comment @mschubert ! When you write
was stated in the Description field
do you mean the help page of the proteinToGenome()
function?
Regarding the second point, yes that makes sense. I'll look into it - and see if that has no other unwanted effects.
@mschubert This is quite unexpected... I've never thought about this before. Thank you!
do you mean the help page of the
proteinToGenome()
function?
Yes, exactly. I see you already added that, great, thank you!
I'm closing this issue now - feel free to reopen if needed.
As an example, I would like to map the 1-28 amino acid of
ENSP00000518892
back to its genomic coordinates. According tobiomaRt
query result, the respective transcript sequence should be:The peptide sequence is as below:
But I cannot get this query from the
proteinToGenome
output. It returns an empty record:May I ask why this happens? I don't understand why the sequence can all be found in the Ensembl database but cannot be mapped using
proteinToGenome
.Thank you.