Open jorainer opened 7 years ago
I'm snowed under and travelling for the coming 2 weeks - will get back to you after that.
Wouldn't it make sense to define a PRanges object that is similar to the GRanges? Could be just as the GRanges but without strand and with the seqnames being the ID of the protein and the IRanges defining the start and end of the region within the protein sequence. The Proteins object could contain such objects in pranges.
@jotsetung that's a great suggestion. That was our first intention when we started Pbase. To be honest I don't remember why we stopped implementing it.
Also I was wondering whether it wouldn't be wise to start a new package, e.g. ProteinRanges (in analogy to the GenomicRanges package), that is centered around the PRanges object. Pbase could import stuff from there, same as e.g. ensembldb could. Pbase has so many dependencies that it is hard to use anything from it in other packages without running into cyclic dependencies (I cannot, e.g., import anything from Pbase in ensembldb because of that). Also, package loading time is quite long.
That's possibly something we might think about/discuss in Cambridge at EuroBioc2017 - unfortunately I cannot attend the SIG on Monday :(
You are right. Pbase is already a heavy package. To follow the Bioconductor naming scheme, a ProteinRanges package would be great. I am looking forward to meeting you and @lgatto at EuroBioc2017!
BTW: for the topdownr package I used a similar class called FragmentViews (https://github.com/sgibb/topdownr/blob/bc188d39173c3cd94489ddcf5eafaf885c45cf29/R/AllClasses.R#L1-L32). It is just an overloaded XStringViews. XStringViews is an XString (DNA/peptide sequence) combined with an IRanges object.
After thinking about it for a while I assume that we avoided creating a PRanges class because it would be just an IRanges/IRangesList (the id could be part of the metadata slot).
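To make the "just an IRanges with metadata" idea concrete, here is a minimal sketch, assuming the IRanges package is installed. The protein IDs and coordinates are made up for illustration, and the `protein_id` column name is only one possible choice:

```r
library(IRanges)  # also attaches S4Vectors, which provides mcols()

# Three regions in amino-acid coordinates (made-up values).
regions <- IRanges(start = c(5L, 20L, 3L), end = c(12L, 45L, 9L))

# Store the protein ID as a metadata column instead of defining a new class.
mcols(regions)$protein_id <- c("PROT_A", "PROT_A", "PROT_B")

# Grouping by protein ID yields an IRangesList with one element per protein.
by_protein <- split(regions, mcols(regions)$protein_id)
```

What this does not give you is anything GRanges-like on top, such as validity checks against the protein length, which would be the main argument for a dedicated class.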
Good point - I'll check the IRanges - didn't think of the metadata column.
Dear @lgatto and @sgibb, I went ahead and added coordinate mapping functionality to ensembldb (specifically a function proteinToGenome). See issue #46 for a use case - it also fixes the problem described in issue #46.
Now, since the original code was based on your work, is it OK if I add you two as contributors to ensembldb?
I'd also like to implement (if time permits) other functions, such as proteinToTranscript (map regions within a protein to the region within the encoding transcript's CDS) and genomeToProtein (map a genomic region to the region within a protein).
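For the protein-to-CDS direction the arithmetic itself is simple, since each amino acid corresponds to one codon of three nucleotides. A rough base R sketch, assuming 1-based coordinates and a CDS whose first codon encodes the first amino acid. `protein_to_cds` and `utr5_len` are hypothetical names for illustration, not part of ensembldb:

```r
# Hypothetical helper: map 1-based amino-acid positions to 1-based CDS
# nucleotide positions. utr5_len (assumed parameter) shifts the result from
# CDS-relative to transcript-relative coordinates.
protein_to_cds <- function(aa_start, aa_end, utr5_len = 0) {
  stopifnot(all(aa_start >= 1), all(aa_end >= aa_start))
  data.frame(
    nt_start = (aa_start - 1) * 3 + 1 + utr5_len,  # first base of first codon
    nt_end   = aa_end * 3 + utr5_len               # last base of last codon
  )
}

# Amino acid 1 covers CDS bases 1-3; amino acids 5-12 cover bases 13-36.
res <- protein_to_cds(c(1, 5), c(1, 12))
```

Mapping further to the genome is the harder part, because the CDS has to be split across exons and strand has to be taken into account.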
Related to #46 I was thinking:
- [...] ensembldb? Somehow that would make sense, since all the data is there and all the functionality to work with genomic annotations.
- Wouldn't it make sense to define a PRanges object that is similar to the GRanges? Could be just as the GRanges but without strand and with the seqnames being the ID of the protein and the IRanges defining the start and end of the region within the protein sequence. The Proteins object could contain such objects in pranges.
- [...] mapToGenome function for PRanges, EnsDb. This would facilitate mapping regions within proteins to the genome, since the current way via the Proteins object is not that straightforward.