lgatto / Pbase

Manipulating and exploring protein and proteomics data

Add PRanges and move part of functionality to ensembldb? #47

Open jorainer opened 7 years ago

jorainer commented 7 years ago

Related to #46 I was thinking:

lgatto commented 7 years ago

I'm snowed under and travelling for the coming 2 weeks - will get back to you after that.

sgibb commented 7 years ago

Wouldn't it make sense to define a PRanges object that is similar to GRanges? It could be just like GRanges but without the strand, with the seqnames being the ID of the protein and the IRanges defining the start and end of the region within the protein sequence. The Proteins object could contain such objects in pranges.
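To make the proposal concrete, here is a minimal, language-agnostic sketch of such a range record, written in Python rather than R. It is not Pbase or GenomicRanges code: the names `ProteinRange`, `overlaps` and the IDs `P1`/`P2` are hypothetical, and 1-based inclusive coordinates are assumed, as in IRanges.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinRange:
    """A region within one protein sequence (hypothetical PRanges-like record).

    Assumes 1-based, inclusive coordinates and, unlike GRanges, no strand.
    """
    protein_id: str                 # plays the role of seqnames in GRanges
    start: int                      # first residue of the region
    end: int                        # last residue of the region
    metadata: dict = field(default_factory=dict)  # free-form annotations

    def __post_init__(self):
        # Reject coordinates that cannot describe a region in a sequence.
        if self.start < 1 or self.end < self.start:
            raise ValueError("need 1 <= start <= end")

    @property
    def width(self) -> int:
        """Number of residues covered by the region."""
        return self.end - self.start + 1


def overlaps(a: ProteinRange, b: ProteinRange) -> bool:
    """Two regions overlap only if they lie on the same protein."""
    return (a.protein_id == b.protein_id
            and a.start <= b.end and b.start <= a.end)


# Hypothetical protein IDs, for illustration only.
domain = ProteinRange("P1", 5, 12, {"type": "domain"})
peptide = ProteinRange("P1", 10, 20)
other = ProteinRange("P2", 10, 20)
```

With these values, `overlaps(domain, peptide)` is true, while `overlaps(domain, other)` is false because the protein IDs differ.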

@jotsetung that's a great suggestion. That was our first intention when we started Pbase. To be honest, I don't remember why we stopped implementing it.

jorainer commented 7 years ago

Also, I was wondering whether it wouldn't be wise to start a new package, e.g. ProteinRanges (in analogy to the GenomicRanges package), that is centered around the PRanges object. Pbase could import from it, as could e.g. ensembldb. Pbase has so many dependencies that it is hard to use anything from it in other packages without running into cyclic dependencies (because of that I cannot, for example, import anything from Pbase in ensembldb). Package loading time is also quite long.

That's something we might eventually discuss in Cambridge at EuroBioc2017 - unfortunately I cannot attend the SIG on Monday :(

sgibb commented 7 years ago

You are right. Pbase is already a heavy package. To follow the Bioconductor naming scheme, a ProteinRanges package would be great. I am looking forward to meeting you and @lgatto at EuroBioc2017!

sgibb commented 7 years ago

BTW: for the topdownr package I used a similar class called FragmentViews https://github.com/sgibb/topdownr/blob/bc188d39173c3cd94489ddcf5eafaf885c45cf29/R/AllClasses.R#L1-L32. It is just an overloaded XStringViews. XStringViews is an XString (DNA/peptide sequence) combined with an IRanges object.

After thinking about it for a while I assume that we avoided creating a PRanges class because it would be just an IRanges/IRangesList (the id could be part of the metadata slot).

jorainer commented 7 years ago

Good point - I'll check the IRanges - didn't think of the metadata column.

jorainer commented 6 years ago

Dear @lgatto and @sgibb, I went ahead and added coordinate mapping functionality to ensembldb (specifically a function proteinToGenome). See issue #46 for a use case - it also fixes the problem described in issue #46. Now, since the original code was based on your work, is it OK if I add you two as contributors to ensembldb?

I'd also like to implement (if time permits) other functions, such as proteinToTranscript (map regions within a protein to the region within the encoding transcript's CDS) and genomeToProtein (map a genomic region to the region within a protein).
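The coordinate arithmetic behind such mappings can be sketched as follows. This is an illustration in Python, not the ensembldb implementation: the function names, the exon coordinates and the restriction to a forward-strand transcript are all assumptions, and coordinates are 1-based and inclusive.

```python
def protein_to_cds(p_start, p_end):
    """Map a residue range to nucleotide positions within the CDS.

    Residue i is encoded by CDS positions 3*(i-1)+1 .. 3*i.
    """
    if p_start < 1 or p_end < p_start:
        raise ValueError("need 1 <= p_start <= p_end")
    return 3 * (p_start - 1) + 1, 3 * p_end


def cds_to_genome(cds_start, cds_end, coding_segments):
    """Map a CDS-relative range to genomic ranges.

    coding_segments: list of (genomic_start, genomic_end) tuples for the
    coding parts of the exons, in transcription order. Assumes a
    forward-strand transcript (reverse strand is not handled here).
    """
    pieces = []
    consumed = 0  # CDS bases covered by the segments seen so far
    for g_start, g_end in coding_segments:
        length = g_end - g_start + 1
        seg_first, seg_last = consumed + 1, consumed + length
        # Intersect the requested CDS range with this segment.
        lo, hi = max(cds_start, seg_first), min(cds_end, seg_last)
        if lo <= hi:
            pieces.append((g_start + lo - seg_first, g_start + hi - seg_first))
        consumed += length
    return pieces


# Hypothetical two-exon CDS: 9 coding bases, an intron, then 12 coding bases.
segments = [(101, 109), (201, 212)]
cds_range = protein_to_cds(3, 4)
genomic = cds_to_genome(*cds_range, segments)
```

Here residues 3 to 4 correspond to CDS positions 7 to 12, which span the intron, so the result consists of two genomic pieces: (107, 109) and (201, 203). A region inside a protein can therefore map to several genomic ranges, which is why a list is returned.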