jorainer / ensembldb

This is the ensembldb development repository.
https://jorainer.github.io/ensembldb

Filter by CANONICAL transcript #109

Closed pantastheo closed 3 years ago

pantastheo commented 3 years ago

It's more of a feature request than an actual issue.

I was wondering if it's possible to add an extra metadata column to the IRanges object that flags whether the transcript is canonical or MANE Select? Or maybe add it as a filter option in genomeToTranscript()?

eg. gnm_tx <- genomeToTranscript(gnm, edbx, canonical=T)

or even better edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ canonical == TRUE)

Thanks a lot in advance!

jorainer commented 3 years ago

That's a good idea indeed. On what database columns/annotation field would you consider to filter then, or in other words, how would you define if a transcript is canonical?

pantastheo commented 3 years ago

There is a field/column 'canonical_transcript_id' in the gene table of the Ensembl MySQL database that points to the canonical ENST transcript in the transcript table. Not sure if that helps?!

jorainer commented 3 years ago

If possible, I would like to avoid adding another column from the Ensembl MySQL database to the EnsDb databases, because I would then have to rebuild all databases that are in AnnotationHub. I could, however, add it for the next Ensembl release (provided the Ensembl Perl API allows retrieving that field for a gene).

Note that there is also a TxSupportLevelFilter, so you could filter an EnsDb to keep only transcripts with a given support level, e.g. 1: edb <- filter(edb, filter = ~ tx_support_level == 1).

jorainer commented 3 years ago

Note: I've updated ensembldb to add a new field canonical_transcript to the gene table. This column will be available for genes in all EnsDb databases created with that new version of ensembldb, which means all databases from Ensembl release 102 (the next release) on.
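
For readers wondering how that column could be used to flag canonical transcripts, here is a minimal sketch in base R. The helper `flag_canonical` and the toy IDs are hypothetical and not part of ensembldb; the commented-out `genes()` and `transcripts()` calls assume an EnsDb object `edb` built from Ensembl release 102 or later.

```r
# Hypothetical helper (not part of ensembldb): flag transcripts that are
# listed as the canonical transcript of their gene.
# `genes_df` needs the columns gene_id and canonical_transcript;
# `tx_df` needs the columns tx_id and gene_id.
flag_canonical <- function(tx_df, genes_df) {
    # Look up the canonical transcript ID of each transcript's gene
    canon <- genes_df$canonical_transcript[match(tx_df$gene_id,
                                                 genes_df$gene_id)]
    tx_df$is_canonical <- !is.na(canon) & tx_df$tx_id == canon
    tx_df
}

# Toy data standing in for the output of genes() and transcripts()
genes_df <- data.frame(gene_id = c("G1", "G2"),
                       canonical_transcript = c("T2", "T3"),
                       stringsAsFactors = FALSE)
tx_df <- data.frame(tx_id = c("T1", "T2", "T3"),
                    gene_id = c("G1", "G1", "G2"),
                    stringsAsFactors = FALSE)
flagged <- flag_canonical(tx_df, genes_df)

# With a real EnsDb `edb` (assumed: release 102 or later), the inputs
# could be fetched like this:
# library(ensembldb)
# genes_df <- genes(edb, columns = c("gene_id", "canonical_transcript"),
#                   return.type = "data.frame")
# tx_df <- transcripts(edb, columns = c("tx_id", "gene_id"),
#                      return.type = "data.frame")
```

Subsetting with `flagged[flagged$is_canonical, ]` then keeps only the canonical transcripts; on databases built before release 102 the `canonical_transcript` column is not expected to exist, so the lookup would need a guard.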

pantastheo commented 3 years ago

That sounds awesome! Thanks for adding that feature so fast. Great work! 👍

jorainer commented 3 years ago

Thanks! I'm closing this issue now - feel free to re-open. Note that the respective data will only be available from the next Ensembl release on.

jorainer commented 3 years ago

@pantastheo, do you have a Twitter handle? I'm about to add the Ensembl release 102 EnsDb databases to AnnotationHub and would like to mention you in the tweet about that (since these new databases contain the canonical transcript ID for each gene).