thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

Extracting/adding gene_name to transcript level data #44

Closed kvittingseerup closed 4 years ago

kvittingseerup commented 4 years ago

I would like to add the gene_name to the transcript level RangedSummarizedExperiment.

This data should be available in the linked transcriptome since it is automatically added when using summarizeToGene().

Is there an easy way to do it?

Cheers, Kristoffer

mikelove commented 4 years ago

I checked the Ensembl and GENCODE cases, and the gene_id was present both times in the transcript-level mcols(se). What is your organism/source where it isn't present? I am thinking of modifying addIds() so that it can pull from the TxDb/EnsDb.

mikelove commented 4 years ago

I tried to address this with a new argument fromDb in addIds(), so that you can access the underlying TxDb/EnsDb. Let me know if this addresses the issue:

004673b
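For anyone landing here later, a minimal sketch of how the fromDb argument might be used. It assumes `se` is a transcript-level object returned by tximeta() and that the linked database is an EnsDb exposing a "GENENAME" column; the wrapper name add_gene_name is hypothetical and only there for illustration.

```r
# Hypothetical helper, not part of tximeta: wraps addIds() with fromDb=TRUE.
# Assumes `se` came from tximeta() and its linked database is an EnsDb,
# where "GENENAME" is one of the available columns.
add_gene_name <- function(se, column = "GENENAME") {
  # fromDb=TRUE asks addIds() to look the column up in the linked
  # TxDb/EnsDb instead of an org.* annotation package
  tximeta::addIds(se, column = column, fromDb = TRUE)
}

# Usage (needs a real tximeta object, so left commented out):
# se <- add_gene_name(se)
# head(SummarizedExperiment::mcols(se))
```

Whether the column is actually found depends on what the linked database provides, so it is worth checking the resulting mcols(se) before relying on it.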

kvittingseerup commented 4 years ago

I was after the gene_name (aka gene_symbol), not the gene_id. This should have been easy to obtain using retrieveDb() followed by a select(). Turns out it is not: according to this, it looks like the TxDb from GenomicFeatures does not support importing gene_names into the TxDb object (but the ensembldb one does).

So the problem was Bioconductor, not tximeta. Sorry for the trouble. I just assumed gene_names would be available (and they are for the example data, because it is based on Ensembl).
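The retrieveDb() plus select() route can be sketched roughly as below. This assumes the rownames of `se` match the database's "TXID" keys (they may need a version suffix stripped for some sources) and that gene names live in a "GENENAME" column, which is the case for EnsDb objects but generally not for a GenomicFeatures TxDb. The function name tx_to_gene_name is hypothetical.

```r
# Hypothetical helper: look up gene names for the transcripts in `se`
# from the database linked by tximeta. Returns NULL when the database
# has no such column (e.g. a TxDb built by GenomicFeatures).
tx_to_gene_name <- function(se, column = "GENENAME") {
  db <- tximeta::retrieveDb(se)
  # Check what the linked database can actually provide
  if (!column %in% AnnotationDbi::columns(db)) {
    message("Column '", column, "' is not available in the linked database")
    return(NULL)
  }
  # Assumes rownames(se) are valid TXID keys for this database
  AnnotationDbi::select(db, keys = rownames(se),
                        keytype = "TXID", columns = column)
}

# Usage (needs a real tximeta object, so left commented out):
# tx2name <- tx_to_gene_name(se)
```

Checking columns(db) first makes the Ensembl versus TxDb difference explicit instead of failing inside select().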

mikelove commented 4 years ago

Ah, got it. Well, I still think it was a good push to allow more custom access via addIds(). Thanks for the feedback.

kvittingseerup commented 4 years ago

If you know the BioC people, feel free to tell them that people want gene_names ;-)