thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

Request: addCDS() function #42

kvittingseerup closed this issue 4 years ago

kvittingseerup commented 4 years ago

Would it be possible for tximeta to also support extracting the CDS regions associated with each transcript? It should be fairly straightforward, since both TxDb and EnsDb support the cdsBy() function.

Cheers, Kristoffer

mikelove commented 4 years ago

Interesting idea. I’m not sure where to put the regions in the SE. If they replace the rowRanges, what should happen to the transcripts that don’t have a CDS?

mikelove commented 4 years ago

I suppose the GRanges could go in mcols(se)?

kvittingseerup commented 4 years ago

I guess it could be any of these:

    1. A GRangesList (like exons), with NA for non-coding transcripts, as rowRanges(se).
    2. Extending rowData(se) / mcols(se) with the genomic start/end coordinates of the CDS (people can then obtain the exon-level CDS data by using GenomicRanges::restrict() on the result of addExons() and the newly added genomic coordinates).
    3. Or, as you suggest, adding a GRangesList to rowData(se) / mcols(se).

I think I like 1 or 3 better...?

mikelove commented 4 years ago

Thanks! I’ll take a look at implementing this ASAP.

mikelove commented 4 years ago

I took a shot at adding this in 95c94af

Let me know if it works on your end. I put some important details in the man page: because a GRangesList cannot contain NA, I use the original ranges to fill in the empty slots for the non-coding transcripts, and the function is noisy about this in its messages.

kvittingseerup commented 4 years ago

Works beautifully :D