scverse / genomic-features

Genomic Features in Python from BioConductor's AnnotationHub
https://genomic-features.readthedocs.io
BSD 3-Clause "New" or "Revised" License

`genes`, `transcripts`, `exons` functions #12

Closed ivirshup closed 1 year ago

ivirshup commented 1 year ago

I was looking into how to do `exons`, and the joins get a bit complicated, since we often need to go through `tx2exon`. `ensembldb` handles this by specifying how the joins between particular tables happen:

https://github.com/jorainer/ensembldb/blob/e74f58e0f853d6b00e4fa079bea3575c31bf1962/R/dbhelpers.R#L169-L245

It starts with a list of tables and greedily passes through it, trying to find a way to join each table into the query. E.g., if you know you need `exon` and `tx2exon`, it looks up how to join these tables in the `.JOINS2` variable. The rule that you need `tx2exon` to get from `exon` to any other table lives in `addRequiredTables`, specifically:

    ## if we have exon and any other table, we need definitely tx2exon
    if(any(tab=="exon") & length(tab) > 1){
        tab <- unique(c(tab, "tx2exon"))
    }
    ## if we have exon and we have gene, we'll need also tx
    if((any(tab=="exon") | (any(tab=="tx2exon"))) & any(tab=="gene")){
        tab <- unique(c(tab, "tx"))
    }

https://github.com/jorainer/ensembldb/blob/e74f58e0f853d6b00e4fa079bea3575c31bf1962/R/dbhelpers.R#L268-L282
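
For reference, here is a rough Python sketch of how the same idea could look on our side. It is not the ensembldb implementation and not existing `genomic-features` API: `add_required_tables`, `greedy_join_order`, and the `JOINS` mapping are hypothetical names, and the key columns in `JOINS` are my assumption about the schema. The first function mirrors the two rules quoted above; the second greedily picks, at each step, the first remaining table that has a known join to something already in the query.

```python
# Hypothetical lookup of join keys between pairs of tables.
# The column names are assumptions about the schema, not checked against a real database.
JOINS = {
    frozenset(("gene", "tx")): "gene_id",
    frozenset(("tx", "tx2exon")): "tx_id",
    frozenset(("tx2exon", "exon")): "exon_id",
}


def add_required_tables(tables):
    """Add the linking tables implied by the two rules quoted above."""
    tables = list(tables)
    # exon plus any other table: we need tx2exon to get out of exon
    if "exon" in tables and len(tables) > 1:
        tables.append("tx2exon")
    # exon (or tx2exon) plus gene: we also need tx in between
    if ("exon" in tables or "tx2exon" in tables) and "gene" in tables:
        tables.append("tx")
    # de-duplicate while keeping first-seen order, like unique() in R
    return list(dict.fromkeys(tables))


def greedy_join_order(tables):
    """Return (joined_table, new_table, key) steps, found greedily."""
    remaining = add_required_tables(tables)
    joined = [remaining.pop(0)]  # start the query from the first table
    steps = []
    while remaining:
        for candidate in remaining:
            # first already-joined table that has a known join to the candidate
            partner = next(
                (t for t in joined if frozenset((t, candidate)) in JOINS), None
            )
            if partner is not None:
                key = JOINS[frozenset((partner, candidate))]
                steps.append((partner, candidate, key))
                joined.append(candidate)
                remaining.remove(candidate)
                break
        else:
            # no remaining table can be attached to the query
            raise ValueError(f"no known join for: {remaining}")
    return steps
```

With this, `greedy_join_order(["gene", "exon"])` would route the join through `tx` and `tx2exon` without the caller having to name them. Each step could then be turned into a join call on whatever query backend we end up using.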