Open ivirshup opened 1 year ago
It would be useful to write down exactly how the EnsDb and TxDb schemas differ.
To play around with this:
import genomic_features as gf
import ibis

# Download the UCSC TxDb SQLite file from AnnotationHub (notebook shell escape)
!wget https://bioconductorhubs.blob.core.windows.net/annotationhub/ucsc/standard/3.15/TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite

# Open both annotation databases as ibis connections
ensdb = gf.ensembl.annotation(species="Hsapiens", version="108").db
ucscdb = ibis.connect("TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite")

# Print every table and its schema for each database
for tbl_name in ensdb.list_tables():
    print(tbl_name, ensdb.table(tbl_name).schema())

for tbl_name in ucscdb.list_tables():
    print(tbl_name, ucscdb.table(tbl_name).schema())
It does look like the UCSC SQLite files carry less information than the Ensembl ones.
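To make "less information" concrete, one option is to diff the table and column names of the two databases rather than eyeballing the printed schemas. Below is a minimal sketch using only the standard-library `sqlite3` module. The helper names `schema_of` and `diff_schemas` are hypothetical, and the two in-memory databases are toy stand-ins with made-up tables, not the real EnsDb or TxDb schemas.

```python
import sqlite3


def schema_of(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Map each table name in the database to the set of its column names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    return {
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        tbl: {col[1] for col in conn.execute(f'PRAGMA table_info("{tbl}")')}
        for tbl in tables
    }


def diff_schemas(a: dict[str, set[str]], b: dict[str, set[str]]) -> dict:
    """Report tables unique to each side and column differences in shared tables."""
    shared = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "column_diffs": {
            tbl: {
                "only_in_a": sorted(a[tbl] - b[tbl]),
                "only_in_b": sorted(b[tbl] - a[tbl]),
            }
            for tbl in sorted(shared)
            if a[tbl] != b[tbl]
        },
    }


# Toy stand-ins (assumed, illustrative tables only; NOT the real schemas)
ens_like = sqlite3.connect(":memory:")
ens_like.executescript(
    """
    CREATE TABLE gene (gene_id TEXT, gene_name TEXT, gene_biotype TEXT);
    CREATE TABLE tx (tx_id TEXT, gene_id TEXT);
    """
)
ucsc_like = sqlite3.connect(":memory:")
ucsc_like.executescript(
    """
    CREATE TABLE gene (gene_id TEXT);
    CREATE TABLE transcript (tx_id INTEGER, tx_name TEXT);
    """
)

report = diff_schemas(schema_of(ens_like), schema_of(ucsc_like))
print(report)
```

To run this against the real data, pass `sqlite3.connect("TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite")` for the downloaded file, plus a connection to wherever the EnsDb SQLite file is stored locally.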
It's probably worth looking into how the Bioconductor packages deal with having two different schemas. For example, do they subclass a common base, and are the annotation filters aware of which schema they are querying?
cc: @nvictus
Re discussion about nonstandard chromosome names @nvictus: https://github.com/jorainer/ensembldb/issues/88
Description of feature
Getting UCSC data from TxDb Bioconductor sources