Open emdann opened 1 year ago
You can be flexible with bioframe using the `cols` arguments.
bioframe.overlap(genes, df2, cols1=["seq_name", "gene_seq_start", "gene_seq_end"])
It would be nice if you didn't have to though.
`ensembldb` and `GenomicFeatures` both return a `GRanges`, which contains `seqnames`, `ranges`, and `strand` for the entity being queried (e.g. `genes`, `transcripts`). This is nice because it can be used directly with genomic range libraries.
We could also provide (maybe with opt-out) the main entity's features in a way that will automatically work with bioframe.
Description of feature
For each table available via EnsemblDb (e.g. `genes`, `promoters`) we could rename the primary genomic coordinates to the format required by `bioframe` (e.g. `chrom`, `start`, `end`). This would (a) mimic the behaviour in Bioconductor (where the ranges make up the `IRanges` column) and (b) allow selecting columns with additional coordinates in the ibis query (e.g. add a `tx_seq_start` column when running `genes`, #9).
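A minimal sketch of the renaming idea, to make the proposal concrete. The helper names `bioframe_rename_map` and `rename_primary_coords` are hypothetical, and the `<prefix>_seq_start` / `<prefix>_seq_end` column pattern is an assumption extrapolated from the `gene_seq_start` and `tx_seq_start` names mentioned in this issue; the real tables may differ.

```python
def bioframe_rename_map(prefix, seq_col="seq_name"):
    """Map a table's primary coordinate columns to bioframe's default names.

    Hypothetical helper: assumes the primary coordinates are named
    `<prefix>_seq_start` / `<prefix>_seq_end` (e.g. `gene_seq_start`).
    """
    return {
        seq_col: "chrom",
        f"{prefix}_seq_start": "start",
        f"{prefix}_seq_end": "end",
    }


def rename_primary_coords(columns, prefix):
    """Rename only the primary coordinates; every other column keeps its name."""
    mapping = bioframe_rename_map(prefix)
    return [mapping.get(col, col) for col in columns]


# A genes query that also selects a transcript coordinate, as in point (b):
# the extra `tx_seq_start` column is left untouched.
columns = ["gene_id", "seq_name", "gene_seq_start", "gene_seq_end", "tx_seq_start"]
renamed = rename_primary_coords(columns, prefix="gene")
print(renamed)
```

The same mapping could be passed to pandas `DataFrame.rename(columns=...)` on a materialised result. After that, `bioframe.overlap(genes, df2)` should work without `cols1`, since bioframe defaults to `chrom`, `start`, `end`. An opt-out would simply skip the rename.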