scverse / genomic-features

Genomic Features in Python from Bioconductor's AnnotationHub
https://genomic-features.readthedocs.io
BSD 3-Clause "New" or "Revised" License
18 stars, 5 forks

Defining the main genomic coordinates for each table #33

Open emdann opened 1 year ago

emdann commented 1 year ago

Description of feature

For each table available via EnsemblDb (e.g. genes, promoters), we could rename the primary genomic coordinates to the column names bioframe expects by default (chrom, start, end). This would (a) mimic the behaviour in Bioconductor, where the ranges make up the IRanges column of the returned GRanges, and (b) allow selecting columns with additional coordinates in the ibis query (e.g. adding a tx_seq_start column when running genes, see #9).
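A minimal sketch of the proposed renaming, assuming the query results arrive as pandas DataFrames. The PRIMARY_COORDS mapping and the to_bioframe_columns helper are hypothetical, and the transcripts column names (tx_seq_start, tx_seq_end) are assumptions; only renaming is shown, with no conversion between coordinate conventions.

```python
import pandas as pd

# Hypothetical per-table mapping from EnsemblDb column names to bioframe's
# default interval columns. Column names are assumptions for illustration.
PRIMARY_COORDS = {
    "genes": {"seq_name": "chrom", "gene_seq_start": "start", "gene_seq_end": "end"},
    "transcripts": {"seq_name": "chrom", "tx_seq_start": "start", "tx_seq_end": "end"},
}


def to_bioframe_columns(df: pd.DataFrame, table: str) -> pd.DataFrame:
    """Rename the primary coordinates of `table` to chrom/start/end.

    Any other coordinate columns (e.g. tx_seq_start on a genes query) keep
    their original names, so they can be selected next to the primary ones.
    Values are not modified, only column names.
    """
    mapping = PRIMARY_COORDS[table]
    missing = set(mapping) - set(df.columns)
    if missing:
        raise KeyError(f"{table} result lacks columns: {sorted(missing)}")
    return df.rename(columns=mapping)


# Toy genes result with an extra transcript coordinate column (made-up values).
genes = pd.DataFrame(
    {
        "gene_id": ["g1", "g2"],
        "seq_name": ["1", "1"],
        "gene_seq_start": [100, 500],
        "gene_seq_end": [200, 900],
        "tx_seq_start": [120, 510],
    }
)
print(to_bioframe_columns(genes, "genes").columns.tolist())
# ['gene_id', 'chrom', 'start', 'end', 'tx_seq_start']
```

Because only the primary coordinates are renamed, a second set of coordinates such as tx_seq_start no longer competes for the chrom/start/end slots.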

nvictus commented 3 months ago

You can stay flexible with bioframe by using the cols arguments:

import bioframe

bioframe.overlap(genes, df2, cols1=["seq_name", "gene_seq_start", "gene_seq_end"])

ivirshup commented 3 months ago

It would be nice if you didn't have to though.

ensembldb and GenomicFeatures both return a GRanges object containing seqnames, ranges, and strand for the entity being queried (e.g. genes, transcripts). This is convenient because the result can be used directly with genomic range libraries.

We could also provide the main entity's genomic coordinates (perhaps with an opt-out) in a form that works with bioframe automatically.
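One way the opt-out could look, sketched under the assumption that results are pandas DataFrames: copies of the main entity's coordinates are added under bioframe's default names and placed first (loosely mirroring how GRanges leads with seqnames and ranges), while the original columns are kept. The with_main_coords helper, its add_main_coords flag, and GENE_MAIN_COLS are all hypothetical names, not part of genomic-features.

```python
import pandas as pd

# Hypothetical mapping for a genes query: bioframe column -> EnsemblDb column.
# Strand could be handled the same way; it is left out of this sketch.
GENE_MAIN_COLS = {
    "chrom": "seq_name",
    "start": "gene_seq_start",
    "end": "gene_seq_end",
}


def with_main_coords(
    df: pd.DataFrame, main_cols: dict, add_main_coords: bool = True
) -> pd.DataFrame:
    """Prepend chrom/start/end copies of the main entity's coordinates.

    add_main_coords=False is the (hypothetical) opt-out: the input frame is
    returned unchanged. The original columns are always kept, so code that
    already uses them keeps working. Raises ValueError if a target column
    such as "chrom" already exists.
    """
    if not add_main_coords:
        return df
    out = df.copy()
    # Insert the bioframe-style columns at the front, in mapping order.
    for position, (new_name, old_name) in enumerate(main_cols.items()):
        out.insert(position, new_name, df[old_name])
    return out


genes = pd.DataFrame(
    {
        "gene_id": ["g1"],
        "seq_name": ["1"],
        "gene_seq_start": [100],
        "gene_seq_end": [200],
    }
)
print(with_main_coords(genes, GENE_MAIN_COLS).columns.tolist())
# ['chrom', 'start', 'end', 'gene_id', 'seq_name', 'gene_seq_start', 'gene_seq_end']
```

Compared with renaming, this duplicates three columns but avoids breaking anything that relies on the EnsemblDb names; with the defaults in place, bioframe calls would no longer need cols1.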