scverse / genomic-features

Genomic Features in Python from BioConductor's AnnotationHub
https://genomic-features.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Transcription Factor column #45

Open james-cranley opened 1 year ago

james-cranley commented 1 year ago

Description of feature

Hi, this is a really useful tool. A simple but useful addition would be to annotate whether a gene/transcript is a Transcription Factor or not.

At the moment I do this using the table (attached) from this paper. This is human-specific. There are probably other databases for other species, maybe Ensembl?

Filtering the genes output by description does not capture this well. For instance, `genes[genes.description.str.contains('transcription', na=False, case=False)]` gives me 705 genes, whereas filtering the table I attached by `Is TF? == 'Yes'` gives me over 1600 genes.

1-s2.0-S0092867418301065-mmc2.xlsx
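
The comparison above, a description keyword match versus a lookup against a curated table, can be sketched roughly as follows. This is only an illustration on toy data: the helper `annotate_tf` is hypothetical, the `gene_id` column and the `Ensembl ID` column name are assumptions about how the genes output and the attached spreadsheet are laid out, and the IDs are placeholders rather than real Ensembl identifiers.

```python
import pandas as pd


def annotate_tf(genes, tf_table, gene_col="gene_id",
                tf_id_col="Ensembl ID", tf_flag_col="Is TF?"):
    """Return a copy of `genes` with a boolean `is_tf` column.

    Hypothetical helper; the column names are assumptions, not part of
    genomic-features or a confirmed layout of the attached table.
    """
    # Collect the IDs of every row the curated table marks as a TF.
    tf_ids = set(tf_table.loc[tf_table[tf_flag_col] == "Yes", tf_id_col])
    out = genes.copy()
    out["is_tf"] = out[gene_col].isin(tf_ids)
    return out


# Toy stand-ins for the genes output and the curated table (placeholder IDs).
genes = pd.DataFrame({
    "gene_id": ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"],
    "description": [
        "transcription factor example",
        "zinc finger protein example",
        None,
        "transcription elongation example",
    ],
})
tf_table = pd.DataFrame({
    "Ensembl ID": ["ENSG_A", "ENSG_B", "ENSG_D"],
    "Is TF?": ["Yes", "Yes", "No"],
})

# Lookup against the curated table.
annotated = annotate_tf(genes, tf_table)

# Keyword match on the description, as in the filter quoted above.
by_description = genes.description.str.contains(
    "transcription", na=False, case=False
)
```

In this toy data the two approaches disagree in both directions: the keyword match misses a TF whose description never mentions transcription, and it picks up a gene the table marks as not a TF.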

ivirshup commented 1 year ago

Thanks for the suggestion!

I'm not sure this is going to be in scope for the library though, since we're really just getting data from the AnnotationHub Ensembl SQLite tables.

Maybe this could be an extension that talks directly to Ensembl BioMart, but I'm not sure if they have an exact equivalent for the list you've suggested. Maybe there's a GO term that fits here?
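
As a rough sketch of what the GO-term idea might look like, assuming a gene-to-GO mapping table has already been obtained from somewhere such as a BioMart export: the helper `flag_by_go`, the `gene_id` and `go_id` column names, and the choice of `GO:0003700` (DNA-binding transcription factor activity) as the candidate term are all assumptions for illustration, not something confirmed in this thread. An exact-match lookup like this also ignores descendant terms, so it would not necessarily reproduce the curated list.

```python
import pandas as pd

# Assumption: GO:0003700 ("DNA-binding transcription factor activity")
# is one candidate term; the thread does not settle on any term.
TF_GO_TERM = "GO:0003700"


def flag_by_go(genes, gene_go, go_term=TF_GO_TERM):
    """Return a copy of `genes` with a boolean `is_tf_go` column.

    Hypothetical helper: `gene_go` is assumed to have one row per
    (gene_id, go_id) pair. Only exact matches on `go_term` are counted.
    """
    tf_ids = set(gene_go.loc[gene_go["go_id"] == go_term, "gene_id"])
    out = genes.copy()
    out["is_tf_go"] = out["gene_id"].isin(tf_ids)
    return out


# Toy data with placeholder gene IDs and a placeholder non-TF term.
genes_demo = pd.DataFrame({"gene_id": ["ENSG_A", "ENSG_B", "ENSG_C"]})
gene_go_demo = pd.DataFrame({
    "gene_id": ["ENSG_A", "ENSG_A", "ENSG_B"],
    "go_id": ["GO:0003700", "GO:OTHER", "GO:OTHER"],
})

flagged = flag_by_go(genes_demo, gene_go_demo)
```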