monarch-initiative / gpsea

A Python library for discovery of genotype-phenotype associations
https://monarch-initiative.github.io/gpsea/stable
MIT License

Region predicate #272

Closed pnrobinson closed 2 weeks ago

pnrobinson commented 1 month ago

The documentation should have a fuller example. Also, we should make this one-based, because 100% of the time users will have a publication, table, or figure with one-based notation. In any case, the documentation needs to say what numbering scheme we expect.

See

https://monarch-initiative.github.io/gpsea/stable/apidocs/gpsea.analysis.predicate.genotype.html#gpsea.analysis.predicate.genotype.VariantPredicates.region

pnrobinson commented 1 month ago

For instance, these are the one-based inclusive numbers we get from a typical publication:

- Suppressor domain: 1-223
- IP3 binding: 226-578
- Regulatory/Coupling: 605-2217
- Channel: 2227-2758

If the current Region scheme is zero-based, as in UCSC, would we then need to write 225-578 for the IP3 binding region? It also seems that we currently need to do this:

from gpsea.analysis.predicate.genotype import VariantPredicates
from gpsea.model.genome import Region
region_pred = VariantPredicates.region(region=Region(start=225, end=578), tx_id=...)

But is there any advantage in exposing the Region class to users? Could we do this instead:

from gpsea.analysis.predicate.genotype import VariantPredicates
region_pred = VariantPredicates.region(start=226, end=578, tx_id=...)

ielis commented 1 month ago

@pnrobinson yes, there is very little advantage in exposing Region, and we should indeed go for 1-based coordinates, since they are less mind-boggling.
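
For reference, the coordinate conversion discussed in this thread can be sketched in plain Python. This is only an illustration: the helper names `one_based_to_zero_based` and `zero_based_to_one_based` are hypothetical and not part of gpsea, and the sketch assumes that the zero-based scheme is half-open (start included, end excluded), which is what the 226-578 to 225-578 example above implies.

```python
from typing import Dict, Tuple


def one_based_to_zero_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a one-based inclusive interval to zero-based half-open.

    Hypothetical helper for illustration; not a gpsea function.
    """
    if start < 1 or end < start:
        raise ValueError(f"Invalid one-based interval: {start}-{end}")
    # Only the start shifts; the exclusive end equals the inclusive end.
    return start - 1, end


def zero_based_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a zero-based half-open interval to one-based inclusive.

    Hypothetical helper for illustration; not a gpsea function.
    """
    if start < 0 or end <= start:
        raise ValueError(f"Invalid or empty zero-based interval: {start}-{end}")
    return start + 1, end


# One-based inclusive domain boundaries quoted earlier in this thread.
domains: Dict[str, Tuple[int, int]] = {
    "Suppressor domain": (1, 223),
    "IP3 binding": (226, 578),
    "Regulatory/Coupling": (605, 2217),
    "Channel": (2227, 2758),
}

# The same domains expressed as zero-based half-open intervals.
zero_based = {
    name: one_based_to_zero_based(start, end)
    for name, (start, end) in domains.items()
}

for name, (start, end) in zero_based.items():
    print(f"{name}: start={start}, end={end}")
```

Under this assumption the interval length is the same in both schemes (`end - start` zero-based, `end - start + 1` one-based), which gives a quick sanity check when transcribing boundaries from a figure or table.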