Open mbrush opened 9 years ago
Also, I found this page useful for describing the different banding patterns: http://www.pathology.washington.edu/galleries/Cytogallery/main.php?file=banding%20patterns
Here's an example of a record in NCBI: http://www.ncbi.nlm.nih.gov/gene/4384, which has the type "unknown" in the DB. It is "located" on Xp11-q21.
There is the concept of fuzzy positions in FALDO; perhaps that is what is needed here?
I think fuzzy locations in FALDO are meant to represent fuzzy locations in GenBank, which would not generally be used here.
Here we could represent a region that starts on Xp11 and ends on Xq21, and say that the gene's interval is part of this interval. But perhaps it's just fine to make a more general statement - it's located on X. How long will these genes remain unlocated?
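To make the two alternatives above concrete, here is a minimal sketch in plain Python that writes each one as a list of (subject, predicate, object) triples. Everything under the `ex:` prefix, the blank-node naming, and the `NCBIGene:` CURIE form are assumptions made for illustration; they are not FALDO terms and not what dipper emits.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def band_range_triples(gene: str, chrom: str, start_band: str, end_band: str) -> List[Triple]:
    """Describe a gene as part of a region bounded by two cytogenetic bands."""
    # Hypothetical blank node standing for the broad region, e.g. Xp11-q21.
    region = f"_:chr{chrom}{start_band}_{end_band}"
    return [
        (region, "ex:starts_on", f"ex:chr{chrom}{start_band}"),  # assumed predicate
        (region, "ex:ends_on", f"ex:chr{chrom}{end_band}"),      # assumed predicate
        (gene, "ex:is_part_of", region),                         # assumed predicate
    ]


def chromosome_only_triples(gene: str, chrom: str) -> List[Triple]:
    """Fallback: only state that the gene is located on the chromosome."""
    return [(gene, "ex:is_located_on", f"ex:chr{chrom}")]


if __name__ == "__main__":
    for triple in band_range_triples("NCBIGene:4384", "X", "p11", "q21"):
        print(triple)
    for triple in chromosome_only_triples("NCBIGene:4384", "X"):
        print(triple)
```

The first function keeps the band-level bounds at the cost of an extra region node; the second loses them but needs only one triple.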
On 18 Mar 2015, at 16:00, Nicole Washington wrote:
> Here's an example of a thing in NCBI: http://www.ncbi.nlm.nih.gov/gene/4384, which has the type of "unknown" in the DB. it is "located" on Xp11-q21.
> there is the concept of fuzzy positions in faldo, and perhaps that is what is needed here?
Another example is the set of OMIM diseases that are annotated to genomic regions.
For example, http://omim.org/entry/101850 is known to map to the broad region 2p25-p12.
Clearly some feature lies within the region bounded by chr2p25 and chr2p12, but FALDO positions do not refer to regions. Also, the current pattern we've been using is to say that a given X (region) is a subsequence of Y (chromosome band), but that doesn't seem right here.
Similarly, for some sequence variants we only know the gene that they map to. This is the case for all variants from ZFIN.
@mbrush can we close this?
Some data sources provide only very broad location information about a sequence alteration (i.e., at the level of a chromosome region instead of within a specific gene/marker). @nlwashington can provide examples from the data here.
How should we capture this information in a genotype graph? If we treat the location as just a very large marker, we can capture it the same way as we capture marker based locations by linking the alteration to the marker of which it is a sequence-variant. Then in our genotype syntax we would need some convention for labeling this broad 'marker' in a given variant locus. Triples might look like:
Note that we are creating classes for chromosomal regions/bands (e.g. mmusChr11p) as per #42 and #43.
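As a rough illustration of this first option (not the triples originally proposed), the sketch below treats a chromosomal region class as a very large 'marker' and builds a variant locus for it. All `ex:` predicates, the blank-node naming, and the `region<alteration>` labeling convention are hypothetical choices made for the example.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def local_name(curie: str) -> str:
    """Return the part of a CURIE after the last colon."""
    return curie.split(":")[-1]


def marker_style_triples(genotype: str, alteration: str, region_class: str) -> List[Triple]:
    """Treat a broad chromosomal region as a 'marker' with its own variant locus."""
    region = local_name(region_class)
    alt = local_name(alteration)
    locus = f"_:{region}_{alt}"   # hypothetical variant locus node
    label = f"{region}<{alt}>"    # hypothetical labeling convention
    return [
        (genotype, "ex:has_variant_part", locus),            # assumed predicate
        (locus, "ex:has_variant_part", alteration),          # assumed predicate
        (locus, "ex:is_sequence_variant_of", region_class),  # assumed predicate
        (locus, "rdfs:label", label),
    ]


if __name__ == "__main__":
    for triple in marker_style_triples("ex:genotype1", "ex:alt1", "ex:mmusChr11p"):
        print(triple)
```

The extra locus node and its label are the overhead this option carries for every broadly located alteration.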
While the above is one option, I don't think it is practical or useful to define a 'marker' that spans an entire chromosome or band. I would prefer here to forgo creation of the variant locus level in the genotype graph, and link the sequence alteration to its broad chromosomal location in a new triple. The triple might look like:
Here the object property could be is_subsequence_of or is_variant_part_of (the latter being used if we want to propagate phenotypes over this link). In its implementation, this triple will pun the chromosomal region class (it links an instance IRI to a class IRI).
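A minimal sketch of this second option in plain Python follows. The choice between is_subsequence_of and is_variant_part_of is driven by a flag; the `ex:` prefix, the function name, and the `propagate_phenotypes` parameter are assumptions for illustration, not existing dipper code.

```python
from typing import Tuple

Triple = Tuple[str, str, str]


def direct_location_triple(
    alteration: str, region_class: str, propagate_phenotypes: bool = False
) -> Triple:
    """Link a sequence alteration straight to its broad chromosomal location."""
    # is_variant_part_of is chosen only when phenotypes should propagate over the link.
    predicate = "ex:is_variant_part_of" if propagate_phenotypes else "ex:is_subsequence_of"
    # The object is a class IRI while the subject is an instance IRI,
    # so this triple puns the chromosomal region class.
    return (alteration, predicate, region_class)


if __name__ == "__main__":
    print(direct_location_triple("ex:alt1", "ex:mmusChr11p"))
    print(direct_location_triple("ex:alt1", "ex:mmusChr11p", propagate_phenotypes=True))
```

Compared with the marker-style approach, this needs one triple per alteration and no variant locus node.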