Open nataled opened 7 years ago
Hi @nataled. Formulating some thoughts here - will reply soon. Thanks for the question/feedback!
Hi @nataled - thanks for your patience, and pardon my long response below . . . once I got going I had a lot to say here!
First, regarding 'canonical allele', please ignore this term altogether. It is based on a concept from the ClinGen Allele model, and was added to GENO simply to provide an ontological identifier for this concept to support data model integration. But logically, it should be ignored as it is not yet clear how it relates to other concepts in GENO. The problem is that its cursory logical definition (="variant OR allele") had the unintended consequence of this class subsuming other core GENO classes in the inferred hierarchy (e.g. allele, gene allele). I have since removed this logical definition to avoid such subsumptions, so please revisit this in GENO. Reasoning now yields a childless 'canonical allele' class that you should just ignore for now.
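To illustrate why a "variant OR allele" definition pulls other classes underneath it, here is a rough Python sketch in which plain sets stand in for classes and set inclusion stands in for subsumption. This is only an analogy, not how an OWL reasoner works internally, and every name and member below is made up for illustration.

```python
# Toy stand-ins for class extensions; all individuals are hypothetical.
allele = {"a1", "a2", "a3"}
gene_allele = {"a1", "a2"}      # every gene allele is also an allele
variant = {"a3", "v1"}

# The old logical definition: canonical allele = variant OR allele.
canonical_allele = variant | allele


def subsumes(parent: set, child: set) -> bool:
    """Set inclusion used as a stand-in for class subsumption."""
    return child <= parent


# Anything contained in either disjunct ends up under the union class.
inferred_children = [
    name
    for name, members in [
        ("allele", allele),
        ("gene allele", gene_allele),
        ("variant", variant),
    ]
    if subsumes(canonical_allele, members)
]
```

With the union definition in place, all three toy classes land under `canonical_allele`, which mirrors the unwanted inferred hierarchy described above. Dropping the definition removes the only reason for those inferences.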
Second, the timing on the 'locus'-related question is wonderful, as I am in the process of clarifying the use of the term 'locus' across GENO. As you may well know, the word "locus" can be problematic due to its varied meaning and use - it can refer to a location in a genome, or to an extent of sequence present at a defined location in the genome. While this may be an acceptable conflation in scientific discourse, the distinction is important when modeling terms in a formal ontology.
In GENO, we had originally used 'locus' in the latter sense above ("an extent of sequence present at a defined location"), but this proved confusing for some users. I have just finished updating all labels in GENO to eliminate this use of 'locus'. For example, I replaced the label 'genomic locus' with 'genomic feature', and the label 'gene locus' with 'gene allele'. Any remaining uses of 'locus' in GENO should now describe a location in the genome, rather than an extent of sequence in the genome. In most cases, I now use 'feature' (sensu Sequence Ontology) rather than 'locus' to refer to extents of sequence identified by their position w.r.t. some reference genome.
Given these updates and improvements to GENO, I would recommend you take a fresh look at GENO and revisit the terms you mention above. I hope you will find that things make more sense now - but I am happy to chat more if not. Ultimately we want GENO to be clear and usable for a variety of use cases, and are happy to evolve and refine it as needed to be maximally useful.
A final note about GENO is that I have yet to implement a term representing this clarified concept of a 'genomic locus' as a location in the genome. But I probably should, to be clear and direct about what we mean when we use the word 'locus' in the definitions or descriptions of other GENO classes. I will work on this and alert you when it has been implemented. I will likewise define a class for 'gene locus', which again will describe the genomic location where a gene is typically found. This is in contrast to the notion of "the sequence at the location where a gene is typically found" - which is what we use the term "gene allele" to denote. The relationship between a gene allele and gene locus is that the allele "occupies" or "is_located_at" the locus. So in GENO we think of the 'locus' as a genomic address, and the 'allele' as the sequence feature that occupies this address.
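To make the address/occupant distinction concrete, here is a minimal Python sketch. It is not GENO itself: the class names `GenomicLocus` and `GeneAllele`, the `occupies` field, and all coordinates and sequences are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicLocus:
    """A location in a genome: an address that carries no sequence itself."""
    reference: str    # hypothetical reference genome label
    chromosome: str
    start: int
    end: int


@dataclass(frozen=True)
class GeneAllele:
    """An extent of sequence (a feature) found at a gene's locus."""
    label: str
    sequence: str
    occupies: GenomicLocus   # stands in for 'occupies' / 'is_located_at'


# Placeholder values only; not real coordinates or sequences.
gene_locus = GenomicLocus("toy-reference", "chr1", 100, 108)
allele_1 = GeneAllele("toy allele 1", "ACGTACGT", gene_locus)
allele_2 = GeneAllele("toy allele 2", "ACGTTCGT", gene_locus)

# Different alleles can occupy one and the same locus.
same_address = allele_1.occupies == allele_2.occupies
different_sequence = allele_1.sequence != allele_2.sequence
```

In this sketch the locus is identified purely by position, while the allele is identified by its sequence plus the locus it occupies, so two alleles can share an address and still be distinct features.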
I also took a peek at the MRO, and have a couple of questions and suggestions for how GENO might align or be used here. First, classes like 'HLA-A locus' seem to refer to what GENO would call the 'HLA-A gene'. There are no definitions for these classes in the MRO, but based on the definition of the 'MHC locus' root class ("region of a chromosome that codes for MHC molecules"), I would surmise that the 'HLA-A locus' class is "the region of the chromosome that codes for the HLA-A protein". This would make it equivalent to what GENO would call the 'HLA-A gene'. So I would advocate for calling these classes 'genes' instead of 'loci'. It seems like this would be internally consistent, align well with the terminology of GENO and SO, and avoid confusion caused by different uses of the term 'locus'. That said, I am not an expert in MHC biology or nomenclature, so there may be a good reason they are using 'locus' here instead of 'gene'.
With this disclaimer in mind, I would make the following recommendations for the MRO:
Hope this helped, and I am happy to provide help or additional feedback on your efforts to align/integrate the MRO and PRO. Our group has done a lot of modeling in these areas across the different OBOs we have contributed to, and we are keenly interested in harmonizing representations where possible. This seems like an area where a little bit of collaboration could go a long way toward a more interoperable set of ontologies.
@nataled just following up on this to see if you had further questions or requests w.r.t. GENO. Happy to work with you to make GENO address your needs. Thanks!
I have been caught up in a major project that diverted and absorbed my attention for quite some time. I'm also awaiting some feedback on other issues related to MRO which might affect this issue. I hope to get back to this soon.
The Protein Ontology has been tasked with taking over the protein-related terms from MRO (Major Histocompatibility Complex (MHC) restriction ontology). The MRO terms are often defined with respect to a locus, which can include multiple syntenic genes. For example, HLA-A, HLA-B, and HLA-C are all genes located at the HLA locus (human); H2-Q1 through H2-Q15 are all genes located at the H2-Q locus (mouse). A search for the term "locus" (or related) brought me to your term "gene allele" (synonym: "gene locus").
I was pleased to see the term defined as I would expect for a locus--that is (paraphrasing) with respect to position. However, the placement in the GENO hierarchy (viewed after reasoning) is odd, as it appears to be tied (indirectly) to sequence (since alleles are, ultimately, sequence-based). The hierarchy, in contrast to the definition, would make this term equivalent in meaning to "an allele of a gene" (as opposed to, say, an allele of a nucleotide, to use an example from the parent term "allele") and not a locus, per se. It thus appears that the logical definitions (equivalencies) are in conflict with the text definition.
The simple fix, taking the shortest route (so to speak), would be to revise the text definition to reflect the logical one, maintaining its position in the hierarchy. Then, a new term "locus" would be minted using the current text definition of gene allele (however, see notes below).
I would expect to use this new locus term along the lines of the following: