General principle: Use readable IRIs for new annotation properties in OMO

information-artifact-ontology / ontology-metadata

OBO Metadata Ontology

Creative Commons Zero v1.0 Universal

General principle: Use readable IRIs for new annotation properties in OMO #82

Closed matentzn closed 2 years ago

matentzn commented 2 years ago

Whenever we mint new annotation properties (APs) in OMO, I suggest we make them readable from now on, i.e. instead of:

OMO:0000100 we mint a new AP like: OMO:obsolesence_reason.

This makes it easier to read annotations even if the entire specification for that annotation (i.e. OMO.owl) is not imported.

The ID space for APs is much smaller, the risk of conflicts is much lower (really none), and the label of an AP does not change easily once coined (it has never happened in my lifetime).

cmungall commented 2 years ago

I agree with the principle, but these make awful PURLs.

Not sure I have a great alternative suggestion

Adding a slash between the OMO and the local part will be problematic with generic rules to make prefixes

One radical suggestion is to use w3id PURLs, and even to consider this outside OBO. This should be a generic ontology outside the bio space, conforming to different principles, structures. I think we want buy in from as broad a community as possible and a perceived bio specificity will hinder.

alanruttenberg commented 2 years ago

I don't see any reason why OMO is different than any other ontology. What's special about it that we should contradict our principle of opaque ids? The risk is that we send the wrong message and have other ontologies insist on using english in IDs. I don't see much benefit.

matentzn commented 2 years ago

@cmungall we could use http://purl.obolibrary.org/obo/omo#, which is also widespread. I am not opposed to minting w3ids, but what if that project is discontinued? It's one thing to put a bunch of PURLs on there linking to standards, and another to link IDs through a PURL space that is only 80% trustworthy.

@alanruttenberg These are my reasons:

Most people outside of OBO use readable IDs for properties, i.e. skos, dc, owl and any other standard I can think of. I do not consider a controlled vocabulary of annotation properties the same as an ontology.
Using readable identifiers makes reading SPARQL queries, ontology snippets etc. much easier. Imagine rdfs:label being RDFS:000909 and so on.
None of the risks associated with IDs apply here: there are only a handful of APs, and they can be denoted by simple strings.

I think the burden here is on the side of "using opaque IDs for APs". I think doing this was the wrong call (and I love opaque IDs!), it causes thousands of searches on ontobee just to write one sparql query. I am on the fence for Object and Data properties - currently I am 75% pro opaque, but for annotation properties, I am 90% against (for classes and individuals I am 100% for opaque IDs). I would never encourage anyone to use named identifiers in any ontologies - in fact, the OBO principles protect against that!

EDIT: I do NOT advocate to rename existing APs here!

cmungall commented 2 years ago

I am not so keen on the hash, but http://purl.obolibrary.org/obo/omo/ should be fine

I agree wholeheartedly with your points re opaqueness; I may phrase it slightly differently:

schema / metamodeling entities should be readable (owl:Class, rdfs:label, omo:has_expansion_sparql, ...)
domain entities should be opaque (everything in most OBOs, including RO)

I think there are still arguments to mint IDs outside OBO but I consider this orthogonal so will make a new ticket

alanruttenberg commented 2 years ago

@matentzn I understand what you are saying, but am not convinced. Here's why:

That others have different practices doesn't move me much. Others also use natural language in all their IRIs, and we don't. It's not that I don't care what others are doing: I do care that, to the extent that there are commonly used annotation properties for annotations we would like to make, we use those.
I don't see how having the annotation property IRIs readable but the classes and properties not helps much.
I think that there ought to be tool support for using labels, and it's not difficult to do. The way I implemented it once was to have completion in the SPARQL editor, with completion yielding the addition of a prefix, which was then used in the query. In that scenario, even if label was numeric it wouldn't have made a difference. I built that using YASGUI against GraphDB.

@cmungall unsurprisingly, I'm +1 on slash vs hash.

matentzn commented 2 years ago

If we can reach agreement on what you mean by "schema / metamodeling entities", @cmungall, I am fine. In the end, we can always put it up for a vote as well. I am convinced we can reduce OLS/Ontobee traffic by 30% if we don't have to check over and over again what some simple, frequently used properties in our SPARQL queries are (which we edit with text editors, so no tooling here). Any idea, @alanruttenberg, how we can break this impasse between us? I feel pretty strongly here, and so do you, so it would be good to decide on a way to reach a solution that we can both live with. Are you OK if we put it up for a vote?

alanruttenberg commented 2 years ago

You could, but honestly I don't think this rises even to that level. We have an established policy. There is no new information. You haven't explained why having a tiny subset of IRIs readable helps generally, given that all the other IRIs aren't readable. It's an argument based on lack of tooling, which is never a particularly strong one, since we rely quite heavily on tools in almost every area of our work. Finally, there's a simple workaround, which is to keep at hand a clip with a few prefixes like:

PREFIX example_of_usage: <http://purl.obolibrary.org/obo/IAO_0000112> ...
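
A clip like this could be generated rather than typed by hand. The following is a minimal sketch, assuming a small label-to-IRI table is available; the table contents beyond `example of usage`, the function name `prefix_clip`, and the rule for turning a label into a prefix name are all illustrative assumptions, not part of any OBO tool.

```python
import re

# Hypothetical label -> IRI table. "example of usage" comes from the
# comment above; "definition" is added purely for illustration.
AP_LABELS = {
    "example of usage": "http://purl.obolibrary.org/obo/IAO_0000112",
    "definition": "http://purl.obolibrary.org/obo/IAO_0000115",
}


def prefix_clip(labels):
    """Return one SPARQL PREFIX declaration per label, sorted by label."""
    lines = []
    for label, iri in sorted(labels.items()):
        # Collapse anything that is not a word character into "_" so the
        # label can serve as a prefix name (an assumed naming rule).
        name = re.sub(r"\W+", "_", label.strip()).strip("_")
        lines.append(f"PREFIX {name}: <{iri}>")
    return "\n".join(lines)
```

Because each prefix expands to the complete IRI, a query would then use the bare prefix with an empty local part, for example `?term example_of_usage: ?value`.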

This is a slippery slope. Lots of people have a handful of terms they use every day. Why stop here?

matentzn commented 2 years ago

You can use PREFIX for SPARQL queries, but we read triples, stanzas in OBO and OWL snippets all day every day. Knowing quickly what annotations are on a term is very important in my work, with the command line and text editors being the only tools involved. In any case, I bet the majority of annotation properties are actually readable: all the rdfs, dc, pav, prov, skos, oboInOwl, and plenty of custom ones are. I think there is a pretty clear line to be drawn here: APs that capture metadata of ontologies and ontology terms. The question here is not so much whether such APs should be allowed (we are using them). We just need to see whether we can agree to use OMO to register such APs, or whether we should do it elsewhere if it's not enough in the OBO spirit.

bpeters42 commented 2 years ago

Two things:

We have a very clear voting policy: If we can't reach a consensus, then we vote. Clearly, there is no consensus here. So the next step would be to write up options in a way that they can be voted on.
I would like to hear more about tooling support options - not something that can be done theoretically, but something that could be implemented in a matter of weeks. I would be happy to chip in $ if that helps.

Bjoern

alanruttenberg commented 2 years ago

Would it be sufficient to have a command line tool that you can pipe text through with output being what look like readable IRIs, and one that does the reverse? If an in-editor tool is needed, for what editor?

matentzn commented 2 years ago

In my view, no plugin for a single tool would do the trick. We have people using Geany, Atom, Sublime, Notepad++, Vim and Emacs, on the Mac terminal, Windows CMD, PowerShell and many others. We review changes to ontologies as pull requests on GitHub, which means we want to quickly parse the diff (which is textual). Dealing with ontologies as text is very normal to us, independent of any particular editor! And given the fact that an active community with hundreds of users did not come up with a way to cross-fund our primary IDE (Protege) in more than 5 years, I doubt we will be able to build bespoke tooling that is suitable for all our use cases. I would prefer to invest Bjoern's desperately needed $ in other places. But let's not ignore that we agree on so many things. Let's just disagree on this, have a vote and live with the consequences :)

alanruttenberg commented 2 years ago

You aren't under any obligation to do this, but if you don't mind, I'd be curious to hear more about the task you are doing. What would interest me is some more details such as:

What is the nature of the work? I'm familiar with you via the issues we've interacted over, but I don't know about your other activities.
Is the primary way you are reviewing diffs via GitHub diffs?
What are you looking for when you are reviewing the annotations?
Are you doing anything other than reviewing annotations?
Do you tend to have the repos of the ontologies you are reviewing locally? Are you able to do the same diffs locally?
You mention using the text editors: are you doing diffs in those? Using editor tools (e.g. ediff in Emacs) or eyeballing them?

I do some work in Protege but a lot more by programming, most often using the OWLAPI, but also other tools. To see what it would look like, I wrote a perl script last night as a filter - in: text with IAO ids, out text with the IDs replaced by text. I used robot to prepare the mapping, which is cached, so the filtering doesn't add a delay. When I asked about what kind of tooling you might need it was with a mind that this seems like it could be a relatively simple tool, because it's only text. If so I would just write it myself, no $ involved. I'm writing little tools for myself all the time. I wasn't suggesting a GUI like Protege or a big project, I'm thinking perl, python, bash, etc. CLI. I can't speak to notepad++ but many editors have some programmability. In emacs, for instance, it's easy to write something that takes a region, sends it out to a shell command, and replaces the region with the results, or pops it into a new window.
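
A text filter of the kind described here might look roughly like the sketch below. It is not the Perl script mentioned above: the `LABELS` table, the function names, and the choice to join label words with underscores are assumptions made for illustration, and a real mapping would be generated from the ontology (for example with robot) and cached.

```python
import re

# Illustrative ID -> label table; a real one would be built from the
# ontology and cached on disk.
LABELS = {
    "IAO_0000112": "example of usage",
    "IAO_0000115": "definition",
}

# Matches an OBO PURL (http://purl.obolibrary.org/obo/IAO_0000115) or a
# bare ID written as IAO_0000115 or IAO:0000115.
ID_PATTERN = re.compile(
    r"(?:http://purl\.obolibrary\.org/obo/)?\b([A-Za-z]+)[_:](\d+)\b"
)


def decode_ids(text, labels):
    """Replace every known ID in text with its label; leave the rest alone."""
    def repl(match):
        key = f"{match.group(1)}_{match.group(2)}"
        label = labels.get(key)
        if label is None:
            return match.group(0)  # unknown ID: keep the original text
        return label.replace(" ", "_")  # keep the label a single token
    return ID_PATTERN.sub(repl, text)


def filter_stream(src, dst, labels):
    """Copy src to dst line by line, decoding IDs on the way."""
    for line in src:
        dst.write(decode_ids(line, labels))
```

Calling `filter_stream(sys.stdin, sys.stdout, LABELS)` from a small script would turn this into a pipe-friendly command; the reverse direction would need the inverse table and unambiguous labels.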

I frequently use git difftool with Kaleidoscope, and have experimented with using the difftool hook to do a pre-processing before the ontologies get diffed. It's not quite there yet but eventually I will finish it. Some time ago I wrote a filter that I would use for releases that just added comments in line with the RDF/XML that decoded the IRIs. For the BFO-2020 distribution I provide OWL in RDF/XML but also a functional syntax-like version with labels instead of IRIs. Something like this could be used as a hook when committing ontologies. If adopted into the OBO pipeline we wouldn't need to compromise the ID principles.

I've also written a proper ontology diff tool, because I need more than what textual diff offers. I'm not familiar with the Github hooks, but to the extent I review diffs on Github I've been curious whether the diff can be hooked to do manipulations of the sort you would be interested in.

While I take your point about Protege funding, I also see a lot of tooling being built inside OBO, so I don't think we're in a world where we can't rely on tooling support. I know that you are comfortable working in text, but I won't forget the time I was lectured, while in the OWL working group, to NEVER read OWL as RDF/XML. If you are directly editing RDF/XML, that's very error prone. Not that I don't do it on occasion. I just did a bit of that for an IAO commit because I wanted the diff to be minimal, and when Protege saves it can scramble the order, obscuring the important difference. There's another tool waiting to be written: take a new serialization from Protege and an old one, and enforce the old ordering on the new one so that diffs are minimal.

Finally, a detail: Are you proposing that we replace the existing opaque IDs in IAO with human-readable ones? Because that would be rather more radical than just minting new ones readable.

jamesaoverton commented 2 years ago

@alanruttenberg is asking good questions and making good suggestions, but I want to support what @matentzn is saying in comment https://github.com/information-artifact-ontology/ontology-metadata/issues/82#issuecomment-1015296301. The "no tool" option has some major advantages: no installation, no updates/upgrades/incompatibilities, no platform dependencies, no file size or scaling limits, works anywhere you can read or write text. In addition to all the tools Alan mentions and the ones we've all built for ourselves over the years, ROBOT has various well-supported human-readable conversion and diffing tools. They haven't solved the problem, even for highly technical people. I still often find myself looking up IAO:0000111 over and over.

alanruttenberg commented 2 years ago

@jamesaoverton no question. But there's a balance to be had between that convenience and engineering principles as embodied by our policy. We don't use plastic rivets to build bridges because they are cheaper. We don't vote on whether the bridge should use plastic rivets. There are a variety of reasons we chose opaque IRIs and it seems to me most of them are equally relevant in the case of annotation properties.

All things being equal, if a no tool solution works for you then it's a good solution. But all things are not equal.

The more I think about it, the more I think there's a better solution that will make all the IRIs understandable. Here's a go: Assume each ontology has its imports in its repository. I believe that is the common practice? Have a script build a text file mapping IRI to label for each ontology in the repo. Assume it's written to cache properly, so the mapping isn't rebuilt unnecessarily. Either on commit or on release, use a git or GitHub hook to run a script that adds an XML comment beside any line with an opaque IRI, the comment contents being the label. It's not a one-liner in perl, but it's close. The script can check whether the mappings are up-to-date and call on robot to rebuild any that need rebuilding. Build this into the ontology development kit. If you don't want to deal with the hook, then socialize the practice.
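
The annotation step of this proposal can be sketched as follows. This is only an illustration under stated assumptions: the label table is passed in already built, the function name `annotate_lines` is invented here, and a comment is appended only when a line ends with `>`, which is a crude guard against writing a comment inside a tag or a multi-line literal.

```python
import re

# An OBO term written as a full PURL or with the conventional "obo:" prefix.
OBO_TERM = re.compile(
    r"(?:http://purl\.obolibrary\.org/obo/|\bobo:)([A-Za-z]+_\d+)"
)


def annotate_lines(rdfxml_text, labels):
    """Append an XML comment with labels to each line that mentions known IDs."""
    out = []
    for line in rdfxml_text.splitlines():
        found = []
        for match in OBO_TERM.finditer(line):
            label = labels.get(match.group(1))
            if label and label not in found:
                found.append(label)
        # Only annotate lines that end outside a tag and carry no comment yet.
        if found and line.rstrip().endswith(">") and "<!--" not in line:
            # "--" is not allowed inside an XML comment, so break it up.
            text = ", ".join(found).replace("--", "- -")
            line = f"{line} <!-- {text} -->"
        out.append(line)
    return "\n".join(out)
```

Skipping lines that already contain a comment keeps the step from stacking comments when run repeatedly, but stale labels would still need a separate clean-up pass.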

Now all the IRIs are easy to interpret, not just the annotation properties.

One of the things about this that gives me pause is that the request is for annotation properties only, yet the bulk of IRIs are opaque. What is it about this task that means those aren't being looked at, or at least are looked up a lot fewer times than the annotation properties? My thinking then goes to the expectation that it must be a very specific task that isn't the norm, since just by the numbers you would think the lookups of non-annotation IRIs would swamp the lookups of annotation properties. Then I think: why change a principle for a subset of cases? And what prevents the precedent from being invoked on the next task someone wants to do using text, say for the COB classes? There aren't so many of those...

jamesaoverton commented 2 years ago

@alanruttenberg asks: Why should we make an exception for annotation property IRIs?

The pragmatic argument is that you have to read annotation property IRIs a lot more than you have to read any other IRIs, because they are used as predicates. Here is empirical support for that argument.

I converted OBI to NTriples, broke each triple into three lines (301,317 lines total), then counted unique lines. Here's the top 20:

count	IRI
17888	http://www.w3.org/1999/02/22-rdf-syntax-ns#type
8816	http://www.w3.org/2002/07/owl#Class
8721	http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
8721	http://www.w3.org/1999/02/22-rdf-syntax-ns#first
8135	http://www.w3.org/2002/07/owl#onProperty
8135	http://www.w3.org/2002/07/owl#Restriction
8101	http://www.w3.org/2000/01/rdf-schema#subClassOf
7924	http://www.w3.org/2002/07/owl#someValuesFrom
5122	http://www.w3.org/2000/01/rdf-schema#label
5006	http://purl.obolibrary.org/obo/IAO_0000117
4977	http://purl.obolibrary.org/obo/IAO_0000111
4907	http://purl.obolibrary.org/obo/IAO_0000115
4182	http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
4181	http://purl.obolibrary.org/obo/IAO_0000114
4056	http://www.w3.org/2002/07/owl#intersectionOf
4012	http://purl.obolibrary.org/obo/IAO_0000119
2722	http://purl.obolibrary.org/obo/IAO_0000120
1376	http://purl.obolibrary.org/obo/IAO_0000118
1331	http://purl.obolibrary.org/obo/IAO_0000112
1307	http://purl.obolibrary.org/obo/OBI_0000293
1158	http://purl.obolibrary.org/obo/OBI_0000299
1050	http://purl.obolibrary.org/obo/BFO_0000055

Those IAO IRIs are all annotation properties. Those eight IAO IRIs sum to 28,512, which is almost 10% of the lines in my file, i.e. 10% of all nodes in OBI.

Most of the time, I know what subject I'm looking at because I can see its rdfs:label, and I want to read its annotations. Once I see a blank node, I know I'll probably need a tool to read it, but most of the triples I care about have literal objects. I started with OBI NTriples again, deleted all the triples with blank node subjects (leaving 49,555 lines), then deleted the subjects and objects, leaving only the predicates. Here are the top 20 (there's only about 50):

count	predicate
8099	http://www.w3.org/2000/01/rdf-schema#subClassOf
5433	http://www.w3.org/1999/02/22-rdf-syntax-ns#type
5117	http://www.w3.org/2000/01/rdf-schema#label
4997	http://purl.obolibrary.org/obo/IAO_0000117
4961	http://purl.obolibrary.org/obo/IAO_0000111
4884	http://purl.obolibrary.org/obo/IAO_0000115
4175	http://purl.obolibrary.org/obo/IAO_0000114
4002	http://purl.obolibrary.org/obo/IAO_0000119
1364	http://purl.obolibrary.org/obo/IAO_0000118
1323	http://purl.obolibrary.org/obo/IAO_0000112
933	http://www.w3.org/2002/07/owl#equivalentClass
831	http://purl.obolibrary.org/obo/IAO_0000233
696	http://purl.obolibrary.org/obo/IAO_0000412
517	http://purl.obolibrary.org/obo/OBI_9991118
471	http://purl.obolibrary.org/obo/IAO_0000116
436	http://purl.obolibrary.org/obo/IAO_0000234
274	http://purl.obolibrary.org/obo/IAO_0000232
119	http://www.w3.org/2002/07/owl#deprecated
116	http://purl.obolibrary.org/obo/IAO_0000231
68	http://www.w3.org/2000/01/rdf-schema#isDefinedBy
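
The two counts above could be reproduced along the following lines. This is a sketch, not the commands actually used: it assumes plain N-Triples input with one triple per line, and the function names are invented for illustration.

```python
from collections import Counter


def split_triple(line):
    """Split one N-Triples line into (subject, predicate, object)."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()  # drop the statement terminator
    # Subjects and predicates never contain whitespace; a literal object may,
    # so split at most twice and keep the object whole.
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def is_triple(line):
    """True for lines that are neither blank nor comments."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def term_counts(lines):
    """First table: count every subject, predicate and object occurrence."""
    counts = Counter()
    for line in filter(is_triple, lines):
        counts.update(split_triple(line))
    return counts


def predicate_counts_named_subjects(lines):
    """Second table: count predicates, skipping blank-node subjects."""
    counts = Counter()
    for line in filter(is_triple, lines):
        subject, predicate, _ = split_triple(line)
        if not subject.startswith("_:"):
            counts[predicate] += 1
    return counts
```

`Counter.most_common(20)` on either result gives a ranking in the same shape as the tables above.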

To be clear: the proposal is not to mint new IRIs for existing IAO terms. But it is clear to me in hindsight that life would have been easier if we had made a few exceptions to the numeric ID policy and used human-readable IRIs for the handful of annotation properties that we use to annotate practically every OBO term.

alanruttenberg commented 2 years ago

Thanks for the stats. I see where you are coming from. It's not quite what I was thinking, but the numbers are big, I'll grant you that. It makes sense that they would be frequent because we use them on every term. What I had as the comparison was the interpretation of axioms, which include the subjects and objects of triples. In all those cases, which do (or should) carry a lot of weight and review, you will only get a label for the class being defined. The rest have to be looked up. Do you not do much review of changes to axioms? I'd be worried if they weren't receiving careful scrutiny as well.

Good news about not wanting to change older IRIs. Do you anticipate that there are going to be a lot of new annotation properties that will be as frequently used as the ones already there? In other words, how much is this really going to benefit you, given that presumably the most important annotation properties already have opaque IRIs?

What do you think about my proposal for having a stage in the release pipeline that adds labels in comments universally? That would help you with the already existing IRIs too, which I'm assuming are going to be the bulk of annotations for at least a while, if not forever?

alanruttenberg commented 2 years ago

For the Github diffs, BTW, it occurs to me it wouldn't be that hard to make a javascript bookmarklet that could modify the page in place substituting the labels. It'd have to be updated when new annotation properties are updated, but the javascript could be generated from OMO.

bpeters42 commented 2 years ago

Here is a potentially really stupid idea: what if we create exact synonyms for annotation properties with readable labels? And make clear that those are always intended to be mapped to a numeric ID. And that the label may change in the future, but the changed label would map to the same ID.

The obvious downside is that there will be different versions of the same annotation property floating around. And I don't even know if there is a 'same as' for APs.

alanruttenberg commented 2 years ago

I'm happy that you are floating ideas. Thanks!

However, I don't think this will work. First, it wouldn't be sameAs, since these are properties. If the store does OWL inference, the sameAs would just end up punning individuals and not be applicable to the properties. Maybe if it was doing RDFS reasoning, but it would still need to be tested. There's equivalentProperties for object and data properties, but I don't see one for annotation properties. It's not unusual in the linked data world to not use much reasoning.

Then there's making sure any use of an ontology in a store also has the mappings. If we included the sameAs in the OMO file and OMO was imported, that would work, but it would pollute other uses with the individuals. Moreover, not all linked data uses go to the trouble of loading all the ontologies, particularly for annotations. Try doing a query for documentation on a random Dublin Core property, or rdf:type. You won't generally get anything. OTOH, if the mapping was in a separate file, it wouldn't be uniformly known or remembered, and people would have to wonder which version of the property to use.

Finally, at least in some store I used (can't remember which right now) when you query for triples with sameAs, only one of the sameAs set is returned and it's not defined which. So supposing sameAs support worked at all, you could query using property p and see results with property q, assuming p sameAs q. It may be this is the usual behavior because otherwise you potentially land up with lots of triples that mean the same thing in the result, and that would probably confuse.

alanruttenberg commented 2 years ago

In case this is useful, there is a Firefox extension called Text rewriter that lets you configure any number of rewrites of text strings on a web page. Below is a Perl program that generates a configuration for it for a given OBO ontology. Assuming you name it labels.pl and robot is on your PATH, running perl labels.pl omo json writes a configuration that rewrites all OMO IDs to labels, and perl labels.pl omo refresh re-downloads OMO so the next time you ask for the JSON it will be up to date. The results of running this for OMO are below the script. Files are cached in ~/.cache/ontology-labels/.

To configure the extension, right-click the extension icon (once it is installed), paste the configuration, and save it.

Thereafter the IDs will be rewritten when you view any page, such as this GitHub diff.

use strict;
use warnings;

# Usage: perl labels.pl <ontology> <json|refresh>
my $ont = $ARGV[0];
my $command = $ARGV[1] // "";
die("Usage: perl labels.pl <ontology> <json|refresh>\n") unless defined $ont;

# Cached files live under ~/.cache/ontology-labels
my ($cache) = glob("~/.cache/ontology-labels");
system("mkdir -p $cache");
my $ontpath = "$cache/$ont.owl";
my $labelpath = "$cache/$ont-labels.txt";
my $querypath = "$cache/labels-query.sparql";
my $robot = `which robot`;
chomp $robot;
if (! -e $robot)
  { die ("Must have robot installed or on \$PATH");
  }

# "refresh" discards the cached ontology and labels so they are fetched again
if ($command eq "refresh")
  { unlink $ontpath;
    unlink $labelpath;
  }
unless (-e $ontpath)
  { print STDERR "Downloading $ont.owl...";
    my $downloadcom = "curl -s -L http://purl.obolibrary.org/obo/$ont.owl > $ontpath";
    print STDERR "$downloadcom\n";
    system($downloadcom);
    if (! -e $ontpath)
      { die("failed to fetch $ont.owl") }
    # Progress messages go to STDERR so they never mix with the JSON on STDOUT
    print STDERR "done.\n"
  }

# Extract all rdfs:label values with a SPARQL query run through robot
unless (-e $labelpath)
  { print STDERR "Running robot to get labels...";
    open(my $queryh, ">", $querypath) or die("Couldn't write $querypath");
    print $queryh "select ?s ?l where {?s <http://www.w3.org/2000/01/rdf-schema#label> ?l}\n";
    close($queryh);
    my $querycom = "$robot query --input $ontpath --query $querypath $labelpath";
    print STDERR "$querycom\n";
    system($querycom);
  }

# "json" prints one rewrite rule per label, mapping the OBO ID to the label
if ($command eq "json")
  { open(my $labelh, "<", $labelpath) or die("Couldn't open label path");
    my $header = <$labelh>;    # skip the header row
    print "[\n";
    while (my $line = <$labelh>) {
      chop $line;
      # Skip rows that are not "<OBO IRI>,<label>"
      next unless $line =~ /http:\/\/purl\.obolibrary\.org\/obo\/([^,]*),(.*)/;
      my ($id, $s) = ($1, $2);
      chop($s);
      $s =~ s/ /_/g;
      printf('{"from":"%s","to":"%s","ic":false,"mw":false,"sc":false}' . "\n", $id, $s);
    }
    print "]\n";
  }

[
{"from":"IAO_0000231","to":"has_obsolescence_reason","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000017","to":"ontology_module_subsetted_by_expressivity","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000119","to":"definition_source","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000424","to":"expand_expression_to","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000027","to":"data_item","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000112","to":"example_of_usage","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000589","to":"OBO_foundry_unique_label","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000002","to":"editors_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0100001","to":"term_replaced_by","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000102","to":"data_about_an_ontology_part","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000598","to":"has_ID_policy_for","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000122","to":"ready_for_release","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000410","to":"universal","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000124","to":"uncurated","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000420","to":"defined_class","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000125","to":"pending_final_vetting","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000004","to":"bridge_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000123","to":"metadata_incomplete","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000597","to":"has_ID_range_allocated_to","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000233","to":"term_tracker_item","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000008","to":"analysis_subset_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000116","to":"editor_note","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000000","to":"ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000421","to":"named_class_expression","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000426","to":"first_order_logic_expression","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000229","to":"term_split","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000103","to":"failed_exploratory_term","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000020","to":"EL++_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000019","to":"ontology_module_subsetted_by_OWL_profile","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000018","to":"obo_basic_subset_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000013","to":"reasoned_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000227","to":"terms_merged","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000428","to":"requires_discussion","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000002","to":"example_to_be_eventually_removed","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000012","to":"species_subset_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000005","to":"import_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000118","to":"alternative_term","ic":false,"mw":false,"sc":false}
{"from":"IAO_0010000","to":"has_axiom_label","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000602","to":"has_associated_axiom(fol)","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000600","to":"elucidation","ic":false,"mw":false,"sc":false}
{"from":"IAO_0006011","to":"may_be_identical_to","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000111","to":"editor_preferred_term","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000234","to":"ontology_term_requester","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000411","to":"is_denotator_type","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000001","to":"base_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000115","to":"definition","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000228","to":"term_imported","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000423","to":"to_be_replaced_with_external_ontology_term","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000016","to":"taxonomic_bridge_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000114","to":"has_curation_status","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000030","to":"information_content_entity","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000232","to":"curator_note","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000599","to":"has_ID_prefix","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000603","to":"is_allocated_id_range","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000009","to":"single_layer_subset_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000113","to":"in_branch","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000225","to":"obsolescence_reason_specification","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000425","to":"expand_assertion_to","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000011","to":"external_import_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000078","to":"curation_status_specification","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000015","to":"template_generated_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000409","to":"denotator_type","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000427","to":"antisymmetric_property","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000117","to":"term_editor","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000006","to":"subset_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000121","to":"organizational_term","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000003","to":"main_release_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000224","to":"obsolete_core","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000007","to":"curation_subset_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000010","to":"exclusion_subset_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000700","to":"has_ontology_root_term","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000412","to":"imported_from","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000120","to":"metadata_complete","ic":false,"mw":false,"sc":false}
{"from":"IAO_8000014","to":"generated_ontology_module","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000226","to":"placeholder_removed","ic":false,"mw":false,"sc":false}
{"from":"IAO_0006012","to":"scheduled_for_obsoletion_on_or_after","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000601","to":"has_associated_axiom(nl)","ic":false,"mw":false,"sc":false}
{"from":"IAO_0000596","to":"has_ID_digit_count","ic":false,"mw":false,"sc":false}
]
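
For readers who want to see what such a configuration does without installing the extension, the following Python sketch applies from/to rules of the shape shown above to a string. It is an illustration only and not the extension's actual matching logic: it does plain substring replacement, it does not interpret the ic, mw and sc keys, and the function name apply_rules, the sample text and the ID EX_0000002 are assumptions.

```python
# Two rules copied from the generated list above; the other keys are omitted
# because this sketch does not interpret them.
rules = [
    {"from": "IAO_0000115", "to": "definition"},
    {"from": "IAO_0100001", "to": "term_replaced_by"},
]


def apply_rules(text, rules):
    """Replace every occurrence of each rule's 'from' string with its 'to' string."""
    # Longest 'from' first, so a short ID never clobbers part of a longer one.
    for rule in sorted(rules, key=lambda r: len(r["from"]), reverse=True):
        text = text.replace(rule["from"], rule["to"])
    return text


# Hypothetical snippet of the kind you might see in a diff.
sample = 'obo:IAO_0000115 "A textual definition." ; obo:IAO_0100001 obo:EX_0000002 .'
rewritten = apply_rules(sample, rules)
print(rewritten)
```

IDs without a matching rule, such as the made-up EX_0000002, are left untouched.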

matentzn commented 2 years ago

VOTE

We didn't get to this point during the last call, but given this piece of (friendly :P, yet) irreconcilable disagreement between @alanruttenberg (on this list) and me, and the fact that this blocks about 5 issues we really need to deal with, can I get some support for offering this up for a vote (just the permission of even calling a vote on this)?

If we cannot get a shared agreement (which is fine!), we will create a new vocabulary outside of OMO for the APs we need, and we will import these into OMO, just the way we do with DC, SKOS, etc properties.

I am definitely fine with being outvoted, i.e. if more people vote against readable identifiers, I will cede to this decision, obviously! I just do not want to give up something I feel I am right on.

👍 : Let's have a vote. If I happen to favour the minority, I will happily accept that result.
👎 : This goes too far against fundamental OBO design principles, and a vote should not be called without a major discussion on the issue first.

alanruttenberg commented 2 years ago

There is not a vote between alternatives. It's a vote between whether you'll do it one way or do it another way.

alanruttenberg commented 2 years ago

Let me clarify: I thought the vote would be to decide whether we add new annotation properties with either a) interpretable or b) opaque IRIs. As I understand this proposal, the choices offered are not whether or not there will be annotation properties with interpretable IRIs in OMO, but instead how annotation properties with interpretable IRIs will find their way into OMO. The alternatives as I read them were a) interpretable IRI annotations will be added directly to OMO, or b) another ontology will be built with interpretable IRIs, and that will be imported into OMO. So it looks like both options yield the same result.

matentzn commented 2 years ago

Ah no, sorry, I created this additional confusing vote to accommodate your comment:

We don't use plastic rivets to build bridges because they are cheaper. We don't vote on whether the bridge should use plastic rivets. There are a variety of reasons we chose opaque IRIs and it seems to me most of them are equally relevant in the case of annotation properties.

I interpreted this as you directly opposing a vote on the matter of opaque vs interpretable. So basically I wanted to know from the OBO community (i.e. all of us): is this an issue we should even vote on? Hence I called for the first vote. If positive, the next vote will be about deciding between a) interpretable or b) opaque IRIs. So let us keep it extremely simple:

Do you oppose a vote on the matter?

hlapp commented 2 years ago

I think it's fair to question whether a point of contention in science should best be addressed by popular contest, or by reasoned debate until a solution is found that everyone of those with strong opinions can accept (happily or not). That doesn't mean that there can't be a situation where some solution one way or another is needed to resolve an infrastructure stalemate. It seems like @matentzn is suggesting this is such a situation? I.e., the end of what reasoned debate can achieve has been reached?

There's a little bit of an internal contradiction with requesting a vote on whether to vote, and assuming a "yay" implies "If I happen to favour the minority, I will happily accept that result." Because if that were really so, wouldn't that imply that you can already live with either of the contending solutions? That is, the reason we would then be called upon to vote is not that you couldn't accept and live with one of the solutions, but that you think (1) your proposal is the right one; and (2) a majority of stakeholders think the same way as you, presumably corroborating your sense that your proposal is the right one.

I don't oppose a vote. But good points have been raised on both sides, including about issues with the other side. I'm unconvinced that the end of reasoned debate has been reached. More specifically, I don't see how one side emerging more popular than the other is going to resolve the weaknesses that have been argued about them, and so I'm not sure how those with a strong opinion on either side can end up happy if they end up in the popular minority, because their issues will essentially remain unsolved.

Having said all this, I don't have a dog in this fight. I can see the arguments on both sides, and have run into issues with both. AFAIAC I don't feel strongly enough about one solution or the other to be motivated to vote. I'm just rather skeptical about the utility of popular contests to properly solve problems in scientific infrastructure.

bpeters42 commented 2 years ago

The OBO decision-making process is to first have a reasoned debate to try to find consensus. I think that was done here. If it is decided (as here) that there is no consensus, any of the involved parties can (reluctantly) call for a vote (aka popularity contest). Not because that is a perfect way to find the right solution, but because there are apparently multiple different viewpoints, and we want to move beyond the problem at hand.

So I don't think we should have a vote on whether we should have a vote - that should be a given if it is being called for after a reasonable time of debate.

-- Bjoern Peters Professor La Jolla Institute for Immunology 9420 Athena Circle La Jolla, CA 92037, USA Tel: 858/752-6914 Fax: 858/752-6987 http://www.liai.org/pages/faculty-peters

matentzn commented 2 years ago

There's a little bit of an internal contradiction with requesting a vote on whether to vote, and assuming a "yay" implies If I happen to favour the minority, I will happily accept that result.

I forgot to say that the vote on the vote has to be unanimous, not just a majority. Good catch @hlapp :)

@bpeters42 Thank you. I will share the final actual vote shortly.

alanruttenberg commented 2 years ago

Do you oppose a vote on the matter?

I did comment earlier that I wasn't sure this is something that should be voted on, but after Bjoern's response I accepted that there would be a vote and am not opposed to that. What confused me were the particulars of the vote proposed, for the reasons I explained. I don't understand why there would be two votes.

cmungall commented 2 years ago

I wanted to follow up on @alanruttenberg's point here:

But there's a balance to be had between that convenience and engineering principles as embodied by our policy. We don't use plastic rivets to build bridges because they are cheaper. We don't vote on whether the bridge should use plastic rivets. There are a variety of reasons we chose opaque IRIs and it seems to me most of them are equally relevant in the case of annotation properties.

I have not gotten around to articulating a response to Alan, so in the spirit of exhausting all reasonable debate as per @hlapp's point, I will try.

First, I don't buy the plastic rivets + bridge analogy. We all want to make robust engineered artefacts that last. If we are to follow through with the analogy, then we are the ones driving over this bridge every day, and are pretty invested in making it strong and safe.

Why do we want opaque for domain ontologies yet readable for ontology metadata? It's a reasonable question, and if we were to be following dogmatic principles then we would not want to make exceptions, but I think there are strong scientific, pragmatic, and engineering-based reasons to make a distinction:

First, I think we are all in agreement that opaque IDs are required for scientific domain ontologies, but I will reiterate what I think are the key reasons, as I think this should be grounded in empirical and pragmatic reality and not philosophy or dogma:

1. Scientific terminology is in flux. The primary labels for concepts in ontologies change all the time. If anyone is in doubt about this, it should be possible to come up with the data here, e.g. through bioportal diffs. For GO you can look at the QuickGO history for any term.
2. Furthermore, it changes in ways that would make non-opaque IDs highly annotation-error prone. For example, over time labels become more general or more specific than the underlying ontology concept.
3. Also, it changes in ways we as ontologists cannot control. I would love the scientific community to use some of the systematic names we come up with for ontology classes, but they don't, and we must live with this.
4. Finally, the scale of this is massive. Many ontologies have tens of thousands of highly interlinked concepts with many people contributing. With non-opaque IDs, the chance we would choose an ID we would regret later is very high, and it will happen frequently.

Most of us could easily pull up multiple examples where opaque IDs have saved us, or come up with plausible scenarios where they will save us in the future (for domain ontologies).

None of these apply to small vocabularies of terms used to provide metadata in our ontologies, i.e. OMO:

1. The terminology is not in flux. The primary labels for the set of properties proposed for OMO have been very stable and are likely to remain so. Again it should be very easy to provide evidence for this by looking at rdfs:label changes in OMO.
2. Even if the OMO terminology were in flux, it has not changed and is unlikely to change in ways that would later lead to incorrect annotation of scientific data. In fact we already make use of an ontology metadata terminology that is confusing. "label" means something different to an ML researcher, IAO:0000115 should really be "text definition" since "definition" is a broader term encompassing logical definitions. rdfs:isDefinedBy has a fairly different meaning than the IRI suggests. This has not led to incorrect usage of these terms, in part because the people who annotate them directly in Protege are experts in the usage of these terms, and frequently people use tools that hide all of this.
3. In contrast to terminology for, say, brain regions, or organisms or diseases: we are largely in control of the terminology for OMO so we can keep it in sync with the concepts. This is because it is a highly specialized terminology devised and used by us.
4. The scale of OMO is small. For a large scientific ontology with 10k concepts, it is highly likely that initial choices of human-readable primary labels will turn out to be poor, hence the need for an opaque ID buffer. In contrast, for OMO, we have a handful of terms. If we mint non-opaque IDs we can dedicate the time and the substantial expertise of this group to select names that are likely to stand the test of time. Will we do this absolutely perfectly? No, but we don't need to, see 1-3.

To further emphasize these points we need look no further than the most commonly used property in OMO: rdfs:label. If we were truly following our own principles, we would have given this an opaque ID from the start, and might have cycled through a succession of... errm, labels. "label" is very confusing outside our rdf bubble, since data scientists and databases like neo4j use this term to essentially mean "category". Despite this potential confusion, and use of a dreaded non-opaque ID for every single one of our classes and object properties, I am not aware of a single case where this has caused a problem that would have been solved by opaque IDs.

We can also play out some scenarios. Let's say that my evaluation is overly optimistic, and we choose a non-opaque ID for a new OMO concept that turns out to be so egregiously bad or confusing we need to change it in the future. Here we can simply obsolete the ID and make a new one! Does this have a cost? For sure. Can we estimate that cost? I think we can, based on our decades of shared experience. And I think we would agree that outside of a minimal core set of OMO terms (whose IRIs we are not proposing to change here -- this is just for new terms) the cost is relatively low, and the cost will be borne by the ones arguing for the non-opaque IDs.
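
The two scenarios can be sketched in a few lines of Python. This is a toy model under stated assumptions: terms are dictionaries, the IRIs EX_0000001, ex/old_name and ex/new_name are invented, and the deprecated and replaced_by fields stand in for owl:deprecated and "term replaced by" (IAO_0100001). With an opaque ID a rename touches only the label; with a readable ID a rename means obsoleting the old IRI and pointing to a new one, and consumers have to follow that pointer.

```python
# Toy term registry; every IRI here is hypothetical.
terms = {
    "EX_0000001": {"label": "old name", "deprecated": False, "replaced_by": None},
    "ex/old_name": {"label": "old name", "deprecated": False, "replaced_by": None},
}


def relabel_opaque(registry, iri, new_label):
    """Opaque ID: the IRI stays put, only the label changes."""
    registry[iri]["label"] = new_label


def rename_readable(registry, old_iri, new_iri, new_label):
    """Readable ID: obsolete the old IRI and mint a new one that replaces it."""
    registry[new_iri] = {"label": new_label, "deprecated": False, "replaced_by": None}
    registry[old_iri]["deprecated"] = True
    registry[old_iri]["replaced_by"] = new_iri


def resolve(registry, iri):
    """Follow replaced_by links from an obsolete IRI to its current replacement."""
    seen = set()
    while registry[iri]["deprecated"] and registry[iri]["replaced_by"]:
        if iri in seen:  # guard against accidental replacement cycles
            raise ValueError("replacement cycle at " + iri)
        seen.add(iri)
        iri = registry[iri]["replaced_by"]
    return iri


relabel_opaque(terms, "EX_0000001", "new name")
rename_readable(terms, "ex/old_name", "ex/new_name", "new name")
print(resolve(terms, "EX_0000001"))   # unchanged IRI, new label
print(resolve(terms, "ex/old_name"))  # redirected to the replacement IRI
```

The cost being discussed is the second path: every existing use of the old readable IRI either has to be migrated or resolved through the replacement link.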

I won't restate the excellent arguments already stated about the high cost of opaque OMO IRIs (which again apply specifically to OMO and not domain ontologies), I just wanted to address the concern that we were somehow making a tradeoff between convenience and good engineering. Those of us arguing for non-opaque OMO IRIs care deeply about the engineering and making things that last, and we bear the brunt of costs for any poor decisions we make. We are not doing this out of pure convenience but based on strong evidence-based reasoning backed by data and years of engineering experience.

matentzn commented 2 years ago

Final Vote: Opaque vs Readable Ids for ontology metadata properties in OMO

Hello everybody, we have had a detailed and interesting discussion of the pros and cons of using readable IRI remainders. Concretely, the problem is whether OMO should permit, for example, http://purl.obolibrary.org/obo/omo/abbreviation as a property, or whether we want to stick with the current practice of using opaque IDs such as http://purl.obolibrary.org/obo/OMO_0001000.

🎉 Use opaque ids (http://purl.obolibrary.org/obo/OMO_0001000). This is current practice in OBO. The key advantages are: it is language-agnostic and therefore more inclusive. We can flexibly change "the label" as long as we preserve the meaning.
👍 Use readable ids (http://purl.obolibrary.org/obo/omo/abbreviation). This is prevalent practice in semantic web vocabs (dc, skos, etc). It makes reading ontology snippets easier and avoids look up or use of specific tools.
👀 Abstain. I have followed the discussion but I am either unsure what is best, or do not wish to take part in the vote.

Both choices have strong arguments either way, and all arguments are relative, as downsides can always be mitigated with tool support.

The voting closes on Friday the 5th of March (all time zones). As per usual practice, we will go with the simple majority, i.e. if we have 5:4 votes, the option with 5 votes wins (no consensus required). Please share this poll with other folks that have a stake in this decision!

matentzn commented 2 years ago

Important: The discussion is closed - if you want to make another point related to this issue (other than about the phrasing of the vote), please open a new issue! Thank you for your understanding - we are a bit blocked on a number of tickets and need to move on from this.

matentzn commented 2 years ago

Update: new option to abstain from vote for those that follow the discussion but are unsure what is best. Abstain votes won't influence the result of the vote, but will help us design future decision making processes. Thank you @bpeters42 for the suggestion.

balhoff commented 2 years ago

I'm going to abstain because I came too late to say that I would have liked to first see a plan for how the non-opaque terms will be resolved by the PURL infrastructure, and what the recommended prefix would be. It would be inconsistent and confusing for OMO: to expand to anything but http://purl.obolibrary.org/obo/OMO_, since it's registered in OBO. Maybe the term resolution issue is easily solved, but I would like to know that it will be taken care of.
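
The prefix concern can be illustrated with a short Python sketch of CURIE expansion. The prefix map mirrors the expansion mentioned in this comment; the function expand and the CURIE OMO:abbreviation are assumptions for illustration, and the readable IRI is the example used in the vote text.

```python
# Prefix map as described in the comment: OMO: expands to the OBO PURL form.
PREFIXES = {"OMO": "http://purl.obolibrary.org/obo/OMO_"}


def expand(curie, prefixes=PREFIXES):
    """Expand 'PREFIX:local' to a full IRI by simple concatenation."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


opaque = expand("OMO:0001000")
readable = expand("OMO:abbreviation")  # hypothetical readable CURIE
proposed = "http://purl.obolibrary.org/obo/omo/abbreviation"

print(opaque)
print(readable)
print(readable == proposed)  # the registered prefix does not yield the proposed IRI
```

Under the registered prefix, a readable local part would expand to an OMO_ IRI rather than the omo/ path, so either the IRI pattern or the prefix would have to change.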

matentzn commented 2 years ago

Fair point, in case the vote passes, I made an issue to walk that mountain: https://github.com/information-artifact-ontology/ontology-metadata/issues/89

zhengj2007 commented 2 years ago

@matentzn I think there may be more people interested in the topic who are not aware of the vote. How about reminding people, at least those who are interested in attending the OMO workshop, to participate in the vote? Thanks!

matentzn commented 2 years ago

Yes, please share this poll with whoever you feel would have an opinion. I shared it in my circles!

zhengj2007 commented 2 years ago

I'd prefer opaque ids since the AP labels may be changed. For example, IAO:'definition' was originally labelled 'textual definition', and IAO:'term editor' was originally labelled 'definition editor'.

matentzn commented 2 years ago

The vote ended 6:6, which reflects the fierceness of the above debate. Since this issue has dragged on long enough, I will drop the motion for readable IRIs for APs now, and register the next APs as numeric. Thanks for the discussion all!