zhengj2007 / bfo-trunk

Name properties uniformly - with a verb #30

Open zhengj2007 opened 9 years ago

zhengj2007 commented 9 years ago

From alanruttenberg@gmail.com on May 20, 2012 12:14:53

e.g. has part (ok); part of -> is part of

--- Conversation with Barry

On Sat, May 19, 2012 at 4:53 PM, Alan Ruttenberg alanruttenberg@gmail.com wrote: I want to do some edits related to annotations and don't want to merge if I can help it.

In the discussion group we agreed on having relation names be uniform

Good idea

So:

has part (ok); part of -> is part of

etc.

Can I make this change in the reference?

I would be very happy for you to do this when I hand it back to you

Original issue: http://code.google.com/p/bfo/issues/detail?id=31

zhengj2007 commented 9 years ago

From dosu...@gmail.com on May 20, 2012 12:24:53

I have to say - I quite like the relations without 'is' and with underscores, as this serves to mark them out clearly from plain English terms that have the same name and makes the structure of Manchester Syntax statements clearer.

I assume that the argument for adding an initial 'is' is that it makes OBO relationships and OWL MS look more like English (if you ignore the odd quoting):

e.g. finger 'is part of' some hand is a bit more like English than finger 'part of' some hand

But sometimes the 'is' makes DL queries less readable. Say I want to write a DL query that refers to an anonymous class without specifying a genus:

Arguably: 'is located in' some ('is part of' some skull) is less readable than 'is located in' some ('part of' some skull)
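For a side-by-side comparison, the renderings above can be generated mechanically. This is a minimal Python sketch only: the helpers `quote`, `some`, and `nested` are hypothetical names invented for illustration (they are not part of Protege or any OWL library), and it assumes that labels containing spaces are single-quoted, as in the examples in this comment.

```python
def quote(label: str) -> str:
    """Single-quote a label only when it contains a space (assumed convention)."""
    return f"'{label}'" if " " in label else label


def some(relation: str, filler: str) -> str:
    """Render an existential restriction: <relation> some <filler>."""
    return f"{quote(relation)} some {filler}"


def nested(outer: str, inner: str, cls: str) -> str:
    """Render <outer> some (<inner> some <cls>), an anonymous class with no genus."""
    return some(outer, f"({some(inner, cls)})")


# The same query under three naming styles discussed in this thread.
with_is = nested("is located in", "is part of", "skull")
without_is = nested("is located in", "part of", "skull")
underscored = nested("located_in", "part_of", "skull")

for rendering in (with_is, without_is, underscored):
    print(rendering)
```

The underscored form needs no quoting at all under the assumed convention, which is one reason it stands out from plain English text.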

And if what we want to get across is that finger 'part of' some hand means all finger(s) are 'part of' some hand - then perhaps the version lacking the 'is' is less misleading?

This is even more apparent when referring to relations in free text. Compare:

"finger 'is part of' some hand means all finger(s) are 'is part of' some hand" to "finger part_of some hand means that all fingers are part_of some hand"

Really, the only way to make the former readable is to abandon using the relation name altogether and rely on the plain English equivalent:

"finger 'is part of' some hand means all finger(s) are part of some hand"

This is probably OK with 'part of' but is likely to be misleading in other cases, given that relations often have much more specialised meanings than the regular English words and phrases used to name them.

zhengj2007 commented 9 years ago

From alanruttenberg@gmail.com on May 20, 2012 13:04:03

The motivation was first that there be a uniform way we write relations. Since our current set of relations is a mix of grammatical forms, the proposal was to make this uniform. In addition, as you point out, there may be issues about how labels are read in English. However, it seems these issues cut both ways, and that adopting a consistent way of choosing the names will, in the long run, make it easier for users to understand what they see. By users, I mean developers, btw. If you want to present ontologies to end users you will (always) have to do more work to ensure that the results are colloquially understandable.

BTW, Barry, in private communication, asks why we don't use underscores in relation names. I answered him by quoting his paper

"Survey-based naming conventions for use in OBO Foundry ontology development"

Naming Conventions

Our proposed set of naming conventions, founded on the survey results, is summarized in Table 1. In further discussions, we refer to the entities of which an ontology consists (in some circles these are called classes and relations) as its representational units [19]. A representational unit can be accompanied by one or more synonymous names of different categories. Any type of name that is chosen to be displayed in the hierarchy is called 'display name' (called 'browser key' in Protégé). Where the form of that name is controlled by a set of explicit rules we refer to it as a 'formal name'. To ensure that the conventions proposed here are expressed unambiguously we employ the following additional name categories, which we hope will also have general utility: ... 3.3 Use the bar space (' ') character as word separator, just as it would normally appear in the language of choice. Where use of the bar space is not allowed by the type of representational unit in use to store a name, the underscore ('_') should be used instead. Camel case should not be used as a means of word separation.

zhengj2007 commented 9 years ago

From mcour...@gmail.com on May 20, 2012 14:40:54

It may be worth noting that this paper is dated 2009, and the survey itself was conducted among 66 people in 2007. It also refers to things that have evolved, such as Protege 3's way of displaying classes by rdf:ID. In past discussions with other heavy developers, and in my own experience, not using underscores is a pain, especially considering the autocomplete feature (for which you, Alan, submitted another tracker item suggesting to add a new annotation property). As mentioned, there will anyway be extra work to provide a nice human-friendly user name, so why not make things easier on the developer and just use underscores? No need for yet another duplication of the label as yet another annotation, and no need for a new way to handle things in Protege. I also agree with David that it makes things much more readable, whether in Manchester syntax, papers or general written communication.

zhengj2007 commented 9 years ago

From haen...@ohsu.edu on May 20, 2012 14:52:30

I agree with Melanie, I personally greatly prefer the underscores in relations as I find it easier in various autocomplete functions and less confusing for new users in distinguishing classes from properties.

zhengj2007 commented 9 years ago

From cmung...@gmail.com on May 20, 2012 14:57:59

+1 for using underscores and dropping the "is".

Another argument for doing it this way is precedent - I know we shouldn't be bound by legacy, but "part_of" is the de facto standard label in dozens of ontologies, some of which have been around >10 years. This is the form that has been published in however many papers, including the 2005 OBO-Relations paper.

zhengj2007 commented 9 years ago

From alanruttenberg@gmail.com on May 20, 2012 18:19:28

I don't see why all these issues can't be resolved with the proposal I put forth in issue #32. I think it sets a bad example to stick to what is effectively jargon, and to be ruled by one authoring tool that could do its job better (Google knows how to complete terms with spaces without messing with quotes, for example). However, if the overall sentiment on this is consistent with the few comments so far, I will instead propose that we add a new alternative term - something like 'natural language string' - and have that label be the one with consistent and proper English labels.

Again, I see this as intrusion of user interface into ontology best practices. No English speaker uses underscores in their usual language, and most try to use verbs consistently. The completion software can (and should) be fixed to make your lives easy. Here are some ways:

1) stop insisting on quotes for completion. Google manages to get away with a single character in g+, and then de-emphasizes it typographically when you are finished completing.

2) Have an option where Protege understands during completion that a typed underscore should be treated as a space if there is no term that has an underscore and there is one with a space.

3) Have Protege be more generous with completion choices - offering matches in the middle of terms below matches from the front, offering recently chosen completions at the top of the list when they match, and matching on first letters of words, so that "ipo" or "po" (matching from the middle) both offer "is part of".

4) Make it supereasy to add abbreviations - see e.g. http://www.typeit4me.com/ . So someone should be able to easily say "I want you to offer me is part of every time I'm in a relation completion context and type an initial p."
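Suggestions 2) and 3) can be made concrete with a small ranking function. This is a minimal Python sketch under stated assumptions, not a description of how Protege works: `initials`, `match_tier`, and `rank_completions` are hypothetical names, matching is case-insensitive, and the tier order (recent first, then front matches, then middle matches, then first-letter matches) is one reading of the list above.

```python
def initials(label: str) -> str:
    """First letters of each word, e.g. 'is part of' -> 'ipo'."""
    return "".join(word[0] for word in label.lower().replace("_", " ").split())


def match_tier(query: str, label: str):
    """Return 0 for a front match, 1 for a middle match, 2 for an initials match."""
    text = label.lower()
    if text.startswith(query):
        return 0
    if query in text:
        return 1
    if query in initials(label):
        return 2
    return None  # no match at all


def rank_completions(typed: str, labels, recent=()):
    """Order matching labels: recently chosen first, then by match tier."""
    query = typed.lower()
    # Suggestion 2: treat '_' as a space when no label matches the underscore form.
    if "_" in query and not any(query in label.lower() for label in labels):
        query = query.replace("_", " ")
    scored = []
    for position, label in enumerate(labels):
        tier = match_tier(query, label)
        if tier is None:
            continue
        recent_rank = 0 if label in recent else 1
        scored.append((recent_rank, tier, position, label))
    return [label for *_, label in sorted(scored)]


labels = ["is part of", "has part", "is located in", "participates in"]
print(rank_completions("ipo", labels))    # first-letter match
print(rank_completions("part_", labels))  # underscore read as a space
print(rank_completions("p", labels, recent=("is part of",)))
```

With this ordering, typing "ipo", "po", or "part_" all offer "is part of", and a recently chosen relation moves to the top of the list when it matches.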

Cc: matthew....@googlemail.com

zhengj2007 commented 9 years ago

From mcour...@gmail.com on May 20, 2012 20:16:49

In this case the discussion is not so much about what Protege supports or not, but what people working on and with those resources feel comfortable with. Several of us are of the opinion that we should be using underscores, for reasons of readability among others. The counterargument was the paper by Schober et al., which is based on a 5-year-old survey and may not reflect tool and user reality anymore. The second counterargument was that it will be easier for the end user to read; but at the same time comment #2 above justifies keeping "is" in relation names by saying "it's ok, we'll anyway need to preprocess for end user", so it seems like that point is moot. Would one option be to poll the BFO community and decide based on preference?

zhengj2007 commented 9 years ago

From alanruttenberg@gmail.com on May 20, 2012 20:56:51

re: "Would one option be to poll the BFO community and decide based on preference?" - please file procedural issues separately as Type-BFO2-Process.

Re: other comments, seems like we are going in circles. The question isn't whether, but how. One proposal has uniformity in editor preferred label and legacy in an alternative term, and the other the other way around.

zhengj2007 commented 9 years ago

From mcour...@gmail.com on May 20, 2012 21:30:51

Procedural issue created at https://code.google.com/p/bfo/issues/detail?id=34. Re circles, the two options are not equivalent. I was suggesting we use rdfs:label with value part_of, while you are suggesting to have an rdfs:label with value is part of, a new annotation property with value part_of, and/or require some development from the Protege team. If you disagree, it would be helpful if you could provide an example of both cases as you see them, with all label-related annotation properties, to make sure we work from the same basis.
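To make the comparison concrete, the two options as described in this comment can be written out as plain annotation maps. This is a minimal Python sketch only: the key `ex:alternativeTerm` is a hypothetical placeholder for the not-yet-named new annotation property, and `names_for` is an invented helper, not part of any ontology tooling.

```python
# Option 1 (this comment): a single rdfs:label in the underscore form.
OPTION_UNDERSCORE = {"rdfs:label": "part_of"}

# Option 2 (as understood here): a verb-form rdfs:label plus a new annotation
# property carrying the legacy name. "ex:alternativeTerm" is a placeholder.
OPTION_VERB = {"rdfs:label": "is part of", "ex:alternativeTerm": "part_of"}


def names_for(annotations: dict) -> list:
    """Return every distinct name under which the relation could be looked up."""
    return sorted(set(annotations.values()))


print(names_for(OPTION_UNDERSCORE))
print(names_for(OPTION_VERB))
```

Under the first option there is one name to maintain; under the second there are two, and tools must know which annotation to display and which to complete on.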

zhengj2007 commented 9 years ago

From dosu...@gmail.com on May 21, 2012 09:21:31

This issue has nothing to do with Protege. I'm arguing that the current typographical convention is good because it helps distinguish relations from English words/phrases with the same spelling. The 'is' makes some Manchester Syntax expressions less readable and also makes at least some references to relations in free text extremely clunky.

A poll of OBO foundry + BFO folks seems a reasonable way to resolve if we can't resolve here.

zhengj2007 commented 9 years ago

From steschu@gmail.com on May 25, 2012 06:48:28

+1 for using underscores; neutral regarding the use of "is".

However, this discussion should not further delay the release of a BFO2OWL trial version. Once such a trial version can be tested by the user community, we will have a more complete picture of the range of opinions and can make a more informed decision.