zhengj2007 / bfo-export

Automatically exported from code.google.com/p/bfo

Name properties uniformly - with a verb #31

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
e.g. has part ok
part of -> is part of

--- Conversation with Barry

On Sat, May 19, 2012 at 4:53 PM, Alan Ruttenberg <alanruttenberg@gmail.com> 
wrote:
I want to do some edits related to annotations and don't want to merge
if I can help it.

In the discussion group we agreed on having relation names be uniform
- always including a verb.

Good idea 

So:

has part (ok)
part of -> is part of

etc.

Can I make this change in the reference?

I would be very happy for you to do this when I hand it back to you

Original issue reported on code.google.com by alanruttenberg@gmail.com on 20 May 2012 at 4:14

GoogleCodeExporter commented 9 years ago
I have to say - I quite like the relations without 'is' and with underscores, 
as this serves to mark them out clearly from plain English terms that have the 
same name and makes the structure of Manchester Syntax statements clearer. 

I assume that the argument for adding an initial 'is' is that it makes OBO 
relationships and OWL MS look more like English (if you ignore the odd quoting):

e.g. 
finger 'is part of' some hand
is a bit more like English than
finger 'part of' some hand

But sometimes the 'is' makes DL queries less readable.  Say I want to write a 
DL query that refers to an anonymous class without specifying a genus:

Arguably: 
'is located in' some ('is part of' some skull)
is less readable than
'is located in' some ('part of' some skull)

And if what we want to get across is that
finger 'part of' some hand means
all finger(s) are 'part of' some hand - then perhaps the version lacking the 
'is' is less misleading?

This is even more apparent when referring to relations in free text. Compare:

"finger 'is part of' some hand means all finger(s) are 'is part of' some hand"
to
"finger part_of some hand means that all fingers are part_of some hand"

Really, the only way to make the former readable is to abandon using the 
relation name altogether and rely on the plain English equivalent:

"finger 'is part of' some hand means all finger(s) are part of some hand"

This is probably OK with 'part of' but is likely to be misleading in other 
cases, given that relations often have much more specialised meanings than the 
regular English words and phrases used to name them. 

Original comment by dosu...@gmail.com on 20 May 2012 at 7:24

GoogleCodeExporter commented 9 years ago
The motivation was first that there be a uniform way we write relations. Since 
our current set of relations is a mix of grammatical forms, the proposal was to 
make this uniform. In addition, as you point out, there may be issues about how 
labels are read in English. However, it seems these issues cut both ways, and 
that adopting a consistent way of choosing the names will, in the long run, 
make it easier for users to understand what they see. By users, I mean 
developers, btw. If you want to present ontologies to end users you will 
(always) have to do more work to ensure that the results are colloquially 
understandable.

BTW, Barry, in private communication, asks why we don't use underscores in 
relation names. I answered him by quoting his paper 

"Survey-based naming conventions for use in OBO Foundry ontology development"

Naming Conventions

Our proposed set of naming conventions, founded on the survey results, is 
summarized in Table 1. In further discussions, we refer to the entities of 
which an ontology consists (in some circles these are called classes and 
relations) as its representational units [19]. A representational unit can be 
accompanied by one or more synonymous names of different categories. Any type 
of name that is chosen to be displayed in the hierarchy is called 'display 
name' (called 'browser key' in Protégé). Where the form of that name is 
controlled by a set of explicit rules we refer to it as a 'formal name'. To 
ensure that the conventions proposed here are expressed unambiguously we employ 
the following additional name categories, which we hope will also have general 
utility:
...
3.3 Use the bar space (' ') character as word separator, just as it would 
normally appear in the language of choice. Where use of the bar space is not 
allowed by the type of representational unit in use to store a name, the 
underscore ('_') should be used instead. Camel case should not be used as a 
means of word separation.
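
A minimal Python sketch of how convention 3.3 could be applied mechanically. The function names `to_display_name` and `to_storage_name` are hypothetical and not from the paper, and the camel-case split is a simple heuristic that leaves letter case untouched.

```python
import re


def to_display_name(name: str) -> str:
    """Return the name with single spaces as word separators."""
    # Split camel case: put a space between a lowercase letter or digit
    # and the uppercase letter that follows it.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Underscores are word separators too.
    spaced = spaced.replace("_", " ")
    # Collapse any run of whitespace into one space.
    return " ".join(spaced.split())


def to_storage_name(name: str) -> str:
    """Fallback form for places where a space is not allowed."""
    return to_display_name(name).replace(" ", "_")


print(to_display_name("part_of"))
print(to_storage_name("is part of"))
```

The fallback is derived from the display form, so the two names cannot drift apart.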

Original comment by alanruttenberg@gmail.com on 20 May 2012 at 8:04

GoogleCodeExporter commented 9 years ago
It may be worth noting that this paper is dated 2009, and the survey itself was 
conducted among 66 people in 2007. It also refers to things that have evolved, 
such as Protege 3's way of displaying classes by rdf:ID.
In past discussions with other heavy developers, and in my own experience, not 
using underscores is a pain, especially considering the autocomplete feature 
(for which you, Alan, submitted another tracker issue suggesting a new annotation 
property). As mentioned, there will anyway be extra work to provide a nice human 
friendly user name, so why not make things easier on the developer and just use 
underscores? No need for yet an extra duplication of the label as yet a new 
annotation, and no need for a new way to handle things in Protege. I also agree 
with David that it makes things much more readable, whether in Manchester 
syntax, papers or general written communication.

Original comment by mcour...@gmail.com on 20 May 2012 at 9:40

GoogleCodeExporter commented 9 years ago
I agree with Melanie, I personally greatly prefer the underscores in relations 
as I find it easier in various autocomplete functions and less confusing for 
new users in distinguishing classes from properties. 

Original comment by haen...@ohsu.edu on 20 May 2012 at 9:52

GoogleCodeExporter commented 9 years ago
+1 for using underscores and dropping the "is".

Another argument for doing it this way is precedent - I know we shouldn't be 
bound by legacy, but "part_of" is the de-facto standard label in dozens of 
ontologies some of which have been around >10 years. This is the form that has 
been published in however many papers, including the 2005 OBO-Relations paper.

Original comment by cmung...@gmail.com on 20 May 2012 at 9:57

GoogleCodeExporter commented 9 years ago
I don't see why all these issues can't be resolved with the proposal I put 
forth in issue 32. I think it sets a bad example to stick to what is 
effectively jargon, and to be ruled by one authoring tool that could do its job 
better (Google knows how to complete terms with spaces without messing with 
quotes, for example). However, if the overall sentiment on this is consistent 
with the few comments so far, I will instead propose that we add a new 
alternative term - something like 'natural language string' and have that label 
be the one with consistent and proper English labels. 

Again, I see this as an intrusion of user interface into ontology best practices. 
No English speaker uses underscores in their usual language, and most try to 
use verbs consistently. The completion software can (and should) be fixed to 
make your lives easy. Here are some ways:

1) stop insisting on quotes for completion. Google manages to get away with a 
single character in g+, and then de-emphasizes it typographically when you are 
finished completing. 

2) Have an option where Protege understands during completion that when typing 
an underscore it should be considered a space if there is no term that has an 
underscore and there is one with a space.

3) Have Protege be more generous with completion choices - offering matches in 
the middle of terms below matches from the front, offering recently chosen 
completions at the top of the list when they match, and matching on first letters 
of words, so that "ipo" or "po" (matching from the middle) both offer "is part of".

4) Make it supereasy to add abbreviations - see e.g. http://www.typeit4me.com/. 
So someone should be able to easily say "I want you to offer me is part of 
every time I'm in a relation completion context and type an initial p."
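
To make suggestions 2 and 3 concrete, here is a rough Python sketch of such a completion ranking. It is not how Protege works, and the function names are made up for illustration. Placing matches on word initials below substring matches is an assumption, since the comment does not say where they should go.

```python
def initials(label: str) -> str:
    """First letter of each word, e.g. 'is part of' -> 'ipo'."""
    return "".join(word[0] for word in label.split())


def complete(typed: str, labels, recent=()):
    """Return matching labels, best first."""
    query = typed.lower()
    # Suggestion 2: if no label matches the underscore literally,
    # read the underscore as a space.
    if "_" in query and not any(query in label.lower() for label in labels):
        query = query.replace("_", " ")

    ranked = []
    for label in labels:
        candidate = label.lower()
        if candidate.startswith(query):
            rank = 0  # match from the front
        elif query in candidate:
            rank = 1  # match in the middle of the label
        elif query in initials(candidate):
            rank = 2  # match on first letters of words (assumed order)
        else:
            continue
        # Suggestion 3: recently chosen completions go to the top.
        ranked.append((label not in recent, rank, label))
    return [label for _, _, label in sorted(ranked)]


relations = ["is part of", "has part", "is located in", "participates in"]
print(complete("ipo", relations))
print(complete("part_", relations))
print(complete("p", relations, recent={"is part of"}))
```

With a ranking like this, a label containing spaces can be reached without quotes or underscores, which is the point of the suggestions above.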

Original comment by alanruttenberg@gmail.com on 21 May 2012 at 1:19

GoogleCodeExporter commented 9 years ago
In this case the discussion is not so much about what Protege supports or not, 
but what people working on and with those resources feel comfortable with. 
Several of us are of the opinion that we should be using underscores, for 
reasons of readability among others. The counterargument was the paper by Schober 
et al., which is based on a five-year-old survey and may not reflect tool and 
user reality anymore. The second counterargument was that it will be easier 
for the end user to read; but at the same time comment #2 above justifies 
keeping "is" in relation names by saying "it's ok, we'll anyway need to 
preprocess for end user", so it seems like that point is moot.
Would one option be to poll the BFO community and decide based on preference?

Original comment by mcour...@gmail.com on 21 May 2012 at 3:16

GoogleCodeExporter commented 9 years ago
re: "Would one option be to poll the BFO community and decide based on 
preference?"  - please file procedural issues separately as Type-BFO2-Process.

Re: other comments, seems like we are going in circles. The question isn't 
whether, but how. One proposal has uniformity in editor preferred label and 
legacy in an alternative term, and the other the other way around. 
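
As characterized in this comment, the two proposals differ only in which name fills which slot. A small Python sketch using plain dictionaries makes that visible. The keys echo the wording above and are not claimed to be actual annotation property identifiers, and the variable names are hypothetical.

```python
# Proposal with the uniform (verb-including) name as the preferred label.
uniform_preferred = {
    "editor preferred label": "is part of",
    "alternative term": "part_of",
}

# Proposal with the legacy name as the preferred label.
legacy_preferred = {
    "editor preferred label": "part_of",
    "alternative term": "is part of",
}


def swapped(proposal: dict) -> dict:
    """Exchange the two slots of a proposal."""
    return {
        "editor preferred label": proposal["alternative term"],
        "alternative term": proposal["editor preferred label"],
    }


# Under this reading, each proposal is the other with the slots exchanged.
print(swapped(uniform_preferred) == legacy_preferred)
```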

Original comment by alanruttenberg@gmail.com on 21 May 2012 at 3:56

GoogleCodeExporter commented 9 years ago
Procedural issue created at http://code.google.com/p/bfo/issues/detail?id=34

Re circles, the two options are not equivalent. I was suggesting we use rdfs:label 
with value part_of, while you are suggesting to have an rdfs:label with value 
is part of, a new annotation property with value part_of, and/or require some 
development from the Protege team. If you disagree, it would be helpful if you 
could provide an example of both cases as you see them, with all label related 
annotation properties, to make sure we work with the same basis.

Original comment by mcour...@gmail.com on 21 May 2012 at 4:30

GoogleCodeExporter commented 9 years ago
This issue has nothing to do with Protege. I'm arguing that the current 
typographical convention is good *because* it helps distinguish relations from 
English words/phrases with the same spelling. The 'is' makes some Manchester 
Syntax expressions less readable and also makes at least some references to 
relations in free text extremely clunky. 

A poll of OBO Foundry + BFO folks seems a reasonable way to resolve this if we 
can't resolve it here.

Original comment by dosu...@gmail.com on 21 May 2012 at 4:21

GoogleCodeExporter commented 9 years ago
+1 for using underscores
neutral regarding the use of "is"

However, this discussion should not further delay the release of a BFO2OWL 
trial version. 
Once such a trial version can be tested by the user community, we will have a 
more complete picture of the range of opinions and can make a more informed 
decision. 

Original comment by steschu@gmail.com on 25 May 2012 at 1:48