DanCarey404 closed this issue 3 years ago
@DanCarey404 Can you please add a summary of the agreed-upon standard so that it is documented here? Then can #20 be closed?
Replaces #20
Per Rebecca's request, these are the labeling standards being implemented.
Just to clarify: this was not my request: it was my proposal, which the group discussed and agreed on. :)
This task needs an assignee.
@sa-bpelakh Boris has a query for this. Can you pop it into this issue, for convenience?
What I have are SHACL rules that validate that labels are conformant: https://github.com/semanticarts/platts-ontology/blob/develop/shapes/ontologyShapes.ttl. They enforce the policy described above, and even detect acronyms in all caps (minimum of 2 letters, I believe) and ignore their casing. The current version does not allow for numbers in class or property names, so if that's a requirement, we'll have to make some changes.
I think we should allow numbers; e.g., hypothetically we could define classes like Shimano105Components, Iso639 (subclass of Category), TourDeFrance2020Racers, CharactersIn1984, ...
I can see that for a specific domain, but for the base gist? We can set up a fun game of regex golf for labels.
Maybe it's less likely in an upper ontology, but why exclude it in principle?
Re regex golf - that looks fun! Maybe at our next happy hour?
Maybe it's less likely in an upper ontology, but why exclude it in principle?
We get to choose our own stylistic conventions, as does each client project. I don't think we want gist to have numbers in IRIs, and a rule for this would find a one that looks exactly like a lowercase el. So I vote to put it in our gist checks as a warning.
If we disallow numbers in local names, and we happen to come across the need for one, we are then forced to spell the number out, which I think is worse. What's wrong with numbers in IRIs?
A reminder that this issue is the implementation of a set of conventions that had already been decided on and documented in the gist style guide. The point is not to revisit the decisions here. Quoting from the style guide that we had agreed on:
Isbn10, not Isbn-10 or ISBN-10.
This issue surfaced because I want to find a new assignee, since we agreed on the implementation back in April and have been postponing it since then.
@sa-bpelakh Platts and gist should be able to have different naming conventions. Is it onto_tool that applies the SHACL rules? If so, the SHACL shapes or files to invoke (or a folder containing them) should be configured in the YAML file, or stored in a particular directory, or something.
It's true that this issue should not get into what the style conventions are. That can be debated in a separate issue if anyone cares enough to raise it.
@rjyounes Yes, the bundle file configures which shapes to apply. So we can configure whatever we consider appropriate for gist, and customers can, um, customize 😄 whichever way they want.
Team has disagreement on the naming convention as of 12/10/2020 issues meeting. @DanCarey404 will poll SA ontologists. While there IS consensus to follow the standard, there is not consensus ON the standard.
(Detail: Some want Title Case for classes, not sentence case. Some want Title Case for all concepts. Rationale: all are concepts, and labeling for particular use cases (like sentence generation vs. column headings) won't always work.)
PS @sa-bpelakh will modularize the SHACL checking to make it easy to apply different conventions, depending on where the standard lands.
@marksem @DanCarey404 Can we please move discussion of this issue to a gist review meeting, with notes recorded here? Our goal is to be transparent, and decisions made by internal polling are not. In addition, there needs to be a rationale for reopening a decision that was made months ago. We cannot rethink every issue for those who did not attend the discussion. If someone who is unable to attend wants to provide input, that can be indicated here and we can accommodate them by scheduling a special meeting if needed.
My input is based on earlier decisions now recorded in the gist style guide:
Classes
Properties
Rationale
We adopt sentence over title case because the latter, while technically well-defined, has more complex rules and can introduce inconsistencies when implemented by different users.
Additional notes:
I find @rjyounes's arguments and rationale compelling. If anyone wants to use labels for column headers, they can introduce a subproperty of altLabel called, say, titleCaseLabel.
I didn't realize that one of the issues at stake in the renewed discussion was the use of labels as column headers. IMO that makes the case even stronger: it's hard to justify considering the preferred label as one designed for column headings or any other implementation-specific use. We have actually had this discussion during review of #20, where we reached the same conclusion as in @uscholdm's suggestion above, to define additional annotations for application-specific needs. In the case of column headers, they are (or could be) the same as the local names, so one could parse the IRIs to derive the local names for use as column headers and not maintain the values in an annotation.
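As a rough illustration of deriving column headers from IRIs rather than maintaining them in an annotation, here is a minimal Python sketch. The function name `local_name` and the example.org IRIs are made up for this example, and it assumes the local name is whatever follows the last `#` or, failing that, the last `/`.

```python
def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or, if none, the last '/'."""
    # Hash namespaces take precedence; otherwise fall back to the last slash.
    for separator in ("#", "/"):
        if separator in iri:
            return iri.rsplit(separator, 1)[1]
    # No separator found: treat the whole string as the local name.
    return iri


# Hypothetical IRIs, used only for illustration.
headers = [
    local_name("http://example.org/onto/TourDeFrance2020Racers"),
    local_name("http://example.org/onto#hasSsn"),
]
```

With this approach the column headers stay in sync with the IRIs automatically, at the cost of showing camel-cased names rather than natural language.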
I suggest that all words in a label have a leading capital. One reason for this suggestion is that Notepad++ has a convert case option (Proper Case) which does that, as does MS Word (Capitalize Each Word). This removes ambiguity from the rule and ensures the consistency that some are looking for.
@DanCarey404 Are you suggesting that even function words (prepositions, articles, etc) would be capitalized? That's not a type of casing I've ever heard of, other than the applications you mention.
One reason for using initial lower for properties: we use labels that are tied to the local names, and should preferably be derivable from them by some simple rules, such as adding whitespace at word boundaries indicated by camel-casing. Since our properties have local names with initial lowercase, this suggests the labels should follow suit.
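The "simple rules" mentioned here could look something like the following Python sketch. It assumes a word boundary is any point where a lowercase letter or digit is followed by an uppercase letter, and it lowercases the result, which only suits property labels. The function name `property_label` is illustrative and not part of any existing tooling.

```python
import re

# A boundary sits between a lowercase letter or digit and a following capital.
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def property_label(local_name: str) -> str:
    """Insert a space at each camel-case boundary, then lowercase everything."""
    return _WORD_BOUNDARY.sub(" ", local_name).lower()


print(property_label("hasBeginning"))  # has beginning
print(property_label("atLocation"))    # at location
```

Note that the lowercasing step also flattens acronyms and proper nouns, so a label produced this way may still need manual correction.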
These are the logical options for classes and properties:
Note: 2-4 make exceptions for acronyms and terms that are generally capitalized: Social Security Number, has SSN, has Social Security Number.
I would reject 5 because a label is meant for humans and thus should be in natural language.
We haven't mentioned taxonomy terms. Logical options for taxonomy terms:
Review of conventions used by well-known ontologies:
SKOS: Concept Scheme, exact match (2)
PROV: SoftwareAgent, atLocation (5)
FOAF: Online Account, based near (2)
OAI-ORE: Aggregated Resource, Is Aggregated By (1)
OWL Time: Duration description, has beginning (4)
BIBFRAME (Library of Congress): Key title, Has event content (3)
dcterms: Method of Accrual, Date Modified (1)
Schema: Ignore Action, Accepted Offer (1)
Lingvo: Language resource, resource type (4)
Open Annotation: TextPositionSelector, hasBody (5)
Ordered List Ontology: Ordered List, has ordered list (2)
Conclusion: There are no generally accepted conventions; we should choose whichever one we like best.
Note on title case: There is no one standard for title case: see https://en.wikipedia.org/wiki/Title_case. Chicago Manual of Style, Associated Press, etc. each define their own, though of course the broad convention is common to all. If we adopt title case, I propose that we choose one of these standard variants (or invent our own) and document it in the gist style guide as a reference for ontology developers and reviewers.
I also propose that labels conform to natural language standards by the insertion of, for example, hyphens, even if our standards for local names do not include such characters. E.g., ISBN-10 for class Isbn10.
Notes from 2021-01-14 triage meeting:
Dave: When do we see labels?
Which would you rather see in these contexts?
Rebecca: we also see them in documentation (e.g., Widoco)
Peter: accuracy more important than typographic consistency
Will vote next meeting.
Thank you @rjyounes for comprehensive summary.
Conclusion: There are no generally accepted conventions; we should choose whichever one we like best.
Exactly.
We haven't mentioned taxonomy terms.
Most taxonomy terms are instances of gist:Category, which is a lot like a class, semantically. The key technical difference is that we use gist:categorizedBy instead of rdf:type to indicate what kind of thing something is. So we may want to adopt the same convention for taxonomy terms as we do for classes.
These are the logical options for class and property labels:
Offline voting yields #2 as the winner.
Rebecca will compile a short list of title case conventions for consideration at next meeting. The selected convention will be included in the gist style guide.
I've sorted through a number of style guides from reputable sources (AP, APA, Chicago Manual of Style, MLA, NYT, Wikipedia). The details are included in the attached document as I think they will not be of general interest. I've come up with an amalgam of various conventions that is also computable (e.g., a rule to capitalize nouns, verbs, adjectives, adverbs, and pronouns, or to lowercase prepositions unless stressed, is not computable), as follows:
Attachment: Title Case Conventions.pdf
Regarding automated conversion of local names to labels: there's an issue in the conversion of acronyms and hyphenated words. There are two possible local name conventions: hasSSN and hasSsn. The argument for the latter is that word boundaries can be easily detected. isCiaAgent allows word boundary detection, while isCIAAgent does not. Even for human users, the word boundary is easier to see in the former.
However, labels should include natural language formats: is CIA agent, not is Cia agent. The correct version cannot be algorithmically computed from either local name.
The same may be true of hyphenated words, depending on the local name convention. ISBN-10 can be automatically computed from ISBN-10 but not from Isbn-10, ISBN10, or Isbn10.
In fact, in general it is easier to derive the local name from the label than vice versa.
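To make the asymmetry concrete, here is a small Python sketch of the label-to-local-name direction. It assumes local names are formed by dropping everything that is not a letter or digit, capitalizing the first letter of each word, lowercasing the rest, and using an initial lowercase letter for properties. The function name and the `is_class` flag are illustrative only.

```python
import re


def local_name_from_label(label: str, is_class: bool) -> str:
    """Derive a camel-cased local name from a natural-language label."""
    # Split on anything that is not a letter or digit: "ISBN-10" -> ["ISBN", "10"].
    words = [w for w in re.split(r"[^A-Za-z0-9]+", label) if w]
    # Capitalize the first character and lowercase the rest: "CIA" -> "Cia".
    name = "".join(w[0].upper() + w[1:].lower() for w in words)
    if name and not is_class:
        # Properties start with a lowercase letter.
        name = name[0].lower() + name[1:]
    return name


print(local_name_from_label("is CIA agent", is_class=False))  # isCiaAgent
print(local_name_from_label("ISBN-10", is_class=True))        # Isbn10
```

This direction works because it only discards information (the acronym casing and the hyphen). Going from the local name back to the label would require restoring it.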
If we want to stick to our proposed local name conventions, we will use the forms hasSsn, isCiaAgent, and Isbn10. These require human correction once the automated label generator has run. If the latter runs before every release, we would need human intervention each time. Another option: add a skos:editorialNote indicating to the generator that the label should not be touched.
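One way the "do not touch" option might work is sketched below in Python. Everything here is assumed for illustration: the dictionary layout for a term, the marker text in the editorial note, and the generator itself, which only handles property-style lowercase labels.

```python
import re

# Hypothetical marker text; the wording of the note has not been decided.
KEEP_MARKER = "label maintained manually"

_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def generate_label(term: dict) -> str:
    """Return the label the generator would write for one term."""
    if KEEP_MARKER in term.get("editorial_note", ""):
        # A human has flagged this label: keep the existing value as-is.
        return term["label"]
    # Otherwise derive the label from the local name.
    return _WORD_BOUNDARY.sub(" ", term["local_name"]).lower()


flagged = {
    "local_name": "isCiaAgent",
    "label": "is CIA agent",
    "editorial_note": KEEP_MARKER,
}
unflagged = {"local_name": "hasBeginning", "label": ""}

print(generate_label(flagged))    # is CIA agent
print(generate_label(unflagged))  # has beginning
```

The manual correction then needs to be made only once, and later generator runs leave it alone.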
In fact, in general it is easier to derive the local name from the label than vice versa.
Interesting observation, it usually goes the other way, but this sounds correct.
The argument is that word boundaries can be easily detected. isCiaAgent allows word boundary detection, while isCIAAgent does not. Even for human users, the word boundary is easier to see in the latter.
I think it is easier to see the boundary in the former: isCiaAgent. Was that a typo?
Yes, that's an error. I've fixed it above.
Title case proposal above accepted for implementation.
Boris will fix all labels, first by automation and then manual adjustment for exceptions.
In writing the label validation script (see PR #428), Boris noted that proper nouns in labels must also retain capitalization. An emended version of the label conventions follows:
Title Case Convention
Label Conventions
Classes: title case (as above)
Properties: all lowercase
The following exceptions apply to both class and property labels:
The exception for proper nouns makes the convention not fully automatable.
The implementation of these conventions in current labels will be done by Boris using a script with manual corrections (for the non-automatable exceptions). To support label validation as part of bundling the ontology for release, we will add an additional ontology file with an annotation signaling to the validation script that the label is not subject to the validation rules. We propose gist:nonConformingLabel for the annotation. See additional notes in PR #428.
Any objections to the annotation name should be voiced here.
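For readers who want a feel for how such a check and the opt-out annotation interact, here is a simplified Python sketch. It is not the script from PR #428. It covers only the "properties: all lowercase" rule, treats any run of two or more capital letters as an acronym (an assumption), and represents the presence of the proposed gist:nonConformingLabel annotation as a plain boolean argument.

```python
import re

# Assumption: an acronym is a token made up of two or more capital letters.
_ACRONYM = re.compile(r"[A-Z]{2,}")


def property_label_conforms(label: str, non_conforming: bool = False) -> bool:
    """Check a property label against the all-lowercase rule.

    `non_conforming` stands in for the proposed gist:nonConformingLabel
    annotation: when set, the label is exempt from validation.
    """
    if non_conforming:
        return True
    # Every token must be lowercase or an all-caps acronym.
    return all(
        token == token.lower() or _ACRONYM.fullmatch(token) is not None
        for token in label.split()
    )


print(property_label_conforms("has SSN"))            # True
print(property_label_conforms("Has event content"))  # False
print(property_label_conforms("born in France", non_conforming=True))  # True
```

A proper noun such as "France" cannot be recognized by a rule like this, which is why a label containing one would need the annotation to pass.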
Add/replace rdfs:label values according to the agreed-to standard.