w3c / scholarly-html

Repository for the Scholarly HTML Community Group

Extend schema.org properties (where appropriate) for handling both simple and complex roles #15

Open essepuntato opened 8 years ago

essepuntato commented 8 years ago

Simple roles can be described by directly linking the two entities involved, as before, e.g.:

schema:ScholarlyArticle -- schema:author --> schema:Person | schema:Organization

Complex roles need at least an additional indirection, e.g.:

schema:Person | schema:Organization -- new:hasRole --> schema:Role
schema:Role -- schema:roleName --> schema:affiliation
schema:Role -- new:relatesToDocument --> schema:ScholarlyArticle
schema:Role -- new:relatesToOrganization --> schema:Organization

However, judging from the approach proposed in the SH documentation and from the guidelines of schema.org, it seems that schema:author cannot be used with ContributorRole, but only with schema:Person and schema:Organization, since it has actually been implemented as a "simple role".

An approach like the one above for complex roles could be preferable for affiliations. It is also very easy to extend with new roles, since that basically means using additional individuals as objects of the property schema:roleName; those individuals can belong to an agreed external list or be defined within schema.org itself. A similar approach has been used in the Publishing Roles Ontology to address exactly the same issue (see the class pro:PublishingRole for a list of possible roles in that context).

In addition, in SH we should clarify if we want to handle simple roles (e.g., the author of a paper) + complex roles when needed (e.g., the affiliation), or if we would like to handle all the roles as complex roles, in order not to use two different mechanisms for describing them.

Additional source for this topic: Peroni, S., Shotton, D., Vitali, F. (2012). Scholarly publishing and the Linked Data: describing roles, statuses, temporal and contextual extents. In Sack, H., Pellegrini, T. (Eds.), Proceedings of the 8th International Conference on Semantic Systems (i-Semantics 2012): 9-16. New York, New York, USA: ACM. DOI: 10.1145/2362499.2362502

pjohnston-wiley commented 8 years ago

@essepuntato , i'm not sure what you mean by schema:Role being only for 'simple' roles. The intent, as i understand it, behind the schema:Role construct is to implement N-ary relationships in as intuitive a way as possible. It enables this through predicate chaining, so that if i want to navigate the simple relationship i need only follow the same predicate in through the role instance and out to the target. This makes for a very simple SPARQL implementation, for example. I think here that it is a mistake to interpret schema.org's scoping declarations as ontologically sound.
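As a rough illustration of the predicate chaining described above, here is a minimal Python sketch. It assumes triples are modelled as plain tuples, that role nodes are typed directly as schema:Role, and that all `ex:` identifiers and the helper names (`objects`, `is_role`, `resolve`) are hypothetical; it is not taken from the SH documentation or from schema.org.

```python
# Triples as (subject, predicate, object) tuples. All ex: identifiers are
# illustrative assumptions, not data from the thread.
TRIPLES = {
    ("ex:article", "schema:author", "ex:role1"),
    ("ex:role1", "rdf:type", "schema:Role"),
    ("ex:role1", "schema:author", "ex:john-doe"),
    ("ex:role1", "schema:roleName", "corresponding author"),
    ("ex:article", "schema:author", "ex:jane-roe"),  # direct link, no role node
}


def objects(triples, subject, predicate):
    """Return every object of (subject, predicate, ?o)."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def is_role(triples, node):
    """A node counts as a role here only if it is typed schema:Role."""
    return "schema:Role" in objects(triples, node, "rdf:type")


def resolve(triples, subject, predicate):
    """Follow `predicate` from `subject`; when the object is a role node,
    follow the same predicate once more, out of the role to the target."""
    targets = set()
    for obj in objects(triples, subject, predicate):
        if is_role(triples, obj):
            targets |= objects(triples, obj, predicate)
        else:
            targets.add(obj)
    return targets


authors = resolve(TRIPLES, "ex:article", "schema:author")
```

With this data, `authors` contains both the person reached through the role node and the person linked directly, which is the "in through the role instance and out to the target" navigation in a few lines.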

Really, schema:Role is equivalent to schema:Thing, and is intended to exist outside the regular scheme of things (see, for example, this and this). In this sense, schema:Role is the spiritual cousin of constructs like rdf:List. It is a convention, nothing more.

This doesn't mean that schema.org cannot be expressed ontologically, it just means that you need to interpret the constructs as described at schema.org when implemented in an ontology.

The proposal here would be that sa:ContributorRole eventually become schema:ContributorRole. Even then, it would not be explicitly declared as usable for schema:author and schema:contributor, in the same way that schema:employee does not explicitly declare that it can use schema:EmployeeRole as both its subject and object.

While what you propose in the reference above would also work, i don't think it is how schema.org is intended to work. I had started with the same construct as you for our own (Wiley) ontologies, essentially deriving from PROV-O's prov:qualifiedInfluence, but once i overcame the hurdle of being able to express schema:Role ontologically, i felt that the schema.org construct read as more intuitive.

For reference, this is how i have written the schema:author property in OWL:

schema:author rdf:type owl:ObjectProperty ;
              rdfs:subPropertyOf prov:wasAttributedTo ;
              rdfs:range [ rdf:type owl:Class ;
                           owl:unionOf ( sa:ContributorRole
                                         schema:Organization
                                         schema:Person
                                       )
                         ] ;
              rdfs:domain [ rdf:type owl:Class ;
                            owl:unionOf ( sa:ContributorRole
                                          schema:CreativeWork
                                        )
                          ] .

essepuntato commented 8 years ago

Hi @pjohnston-wiley,

Actually, I think there was a misunderstanding here, surely due to my unclear wording.

I didn't mean that schema:Role is for simple roles (i.e., direct links via a property defining the role, such as schema:author), but rather the opposite: if we need to express a "composite role", such as an affiliation, then we need to use schema:Role, and I understood (but maybe I'm wrong) sa:ContributorRole to be a subclass of schema:Role.

Another note: when I refer to the "new:" terms in the examples, I actually mean that we should extend schema.org appropriately to include such terms. Whether they are currently expressed within SA or any other ontology is irrelevant, since the goal should be to have everything we need in schema.org (see issue #6).

That clarified, in order to enable such complex roles to be expressed appropriately, we need to change the current schema.org definitions of all the properties (e.g., schema:author) that we would like to use, since they currently don't list any "schema:Role" in their domain/range, as you already noticed.

> Even then, it would not be explicitly declared as usable for schema:author and schema:contributor, in the same way that schema:employee does not explicitly declare that it can use schema:EmployeeRole as both its subject and object.

Well, if we are going to create a schema:ContributorRole class, then I think we need (if possible) to change the current definition of all such existing schema.org properties, such as schema:author, so that the class can be used as domain and range of those properties; otherwise things would not be clear from a purely formal point of view. It's fine to create/use properties in a way that is good for humans, but we should also be sure to provide a mechanism that is technically and formally sound.

The other issue I've tried to highlight is the fact that, in principle, we could have quite a large list of roles that we could be interested in, e.g., author, editor, publisher, curator, guest editor, editor-in-chief, reviewer, etc. Thus, we have two options for allowing people to specify such roles:

  1. Create a new schema.org property for each of those, enabling the possibility of using them with the class schema:ContributorRole; or,
  2. Reuse (and extend) some existing schema.org entities, such as schema:Role and schema:roleName, so as not to modify any property when defining new roles.

To me, option 1 seems less practical, since every time we need a new role we basically have to add it to schema.org, and we would also have to deal with all the complexities of managing schema:ContributorRole appropriately.

> i felt that the schema.org construct read as more intuitive.

In addition, the fact of using the same property twice, as suggested in the current SH documentation, is far from being intuitive to me, rather it seems quite confusing. In particular, the use of schema:author in the chain

schema:ScholarlyArticle ⟼ schema:author ⟼ schema:ContributorRole ⟼ schema:author ⟼ schema:Person

has basically two different meanings depending on where it is used. In schema:ScholarlyArticle ⟼ schema:author ⟼ sa:ContributorRole it is basically saying that there is a scholarly article that involves a contributor role as author, while in sa:ContributorRole ⟼ schema:author ⟼ schema:Person it is used to say that a contributor role has a person as author.

I know this is not the real intent (i.e., saying that a particular scholarly article is indeed authored by a particular person), but the current proposal is basically using schema:author with two different meanings. Thus, my suggestion was to have an alternative property that links schema:ScholarlyArticle to schema:ContributorRole, and that is not the same property used between schema:ContributorRole and schema:Person.

Thus, if we want to adopt the aforementioned option 1, then a possibility could be having something like the following:

:my-article a schema:ScholarlyArticle ;
    new:hasAgentWithRole [
        a schema:ContributorRole ;
        schema:author :john-doe ;
        schema:affiliation :university-of-oxford
        # etc.
    ] .

And, from this, we could also infer automatically the following:

:john-doe schema:affiliation :university-of-oxford .

:my-article schema:author :john-doe .
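To make the inference step above concrete, here is a minimal Python sketch of one way it could be implemented. Everything beyond the triples quoted in the example is an assumption: the tuple representation, the blank node label `_:role`, the function name `infer_option1`, and in particular the two rule tables (`AGENT_PROPERTIES` and `AGENT_ATTRIBUTES`), which are hypothetical and would have to be maintained by hand.

```python
# Option 1 data from the example, as (subject, predicate, object) tuples.
# "_:role" stands in for the blank node; the label is an assumption.
TRIPLES = {
    (":my-article", "rdf:type", "schema:ScholarlyArticle"),
    (":my-article", "new:hasAgentWithRole", "_:role"),
    ("_:role", "rdf:type", "schema:ContributorRole"),
    ("_:role", "schema:author", ":john-doe"),
    ("_:role", "schema:affiliation", ":university-of-oxford"),
}

# Hypothetical rule tables: which role properties point at the agent (and are
# lifted to the work) and which describe the agent itself.
AGENT_PROPERTIES = {"schema:author"}
AGENT_ATTRIBUTES = {"schema:affiliation"}


def infer_option1(triples):
    """Derive the direct work-to-agent and agent-to-attribute triples."""
    inferred = set()
    for work, predicate, role in triples:
        if predicate != "new:hasAgentWithRole":
            continue
        role_pairs = [(p, o) for s, p, o in triples if s == role]
        for agent_prop, agent in role_pairs:
            if agent_prop not in AGENT_PROPERTIES:
                continue
            # e.g. :my-article schema:author :john-doe
            inferred.add((work, agent_prop, agent))
            for attr, value in role_pairs:
                if attr in AGENT_ATTRIBUTES:
                    # e.g. :john-doe schema:affiliation :university-of-oxford
                    inferred.add((agent, attr, value))
    return inferred


inferred = infer_option1(TRIPLES)
```

Note that the sketch only produces the two expected triples because each property has been sorted into one of the two tables beforehand; nothing in the data itself says which behaviour a property should have.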

However, it is not clear (formally speaking) why schema:author should then link the article to the person, while schema:affiliation links the person to their affiliation: we are basically using two properties with different intended behaviours on the same subject, i.e., the individual of schema:ContributorRole. Why (still from a purely formal, machine-oriented point of view) is the affiliation not associated with the article? Providing the 'right' semantics for all those triples is left entirely to human interpretation.

In addition, we always need two things to enable that situation: (a) to modify the schema.org definitions so that the domain of schema:author, schema:affiliation, and all the other properties that could be used in this context (i.e., with schema:ContributorRole) includes schema:ContributorRole, and (b) to extend schema.org so as to include all the properties identifying the roles we need, which is still a baseline, as far as I understood.

Option 2, however, is an alternative that would allow us to describe the same scenario without imposing all such modification to schema.org, and in addition it would reuse existing properties already defined within schema.org. Supposing that schema:ContributorRole is subclass of schema:Role, and considering the fact that the role is held by an agent, not by the scholarly article, we could say:

:john-doe a schema:Person ;
    new:role [
        a schema:ContributorRole ;
        schema:roleName schema:author ;
        new:roleAppliesTo :my-article ] .

:university-of-oxford a schema:Organization ;
    new:role [
        a schema:ContributorRole ;
        schema:roleName schema:affiliation ;
        new:roleAppliesTo :john-doe ;
        new:roleAssociatedWith :my-article ] .

And, from this, it is quite natural to infer that

:john-doe schema:affiliation :university-of-oxford .

:my-article schema:author :john-doe .
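The inference above can be sketched with a single uniform rule. This is a minimal Python sketch under stated assumptions: triples are plain tuples, `_:r1` and `_:r2` are made-up blank node labels, `infer_option2` is a hypothetical helper, and the rule itself (the holder of a role becomes the object of the roleName property on whatever the role applies to) is my reading of the example, not something schema.org defines.

```python
# Option 2 data from the example, as (subject, predicate, object) tuples.
TRIPLES = {
    (":john-doe", "rdf:type", "schema:Person"),
    (":john-doe", "new:role", "_:r1"),
    ("_:r1", "rdf:type", "schema:ContributorRole"),
    ("_:r1", "schema:roleName", "schema:author"),
    ("_:r1", "new:roleAppliesTo", ":my-article"),
    (":university-of-oxford", "rdf:type", "schema:Organization"),
    (":university-of-oxford", "new:role", "_:r2"),
    ("_:r2", "rdf:type", "schema:ContributorRole"),
    ("_:r2", "schema:roleName", "schema:affiliation"),
    ("_:r2", "new:roleAppliesTo", ":john-doe"),
    ("_:r2", "new:roleAssociatedWith", ":my-article"),
}


def infer_option2(triples):
    """For every (holder new:role r), emit (target, roleName, holder) for each
    roleName of r and each target that r applies to."""
    inferred = set()
    for holder, predicate, role in triples:
        if predicate != "new:role":
            continue
        names = {o for s, p, o in triples
                 if s == role and p == "schema:roleName"}
        targets = {o for s, p, o in triples
                   if s == role and p == "new:roleAppliesTo"}
        inferred |= {(target, name, holder) for target in targets for name in names}
    return inferred


inferred = infer_option2(TRIPLES)
```

Because the role name is data rather than a property with its own behaviour, the same rule covers both the authorship and the affiliation without any per-property table.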

In addition, with this approach we can avoid changing all the existing and future properties for roles (i.e., schema:author, schema:affiliation, and the like) to say that they should also have schema:ContributorRole as domain, since everything is handled in a different way that respects the current implementation provided by schema.org.

Still, it is quite easy to use a new role: we could use any property already defined in schema.org as an individual (which is guaranteed by OWL 2 DL metamodelling), or we can even refer to external vocabularies if needed, even if the latter solution is less practical because it requires another prefix (see issue #6 again).

That said, of course we should also study an easy mechanism to include such assertions in a SH article by using RDFa, if needed.

pjohnston-wiley commented 8 years ago

Thanks for clarifying, I got a little confused with complex vs simple roles, but I get what you are trying to say now, so we can just talk about 'complex roles'.

On changing schema.org’s definition, I think therein lies the problem. Schema.org is fundamentally not an ontology in the formal sense, hence why they can put things in a blog post and it becomes a de-facto rule. At best, schema.org is a simplification along the lines of SKOS. I don’t think it is by accident that there is no official ontology for schema.org – TopBraid continues to autogenerate one, but I have not found it useful when using schema.org terms together with ontologies: you need to manually curate the definitions to introduce them to an OWL formalism. I have found it much more useful to contextualize schema.org within the ontologies we maintain, which then affords us to extend the implicit rules of schema.org within the OWL formalism.

Because of schema.org’s origins in, and stated goal of, search engine optimization (SEO), the lack of adherence to a rigid formalism makes sense. It is there to tag content on the Web, and it is a pragmatic expression of linked data, nothing more. For this reason, you can tag a piece of text, say “Fred Flintstone”, using schema:name without attaching an immutable identity to the underlying schema:Person. With Scholarly HTML, we are trying to do a little better, by recommending the use of things like ORCID, but we will still accept a person without a formal identity. As a publisher, we are able to take this context and use it to guide the submission process a little better than if the content had not been tagged at all.

Trying to impose the full weight of semantic expression on schema.org will break its original goal. That isn’t to say, as I have described, that you can’t use it within an ontological context, you just need to contextualize it. I don’t think this is a good or a bad thing, it is just something we have to work with if we choose to use schema.org as a vocabulary for SHTML. I am quite happy to also work on an OWL expression of this (I already am), but I don’t think it should be forced on schema.org beyond establishing the classes and their intended scope within an SEO context.

I still think that schema:Role is a corollary to rdf:List. In a given ontology, I don’t impose the use of rdf:List for a specific predicate range, but everyone accepts that I can use it. That isn't to say that within a given graph, I can't set it as a constraint (e.g. in SHACL), but this is more to do with information processing between systems rather than trying to assert semantic integrity across the Web. We MAY have more luck proposing something like an rdf:Role as a primitive that can be used in similar ways.

essepuntato commented 8 years ago

Hi @pjohnston-wiley,

I totally agree that we should not impose a particular OWL-based semantics on schema.org: as you said, schema.org left it unspecified on purpose, and the authors behind schema.org have good reasons for doing that. However, while schema.org does not define formal and strict semantics, it does provide suggestions (e.g., the fact that http://schema.org/author should be used with people and organisations), and this is the information a web developer uses to understand how to use such schema.org entities.

That's why the chain

schema:ScholarlyArticle ⟼ schema:author ⟼ schema:ContributorRole ⟼ schema:author ⟼ schema:Person

is pretty confusing to me, and I think it would be to other users as well: it reuses schema:author twice, with two different (implicit, if you want) semantics, and in a way that is not the one officially suggested by schema.org itself.

Thus, given these premises, why should we support this ambiguous approach, when we could work on extending schema.org (even without changing its current status) by adding the minimal set of new entities needed for more precise descriptions of this kind, also considering that past work has already proposed solutions for this in "formal" OWL ontologies (e.g., PROV-O and PRO)?

To recap, the two alternatives I've proposed above were:

Option 1

schema:ScholarlyArticle ⟼ new:hasAgentWithRole ⟼ schema:ContributorRole ⟼ schema:author ⟼ schema:Person

Option 2

schema:Person ⟼ new:role ⟼ schema:ContributorRole 
    ⟼ schema:roleName ⟼ schema:author
    ⟼ new:roleAppliesTo ⟼ schema:CreativeWork

Option 1 would require extending schema.org with new entities and changing the "suggestions" on schema:author (and any other existing role-based property we intend to use) so as to also have schema:ContributorRole as a possible domain for that property.

Option 2 would mean only to extend schema.org without affecting its existing status.