w3c / EasierRDF

Making RDF easy enough for most developers

Profile: Exactly one type per representation of a resource #50

Open azaroth42 opened 5 years ago

azaroth42 commented 5 years ago

Intent: Allow code to instantiate object-oriented classes from a set of triples that describe a resource

Profile implication: Ontologies can be mapped into a class/subclass hierarchy in the manner of an ORM. This handles domain/range of properties and the related SHACL/SHEX validation of those properties. However, as instances in most OO languages can only be of a single class, it requires that the set of triples used to construct this instance should contain exactly one instance of rdf:type where the subject is the resource and the object is the class to instantiate.
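A minimal sketch of the ORM-style instantiation described above, assuming triples are plain `(subject, predicate, object)` tuples of IRI strings. The `Resource`, `Person` and `Organization` classes, the `CLASS_REGISTRY` mapping and the `instantiate` helper are hypothetical names chosen for illustration; they do not come from any existing library.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Resource:
    """Base class holding the IRI and the non-type properties of a resource."""

    def __init__(self, iri, props):
        self.iri = iri
        self.props = props


class Person(Resource):
    pass


class Organization(Resource):
    pass


# Hypothetical mapping from ontology class IRIs to Python classes.
CLASS_REGISTRY = {
    "http://schema.org/Person": Person,
    "http://schema.org/Organization": Organization,
}


def instantiate(subject, triples):
    """Build one object for `subject`; fail unless it has exactly one rdf:type."""
    types = {o for s, p, o in triples if s == subject and p == RDF_TYPE}
    if len(types) != 1:
        raise ValueError(
            f"{subject}: expected exactly one rdf:type, found {len(types)}"
        )
    cls = CLASS_REGISTRY[types.pop()]
    # Collect every other property as a list of values per predicate.
    props = {}
    for s, p, o in triples:
        if s == subject and p != RDF_TYPE:
            props.setdefault(p, []).append(o)
    return cls(subject, props)


alice = instantiate(
    "http://example.org/alice",
    [
        ("http://example.org/alice", RDF_TYPE, "http://schema.org/Person"),
        ("http://example.org/alice", "http://schema.org/name", "Alice"),
    ],
)
print(type(alice).__name__, alice.props)
```

With zero or several rdf:type triples the helper raises `ValueError`, which is the situation the proposed profile is meant to rule out. A type IRI missing from the registry would raise `KeyError` in this sketch.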

From: #15

dbooth-boston commented 5 years ago

I kind of like this idea for newbie use, even though it obviously rubs against a key strength of RDF: that an object can belong to more than one class at a time, both through subclassing and independent roles. But in my experience even in more sophisticated uses of RDF, when an object belongs to multiple classes, there is usually one class that is the preferred class of that object, for purposes of rendering and such, at least within a particular context of use. I wonder whether something like rdf:prefType (analogous to skos:prefLabel) might be helpful for those cases. Just a thought.

VladimirAlexiev commented 5 years ago

In the context of CIDOC CRM (hello @azaroth42 :-)) the prefType is usually the lowest-level most-specific class, and the others come from subclass inference. Higher-level abstract classes are rarely useful. I guess it's similar in other domains.
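As a rough illustration of picking the most-specific class, the sketch below drops every asserted type that is a superclass of another asserted type. It assumes the subclass hierarchy is available as a plain dictionary; the `ex:` class names and both function names are made up for the example.

```python
def superclasses(cls, subclass_of):
    """Return all strict superclasses of `cls` (transitive closure)."""
    seen = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def most_specific_types(types, subclass_of):
    """Keep only the types that are not implied by another type in the set."""
    types = set(types)
    implied = set()
    for t in types:
        implied |= superclasses(t, subclass_of)
    return types - implied


# Hypothetical hierarchy: Painter is a subclass of Person, Person of Agent.
hierarchy = {"ex:Painter": ["ex:Person"], "ex:Person": ["ex:Agent"]}

print(most_specific_types(["ex:Agent", "ex:Person", "ex:Painter"], hierarchy))
print(most_specific_types(["ex:Painter", "ex:Plumber"], hierarchy))
```

The result is a set because independent roles (the second call) leave more than one candidate. In that case the hierarchy alone cannot decide, and some other signal would be needed.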

Schema.org strongly shuns abstract classes, even to the extent of refusing to add Agent as a superclass of Person and Organization.

chiarcos commented 5 years ago

@azaroth42: Isn't that basically the case already? Multiple rdf:types are factually equivalent to one anonymous class that represents their conjunction (in OWL). (Implementation-wise, it is easier to process conjunctions as multiple rdf:type statements, though.) Or are you suggesting that anonymous classes be abandoned?

chiarcos commented 5 years ago

@VladimirAlexiev: For the terminological harmonization of domain terminologies (in my case, annotation schemes for NLP), I've long been arguing that multiple inheritance is a key advantage of OWL modeling in comparison to hierarchical tree structures. Please don't take that away ;) The objective is that, say, something like "burning" in "burning man" is both a verb (lexically) and an adjective (syntactically), that verb and adjective are two different concepts, and that existing domain models (annotation schemes) tend to disagree on the classification if they are forced to choose one or the other. This has the funny consequence that in a traditional, hierarchical model, the "universal" definition of VERB is "depending on language and context" (http://universaldependencies.org/u/pos/VERB.html), i.e., non-universal. In OWL, it can be both.

namedgraph commented 5 years ago

> However, as instances in most OO languages can only be of a single class, it requires that the set of triples used to construct this instance should contain exactly one instance of rdf:type where the subject is the resource and the object is the class to instantiate.

"Traditional" ORM for RDF is a pretty bad idea.

There are solutions to this that support multiple types. A prime example is polymorphism support in Jena.

dbooth-boston commented 5 years ago

[FYI, I changed the title of this issue because 'representation' has a special meaning in HTTP and Web architecture.]

dbooth-boston commented 5 years ago

> multiple inheritance is a key advantage of OWL modeling in comparison to hierarchical tree structures. Please don't take that away

Maybe multiple inheritance doesn't need to be taken away in order to accomplish this goal. Maybe an object could have a single preferred or default class (rdf:prefType?), while still belonging to other classes, i.e., still allowing multiple inheritance. That might provide users with an easier entry point without taking away multiple inheritance.
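To make the idea concrete, here is a small sketch of how a consumer might resolve a single class when a preferred-type predicate is present. Note that `prefType` is only a suggestion in this thread and is not part of the RDF vocabulary, so the sketch uses a placeholder IRI in an example namespace; the `preferred_type` function is likewise hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Placeholder IRI: no such predicate is defined by RDF today.
PREF_TYPE = "http://example.org/vocab#prefType"


def preferred_type(subject, triples):
    """Return the single class to use for `subject`, or None if undecidable.

    An explicit preferred type wins; otherwise a lone rdf:type is used.
    """
    prefs = {o for s, p, o in triples if s == subject and p == PREF_TYPE}
    types = {o for s, p, o in triples if s == subject and p == RDF_TYPE}
    if len(prefs) > 1:
        raise ValueError(f"{subject}: more than one preferred type")
    if prefs:
        return next(iter(prefs))
    if len(types) == 1:
        return next(iter(types))
    return None


triples = [
    ("ex:joe", RDF_TYPE, "schema:Person"),
    ("ex:joe", RDF_TYPE, "schema:Plumber"),
    ("ex:joe", PREF_TYPE, "schema:Person"),
]
print(preferred_type("ex:joe", triples))
```

The resource keeps all of its rdf:type statements, so multiple inheritance is untouched. Only the choice of an entry-point class is narrowed.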

azaroth42 commented 5 years ago

Representation was intentional. A resource cannot be prevented from having more than one class, but each representation can only expose one at a time.

azaroth42 commented 5 years ago

@chiarcos I don't think anonymous classes are a good idea at all, and would propose that, in a profile of RDF intended to be simpler to implement and understand, they should not be used. This should be a separate issue though.

@namedgraph I think you're missing the point of this exercise, which is to discuss a profile of RDF that is simpler to explain and implement. Thus relying on traditionally understood tools is exactly the approach needed, rather than forcing everyone to be an expert in open world graphs. Unless you plan to implement full polymorphism libraries for every language?

dbooth-boston commented 5 years ago

@azaroth42 , can you explain what you mean by 'representation'? Do you mean it in the WebArch sense? Or the HTTP sense? Or some other sense?

VladimirAlexiev commented 5 years ago

@chiarcos I also think multiple inheritance is often useful. (And OLIA is imho one of the few good applications of OWL where OWL is necessary).

I just said that high-level abstract classes are not often useful, because nobody queries for them. People sometimes fall into the trap of constructing complex class hierarchies to accommodate the monomorphic nature of rdfs:domain/range.

namedgraph commented 5 years ago

@azaroth42 what's the point of even using RDF if you're clipping its features and expressiveness by using software architectures that predate it by several decades?

To exploit RDF, software has to be built for it, not the other way around.

azaroth42 commented 5 years ago

@dbooth-boston I withdraw the issue, as this process is clearly never going to converge. We'll simply do our own thing, rather than trying to talk about it in an unfriendly space.

dbooth-boston commented 5 years ago

@azaroth42 , while I sympathize with your frustration, I am re-opening the issue because I do not believe it has been resolved. Please continue to participate! Your contributions are valuable!

As a reminder to all, please re-read the W3C Code of Ethics and Professional Conduct. Being dismissive or derisive toward other contributors is NOT acceptable in this forum.

This repo was created to collect new ideas for making the RDF ecosystem easier for newcomers. This does not mean that every contributor must agree with every idea, nor does it mean that every idea will be eventually adopted. All views and ideas are welcomed, provided that they are on topic, constructive and expressed respectfully. For example, if you think something is a bad idea, then at least say why you think it is a bad idea. Even better would be to make a constructive suggestion for improving it. Shooting it down with a derisive remark is NOT constructive and NOT acceptable behavior.

Furthermore, since the whole point of this forum is to explore creative new ideas for making the RDF ecosystem easier to use, it is far more important that we create a welcoming atmosphere that invites new ideas and newcomers than it is to debate an idea's merits. Creativity cannot flourish in a climate of criticism, intimidation or elitism. That's Brainstorming 101. Bring on the ideas!

HughGlaser commented 5 years ago

Muggins here, asking for an explanation :-) So, how do I know I have conformed? Is the choice between "Profile: Exactly one type per instance of a resource" and "Profile: Exactly one type per representation of a resource" a question about that? If I create my RDF with one rdf:type per resource, do I conform? That is, is it "Profile: Exactly one rdf:type per resource"? I am hoping that is true. And if so, can't we just say that? I think syntactic constraints on documents are so much easier to understand and embrace than semantic ones; otherwise I need to remember the open-world assumption (OWA) and stuff all the time, which, as the discussion above shows, makes for more brain activity.

azaroth42 commented 5 years ago

How about -- when you retrieve a representation, then there is exactly one triple with rdf:type as the predicate per subject resource in the graph.

This means that you can have different representations that assert different types for different profiles of usage, but each representation asserts only one at a time. So my Person can be a schema:Person in my SEO-oriented, HTML-embedded JSON-LD, and a crm:E21_Person in my research-oriented Turtle view.
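A sketch of how that rule could be checked on a retrieved graph, assuming the graph has already been parsed into `(subject, predicate, object)` tuples. The function name `type_count_violations` and the `ex:` identifiers are assumptions made for this example.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_count_violations(triples):
    """Map each subject whose rdf:type count is not exactly 1 to that count."""
    graph = set(triples)  # an RDF graph is a set, so drop duplicate triples
    subjects = {s for s, _, _ in graph}
    counts = Counter(s for s, p, _ in graph if p == RDF_TYPE)
    return {s: counts[s] for s in subjects if counts[s] != 1}


# Two representations of the same resource, one type each.
seo_view = [("ex:me", RDF_TYPE, "schema:Person")]
# "ex:ResearchPerson" is a stand-in for the class used in the research view.
research_view = [("ex:me", RDF_TYPE, "ex:ResearchPerson")]

print(type_count_violations(seo_view))
print(type_count_violations(research_view))
print(type_count_violations(seo_view + research_view))
```

Each view conforms on its own, while the merged graph does not. That is the distinction between one type per representation and one type per resource.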

rivettp commented 5 years ago

Even within a SEO scenario the same URI can usefully have multiple classes and that's one of the benefits of RDF compared with more traditional technologies, e.g. in the case of sole traders an instance of schema:Person could also be an instance of schema:Plumber or schema:Dentist .

namedgraph commented 5 years ago

If in your RDF environment you for some reason require rdf:type properties with a cardinality of 1, this is very easily expressible as SPIN or SHACL constraints. sh:maxCount is designed exactly for that.

Just validate all your incoming data. If the cardinality is invalid, well, then do something about it because you're the one imposing such a strict requirement.

Problem solved. No RDF profile necessary.

namedgraph commented 5 years ago

I think this also relates to Postel's law:

> Be conservative in what you send, be liberal in what you accept

azaroth42 commented 5 years ago

I give up trying to have a meaningful discussion with @namedgraph present. @dbooth-boston, as the de-facto chair, I request that you enforce the code of ethics.

"Just use complicated, not well adopted, not well implemented, and not well documented specifications to solve the problem of developers not being expert in complicated, not well adopted, not well implemented and not well documented specifications" is, as always, entirely dismissive and derisive, missing the entire point of the discussion.

Conversely, if you find @namedgraph's comments acceptable, please re-close the issue.

namedgraph commented 5 years ago

@azaroth42 citing you

> However, as instances in most OO languages can only be of a single class, it requires that the set of triples used to construct this instance should contain exactly one instance of rdf:type where the subject is the resource and the object is the class to instantiate.

I gave you the exact solution to your problem using existing tech, but you can't even address why it's inadequate. And I'm the one not having meaningful discussion.

If it's again about the "33% developers", then indeed no further discussion necessary. I will not comment on this issue anymore.

HughGlaser commented 5 years ago

I'm sorry to see the conflict. FWIW, my view is that if "very easily expressible as SPIN or SHACL constraints" is part of the answer, it is unlikely that the process of "doing" RDF is being made Easier.

But here's a go at continuing.

We should probably distinguish between publishing and consuming a bit - apropos Postel's Law. One thing I am after is Best Practice for me creating - mainly because I don't want to have to think about choices - I should be thinking about representing the knowledge itself, not the representation method. Thus, to say everything should have exactly one type will take me a long way on the publishing side. If I am creating an Entity (to use a term), then I should remember to give it a type. And I should generally avoid giving it more than one type. Were I to do so, it would be a deliberate decision, driven by a particular need, which is also OK, but a conscious choice. If I am consuming, it may be that I can usefully exploit knowledge about Entities, such as only having one type in this document. I'm not sure how useful that is, but it may well be.

Having written that, I think it confirms a feeling I have - this is (for me) more about helping people who are creating RDF to do so with the minimum distraction with unnecessary detail, while producing "good" RDF by some unspecified metric.

dbooth-boston commented 5 years ago

I would rather not have to get heavy handed in keeping this discussion respectful and on topic. So again I would like to appeal to all participants to please be respectful, on-topic and accepting of other viewpoints. Some points I want to stress:

Thanks!

dbooth-boston commented 5 years ago

> this is (for me) more about helping people who are creating RDF to do so with the minimum distraction with unnecessary detail, while producing "good" RDF by some unspecified metric.

Yes, that is definitely one of the target use cases. But I think there are others also, that involve processing of RDF data.

azaroth42 commented 5 years ago

I also believe that a single class makes it easier to consume, as validation of the properties is easier and the use of traditionally understood techniques (including, but not limited to, ORMs) is possible.

I would like to explore the prefType idea though -- as I do agree with @rivettp that there are well used ontologies where multiple types are valuable when they're used for flags or roles, rather than more abstract classes.

If there was a prefType predicate, would that help in the situations where there's a real use case from the data?

And meta-issue-wise:

> I should be thinking about representing the knowledge itself, not the representation method.

I think this nails it. We should be able to focus on the data and make it as usable as possible to the widest audience, which means consistency with existing, well-understood representation and processing paradigms.

madnificent commented 5 years ago

We have a mapping from SPARQL to {JSON:API} and have had no problems with multiple inheritance being in the dataset. You can easily choose to ignore certain types. In fact, it's the automatic behaviour when filtering on types through queries.
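The "ignore the other types" behaviour can be sketched roughly as follows. This is not how mu-cl-resources is implemented; it is a plain-Python approximation over `(subject, predicate, object)` tuples, and the `select_as` name and output shape are assumptions loosely modelled on a JSON:API resource object.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def select_as(triples, wanted_type):
    """Return resources that have `wanted_type`, ignoring any other types."""
    subjects = {s for s, p, o in triples if p == RDF_TYPE and o == wanted_type}
    out = {s: {"type": wanted_type, "attributes": {}} for s in subjects}
    for s, p, o in triples:
        # Other rdf:type statements are simply skipped, not treated as errors.
        if s in subjects and p != RDF_TYPE:
            out[s]["attributes"].setdefault(p, []).append(o)
    return out


data = [
    ("ex:joe", RDF_TYPE, "schema:Person"),
    ("ex:joe", RDF_TYPE, "schema:Plumber"),
    ("ex:joe", "schema:name", "Joe"),
    ("ex:acme", RDF_TYPE, "schema:Organization"),
]
print(select_as(data, "schema:Person"))
```

Filtering on the requested type means the consumer sees one class per resource in the response even though the dataset holds several.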

Tackling polymorphism or inheritance trees turned out to be roughly the same as tackling multiple inheritance in our estimations. This is from a read-write json api perspective which might be slightly different than what you are thinking about.

I'm interested to hear which practical issues you foresee and don't mind sharing our experience with this setup. Implementation is at mu-semtech/mu-cl-resources.