w3c / EasierRDF

Making RDF easy enough for most developers

Reduce the jargon #59

Open dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

From https://lists.w3.org/Archives/Public/semantic-web/2019Jan/0002.html

It might be easier for complete newbies if plainer language was used:

  • Resource -> Thing
  • Predicate/property -> Relation

Then a statement would be:

Thing–Relation–Thing

I understand the heritage behind the current naming, but for a newbie the first hurdle is understanding that "resource" has a different meaning to the one in the dictionary, that it actually means "thing". The dictionary definitions of "predicate" and "property" also don't correspond to the center position of an RDF triple in my opinion, whereas the dictionary definition of "relation" does.

Consequences would be RDF becomes TDF, or simply DF, to avoid redundancy. URI unfortunately becomes UTI, though it could be shortened in a similar way to simply UI.

I know it likely won't be a popular idea, but if you're looking for the perspective of relative newbies that's one about jargon I can share.

akuckartz commented 5 years ago

Funny, but no, please do not do that.

anthonymoretti commented 5 years ago

If for the moment we forget renaming RDF and URI, you don’t find that

Thing–Relation–Thing

is simpler than

Resource–Predicate–Resource ?

Very curious what people think as this was my proposal.

chiarcos commented 5 years ago

Please don't. It would be good to have a non-technical intro using graph terminology to introduce RDF terminology (and when we write introductions in text books, this is exactly what we do) -- and we should --, but please don't touch the technical terminology (ever). "RDF resource" sounds somewhat misleading, though, and my non-technical term (before introducing "RDF resource") is "node" -- but this is actually an incorrect oversimplification, because properties can be subjects and objects of RDF statements. "Thing" would have similar connotations (from OWL), and then you end up with statements like "owl:Nothing is a Thing". So there is a good reason for keeping framework-specific terminology. IMHO, the only feasible work-around to avoid confusion with, say (language) resource or (grammatical) subject, etc., is to systematically use the terms "RDF resource" and "RDF subject".
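To illustrate why "node" is an oversimplification, here is a minimal sketch in plain Python, with no RDF library. Triples are ordinary tuples, prefixed names stand in for full IRIs, and all the ex: names are made up for the example. It only shows that the same property can turn up in all three positions of a statement.

```python
# Triples as plain (subject, predicate, object) tuples.
# The ex: names are hypothetical and exist only for this illustration.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:knows", "rdf:type", "rdf:Property"),
    ("ex:knows", "rdfs:label", '"knows"'),
    ("ex:friendOf", "rdfs:subPropertyOf", "ex:knows"),
]


def positions(term, triples):
    """Return the set of positions in which a term occurs."""
    names = ("subject", "predicate", "object")
    return {names[i] for t in triples for i in range(3) if t[i] == term}


# ex:knows is used as an edge label in one triple and as a described thing in the others.
print(sorted(positions("ex:knows", triples)))
```

In this toy graph ex:knows is an edge in the first triple and something being described in the rest, which is the case that a pure node-and-edge picture leaves out.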

anthonymoretti commented 5 years ago

Because of punning anything can be considered an owl:Thing though, right? So because owl:Nothing is an owl:Class, and classes can be things due to punning, can't owl:Nothing also be an owl:Thing?

I'm no expert by any means, but I always thought owl:Thing was equivalent to rdfs:Resource. So any illogical sounding statements involving "Thing" would already have equivalent illogical sounding statements involving "Resource" it seems.

draggett commented 5 years ago

The terminology will also depend on the level of the representation - if we're directly supporting n-ary relationships, something I think we should, then we may want to consider using terms relating to objects, properties and relationships. This would also lead to easier to learn serializations for data and rules. My understanding is that to reach the middle 33% of developers, we need something different from the existing RDF framework, albeit something formally built on top of that.

draggett commented 5 years ago

I note that the Property Graph folks don't seem to have an agreed term for graph nodes with properties, although some have considered "entity". In Cognitive Psychology, the agreed term is "chunk". Minsky used the term "frame". I reckon that "object" or "thing" are good candidates.

anthonymoretti commented 5 years ago

Do you mind expanding the point about the effect n-ary relationships might have on the terms?

I also find "object" familiar, but probably because of my experience with OO I think. "Entity" and "chunk" seem like jargon, whereas "thing" is used in everyday speech - something, nothing, anything, everything. Even taking a look at https://www.w3.org/TR/rdf11-concepts/#resources-and-statements, that section of text refers to things and relations.

draggett commented 5 years ago

Do you mind expanding the point about the effect n-ary relationships might have on the terms?

At the RDF core, we only have triples with subject, predicate and object. In many cases, people think at a higher level, e.g. things with properties, and relationships between things, and it is an unnecessary burden to have to mentally map this into RDF triples.

Some further observations: In principle, relationships between things can also be seen as thing valued properties of other things. Metadata on the relationship can then be considered as sub-properties. However, I think that people like to distinguish relationships and properties, so it is helpful to have a way to do so. A relationship with metadata annotations can be modelled as a thing with the annotations as properties. I am interested in how to support these distinctions without confusing newcomers.
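As a rough sketch of the last modelling option (a relationship with annotations represented as a thing of its own), assuming plain Python tuples for triples and entirely hypothetical ex: names:

```python
# Hypothetical ex: names throughout; triples are plain (subject, predicate, object) tuples.

# A plain relationship: one triple, with no room for metadata about the relationship itself.
direct = [("ex:alice", "ex:worksFor", "ex:acme")]

# The same relationship promoted to a thing of its own (ex:job1), so that
# annotations such as a start date become ordinary properties of that thing.
as_thing = [
    ("ex:job1", "rdf:type", "ex:Employment"),
    ("ex:job1", "ex:employee", "ex:alice"),
    ("ex:job1", "ex:employer", "ex:acme"),
    ("ex:job1", "ex:since", '"2019-01-01"'),
]


def describe(thing, triples):
    """Collect the properties of one thing into a dict keyed by predicate."""
    return {p: o for s, p, o in triples if s == thing}


job = describe("ex:job1", as_thing)
print(job["ex:employee"], job["ex:employer"], job["ex:since"])
```

The second form is what a newcomer has to write by hand today. A higher-level serialization could presumably let them state the relationship and its annotations together and generate triples like these underneath.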

A further observation is the connection to the Web of Things, where things are digital twins for sensors and actuators, and exposed to applications as software objects with properties, actions and events. Things have RDF identifiers as a basis for describing the kinds of things, their relationship to the context in which they reside, and the object model with which they are exposed to applications. It would be great to have a common terminology, e.g. things, properties and relationships, to which the web of things adds actions and events.

I think all this points towards a higher level serialization for data, models and rules, that is formally layered on top of the RDF core.

chiarcos commented 5 years ago

> I'm no expert by any means, but I always thought owl:Thing was equivalent to rdfs:Resource. So any illogical sounding statements involving "Thing" would already have equivalent illogical sounding statements involving "Resource" it seems.

Yes, but having the term "thing" at the level of RDF makes the apparent contradiction it conveys much more prominent than only having it within OWL. I barely ever used owl:Thing in data modeling (in inferences, of course), but an "RDF thing" would be rather ubiquitous.

Having said that, I would not object to such a term in a meta language or higher-level data model that builds on top of RDF, as long as it is properly distinguished from RDF itself.

anthonymoretti commented 5 years ago

So something similar, Dave, to the distinction in OWL between object properties and data properties? Using the terms I proposed you’d have this hierarchy:

Relations
    Thing relations
    Data relations

On “resources”, if “nothing is a thing” appears contradictory then “nothing is a resource” appears equally contradictory, just in my view of course.

chiarcos commented 5 years ago

> So something similar, Dave, to the distinction in OWL between object properties and data properties? Using the terms I proposed you’d have this hierarchy:
>
> Relations
>     Thing relations
>     Data relations
>
> On “resources”, if “nothing is a thing” appears contradictory then “nothing is a resource” appears equally contradictory, just in my view of course.

Strictly speaking, neither "nothing is a thing" nor "nothing is a resource" is contradictory, just counterintuitive. But I did not recommend saying "nothing is a resource", but "owl:Nothing is an RDF resource". Not because the term is particularly good, but because it's unambiguous.

BTW: Of course, data is not a thing, right?

HughGlaser commented 5 years ago

Yeah. Touching the current formal terminology is not a good Thing, and shouldn't be necessary. But conventions and agreements are good. And it is not unusual to have common synonyms used for practical realisations of formal treatments.

And I do find there are problems. Eg.: "Resource" has no intuition (for mortals). I would much prefer Thing (even if it is in OWL (too?)) - I don't like things like Object, because they are too concrete for me. But I see the problem that Resource is well-embedded, so I think we live with that.

However. What exactly should I call Relations when I talk to people? They may even have already seen people talk about Properties, Predicates, Edges, Arcs and a bucket-load of other things, I suspect. Anyway, aren't Properties Relations? Oh, no, Properties describe Relations - so that's very clear then, especially when apparently if I use it as such, it is a Predicate. Properties is wrong - it makes things look uni-directional. Predicates needs some logic background, or it makes no sense. Edges, Arcs, Orcs, etc. require a graph background. Relation has a shedload of good intuition.

Can't we (please) just use Relation, and agree to do so?

I have a bunch of other trivial-seeming things like this that I think require no actual work, and I think would help, that I will get around to posting soon, I hope.

anthonymoretti commented 5 years ago

“Data is not a thing, right?” I don’t really know if this is an answer, but rdfs:Literal is a subclass of rdfs:Resource, so maybe the answer is yes?

I agree with Hugh’s points about relations, the term is self explanatory.

Is it really too much though to add rdfs:Thing, and maybe very slowly, e.g. 5-10 years, deprecate rdfs:Resource? We have a description framework and the most central concept is poorly described. After working with RDF for a long time of course you get used to it, but the discussion is about getting adoption, and “Resource” is immediate and pervasive jargon for newcomers.

chiarcos commented 5 years ago

> “Data is not a thing, right?” (...) maybe the answer is yes?

I think so, too. But formulating an opposition between "Data relations" and "Thing relations" implies that it is not.

anthonymoretti commented 5 years ago

Yep, totally agree with you. I was showing what a direct mapping of OWL terms to these terms would look like.

If you took a logical approach to naming the initial model might be:

Relations
    Data relations

Then if you wanted to create a mutually exclusive and collectively exhaustive set of classes, like in OWL, you can create a complementary class:

Relations
    Data relations
    Other relations

Then you give the complement the name of the parent class, because they are essentially plain Relations, and make the parent class abstract, which leaves two concrete classes and no mention of “Thing”:

Relations
    Data relations

chiarcos commented 5 years ago

"Entity" was mentioned before. Any reason not to use established ER terminology (https://en.wikipedia.org/wiki/Entity–relationship_model)?

Then, we have the following terms:

  • RDF Resource =: "Entity"
  • rdfs:Class =: "Entity type"
  • rdf:Property =: [missing]
  • (instance of) owl:ObjectProperty =: "Relationship"
  • (instance of) owl:DatatypeProperty =: "Attribute"

If we just keep "Property" in addition to ER terms, all is covered and we don't need to reinvent anything. Wrt. "Relationship", I would prefer to stay with "Relation", though.

In order to both establish a more consistent view and to keep that apart from the existing RDF ecosystem (which I would not touch), we can create a "lod:" namespace (or so) and define RDF, RDFS and OWL concepts as subclasses (or aliases) of these concepts. One advantage would be that this will remain fully backward-compatible, but if terminology is really that much of a problem, people will eventually move to the new namespace so that we can deprecate RDF and RDFS namespaces in something like, say, 10 years without ever breaking backward-compatibility.
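A rough sketch of how such an alias layer could work, in plain Python. The "lod:" prefix comes from the suggestion above, but every lod: name below is invented for illustration and nothing like it is published anywhere. The old terms stay untouched; the aliases are just extra statements plus a display mapping.

```python
# Hypothetical alias table: the lod: names are made up for this example.
ALIASES = {
    "rdfs:Resource": "lod:Entity",
    "rdfs:Class": "lod:EntityType",
    "rdf:Property": "lod:Property",
    "owl:ObjectProperty": "lod:Relation",
    "owl:DatatypeProperty": "lod:Attribute",
}


def alias_triples(aliases):
    """Express each alias as an owl:equivalentClass statement about the old term."""
    return [(new, "owl:equivalentClass", old) for old, new in aliases.items()]


def friendly(triple, aliases):
    """Rewrite a triple for display, using the newer name wherever an alias exists."""
    return tuple(aliases.get(term, term) for term in triple)


print(friendly(("ex:knows", "rdf:type", "owl:ObjectProperty"), ALIASES))
```

Because the original data is never rewritten in place, existing tools keep working, which is the backward-compatibility argument made above.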

Cf. #52

anthonymoretti commented 5 years ago

Just my view but the ER model is still jargon.

Entity - Do people use “entity” or “thing” in everyday speech? I think it’s clear, but you can also check Google Ngram Viewer.

Relationship - I agree that “relation” is preferable.

Attribute - I think it’s important to use the definitions of words as found in the dictionary, and the definition of “attribute” in the dictionary doesn’t describe an OWL Datatype Property.

And yeah a new namespace would be great, even if only because the “r” in existing namespaces isn’t applicable anymore if a different term to “resource” is used.

anthonymoretti commented 5 years ago

I think for correctness it might actually be “data item relation”, rather than “data relation”.

In full, what OWL is really describing are these:

Thing-to-thing relations
Thing-to-data-item relations

So, adding brevity and then following the same naming logic as before, you end up with these two concrete classes of relations:

Relations
    Data item relations

chiarcos commented 5 years ago

> Just my view but the ER model is still jargon.

It's definitely something you can find text books and tooling for. If it's jargon, it's well-documented, at least, and it was in widespread use even before RDF emerged. Even nowadays, it isn't dead, but continues in UML object diagrams.

> Entity - Do people use “entity” or “thing” in everyday speech?

If people use "thing" in everyday speech, they normally mean "there is something I cannot give a more specific name right now". Calling anything "thing" doesn't mean it's well defined (in natural language), it actually means the opposite. This is why we have "something", "anything" and "nothing", cf. https://www.merriam-webster.com/dictionary/thing: "an object or entity not precisely designated or capable of being designated".

"Entity" isn't used as often, but if so then often as a technical term
with a clear definition as "abstract concept". This is Merriam-Webster's
sense 2: "something that has separate and distinct existence and objective
or conceptual reality "
(https://www.merriam-webster.com/dictionary/entity). I would call this a
match.

> Relationship - I agree that “relation” is preferable.

+1

> Attribute - I think it’s important to use the definitions of words as found in the dictionary, and the definition of “attribute” in the dictionary doesn’t describe an OWL Datatype Property.

A very established term in computation is "attribute-value-pair". This is where the term comes from. And this corresponds exactly to Merriam-Webster's sense 1 (and 3, for boolean values): https://www.merriam-webster.com/dictionary/attribute.

Out of curiosity: How do you define "jargon"? If it's "the technical
terminology or characteristic idiom of a special activity or group", then
we should not reduce it, but rather make sure our jargon comes close to
the one of a group that is significantly larger than the RDF community
and includes potential users of the technology.

ER would be a candidate (with applications in RDB and software
engineering), others would be graph terminology (with application in NoSQL
DBs), or UML (object diagrams, with applications in OO programming). In
any case, they should not be mixed. And we should definitely not make
up yet another new terminology.

namedgraph commented 5 years ago

Whatever you come up with here, it's not gonna stick. Nor should it. Why don't you solve some real problems?

zacharywhitley commented 5 years ago

First, in response to the topic, hell no.

There's already a term called ref:resource and if that isn't a good enough reason it's called (R)resource because it's the same R as in U(R)L, U(R)I, and I(R)I. It's a web resource. If you want to make things easier for people new to the technology, throwing everything in the trash and replacing it with vague terms no one agrees on is the exact opposite of what you should be doing. A better approach would be to point out the historic context and use simplified analogies with the caveat that they are just that, simplifications. Historic baggage is everywhere, there's a reason I can still spell color as colour and that there are accents on résumé.

This repository is titled EasierRDF and people are coming across this because they were confused and did a google search for "can someone make rdf easier". What are they going to think when they see it? "Wow, a decade later and they're still arguing some technically correct but ultimately pointless details of what terms mean. I'm just going to stick with Elasticsearch, Postgresql, Phoenix, Hive, Presto, Druid, Domeo, Hawk, Impala, Influx, MySQL, ArangoDB, Neo4j, etc."

There really isn't that much jargon but if you want to keep arguing this thread for another hundred years here's TBox, ABox, Term, Context, Model, Punning, HttpRange14, Ontological commitment, Open world reasoning, and unique name assumption.

None of this helps your middling 30%ers. If they don't understand the terminology they actually read something about it. Your middle 30% moved on a long time ago. They are Don Draper in an elevator saying, "I don't think about you at all."

anthonymoretti commented 5 years ago

The funny thing is I am that 30% developer that you describe. I didn’t understand the terminology and so I read something about it, and that took time that I’d like to try and save other developers from having to waste. I mean c’mon “predicates”? 😂

Look at this well written intro to OWL 2 that could describe RDF:

OWL 2 is “a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things.”

You could put that on a landing page it’s so simple. But replace the words in that sentence with current RDF terminology and it has very quickly lost its appeal.

I am “solving real problems” outside of this, Martynas 😂😂😂

amirouche commented 5 years ago

Getting started with RDF by considering that subjects are URIs and predicate some kind of URI and object can be some data types is very complicated. Considering that URIs are data types is strange.

IMO Datomic (which completely avoids the RDF vocabulary for some reason...) speaks in terms of Entity Attribute Value (which might be somewhat misleading).

I think that Identifier Key Value could be a good middle ground, it reuses existing software engineering vocabulary while being backward compatible with the original triple Subject Predicate Object.

zacharywhitley commented 5 years ago

Changing what you call something isn't going to help you understand it. What exactly is the problem with the word "predicate"? You don't like that you had to read something to understand it? What technology are you working with where you don't need specialized terminology or have to read to understand it? There's nothing about the word "broker" that helps me understand Kafka. Or how about "a monad is just a monoid in the category of endofunctors"? to which people published articles titled "A Monad is a Burrito" and were then countered with "A Monad is a Burrito and other Functional Myths". If other communities can flourish with this kind of navel gazing then I don't think "predicate" is really the problem. NLP uses the word predicate, Apache Camel has predicates. Guava has a predicate as well as vavr.

If you'd like better tutorials, videos, instructional material that presents things more explicitly and helps you understand the concepts faster then let's do that, but changing the terminology is not going to help anyone understand the concepts any better and will most likely make things more confusing.

zacharywhitley commented 5 years ago

@amirouche You're just replacing the names Subject -> Identifier, Predicate -> Key, and Object -> Value with arguably inferior replacements. There is nothing key-like in the predicate, the value can be another identifier as well as a literal, and the Identifier identifies things as much as your key or possibly your value does.

If that mental mapping works for you then by all means use it. Datomic doesn't use RDF because it chose not to use that standard and is based on Datalog (it actually uses a 5-tuple).

You think that Subject/Predicate/Object is too complicated, like Identifier/Key/Value, and think Entity/Attribute/Value is misleading?

draggett commented 5 years ago

@anthonymoretti says:

Relationship - I agree that “relation” is preferable.

Why? Isn't that just a matter of US vs GB usage of English?

HughGlaser commented 5 years ago

There's already a term called ref:resource and if that isn't a good enough reason it's called (R)resource because it's the same R as in U(R)L, U(R)I, and I(R)I. It's a web resource.

First: as I said, I don't think agreeing to use a term other than "Resource" is a Good Idea. However, we need to accept that it is problematic, and not pretend (to newbies, when we talk to them) that it isn't.

Because: No, it isn't "a web resource". That's the whole point. If it was, then fine. But if it was, then we would still be using URL, but we aren't, we have changed to URI, IRI, or whatever. So a URI doesn't Locate a Resource, because that would be a stupid thing to do for abstract things that are not on the Web. So, for example, saying that the URI for the Cowardly Lion's courage is identifying a Resource really stretches the natural idea of a resource rather far.

But this is one where we just have to suck it up and take the hit.

zacharywhitley commented 5 years ago

So what exactly is the problem?

dbooth-boston commented 5 years ago

Why don't you solve some real problems?

Please, let's keep the conversation civil and constructive. We should be welcoming the ideas of newcomers who can look at this stuff with fresh eyes -- not flinging insults at them. IMO newcomer perspectives are the most valuable of all, because newcomers represent the target demographic of this effort. Experienced RDF users are not.

Difficulty of use, and confusing off-putting jargon certainly are very real problems in RDF. And the whole purpose of this discussion is to collect ideas for addressing them. Nobody expects all ideas to be adopted. But we need to get fresh ideas on the table -- the more the better -- in order to eventually figure out which ones we might want to pursue. We cannot do that by creating a climate of intimidation and elitism.

The jargon barrier is one that I had forgotten, since it has been so long since I faced it myself. I am glad that it was brought to our attention.

anthonymoretti commented 5 years ago

Cheers David. That’s true, I’m definitely not expecting any or all ideas to be adopted, just want to put them out there.

Hugh explained it well, as far as I can tell URIs officially took on the broader meaning in 2004:

https://www.w3.org/TR/webarch/#id-resources

Even there, if you look at the third paragraph, the editors redefine “resource” to mean “thing”.

Guess I’m wondering why it would be so hard to do. Coming from iOS development I’m very familiar with deprecation, every year with each iOS release there are many deprecated APIs, it’s a fact of life for iOS developers, millions of us cope, and the frameworks improve over time. What’s different about the RDF ecosystem?

Dave, fair question. I’m no linguist, but I think it’s a subtlety between the words rather than US vs GB. If we take the Oxford dictionary, both “relation” and “relationship” start with the same definition:

“The way in which two or more concepts, objects, or people are connected.”

Then they differ slightly immediately after that:

Relation: “a thing's effect on or relevance to another.”

Relationship: “or the state of being connected.”

So, I could be wrong, but one seems more applicable to types of relationships, and the other to instances, and we’re after types.

zacharywhitley commented 5 years ago

“it’s a fact of life for iOS developers”

It’s the same reason the API has NS* everywhere. That’s an abbreviation for NeXTSTEP.

anthonymoretti commented 5 years ago

Not since 2014 actually, when Swift was released, essentially deprecating a whole language, Objective-C.

Come to think of it, for iOS developers the size of that change was basically like learning that RDF was being deprecated. You won’t find many developers that wish to go back to Objective-C now.

zacharywhitley commented 5 years ago

Yes, that was my point. I’m glad to hear you agree.

anthonymoretti commented 5 years ago

Anyway, to summarize and put the ideas in one place, as a newcomer and just in my opinion I would find the following terminology more understandable than the current terminology:

Things
    Data items
    Relations
        Data item relations

I understand that for existing users it probably seems pointless, but if RDF is going to be around a long time, which I believe it will, then the number of people we could make it easier for is going to be far greater than the number of people currently using it.

I appreciate everybody here for even considering these ideas.

stain commented 5 years ago

We had this discussion in W3C PROV group, and our conclusion was that we needed "entities" to describe "things in the world". An entity is a particular view or expression able to 'capture' the concept of the thing so that it can be further described.

In its simplest form this can be through identity, secondly it might be by locking down some attributes (e.g. a person's name), thirdly it might just be relational to other pre-existing entities (e.g. "child of X").

I don't think it's very hard to explain the concept of RDF Resource - here's from the Commons RDF tutorial https://commons.apache.org/proper/commons-rdf/introduction.html#RDF_resources - important to keep in mind here is that literals are also a kind of resource.

What I think makes RDF special is that data relations (literal "attributes") are not distinct from entity-to-entity relations, hence the need for neutral names like "predicates" and "property" rather than "attribute" or "relation".

In earlier vocabularies like Dublin Core Elements, a property could be used both as a data attribute (dc:creator "Stian Soiland-Reyes") and resource relationship (dc:creator https://orcid.org/0000-0001-9842-9718) and both were fine.

For a while, more semantically strong ontology design principles such as with OWL meant that a named property seldom did such double duty anymore, and distinguishing between "data items" (data properties in OWL) and "data item relations" (object properties in OWL) would work as a general simplification.

However, now schema.org has thrown a spanner in the works and gone back to basics, allowing double duty like https://schema.org/correction that can be used with URL (external <reference>), CorrectionComment (in-line structured {object}) or Text (in-line "") - almost equivalent to IRI/bnode/literal in RDF.

So I don't think we can keep a strong distinction between data properties and object properties, but rather take on-board an important aspect that it should be clear from a vocabulary if it's meant to point to an external resource or have a resource description. Not knowing this in advance is one of the big barriers to RDF consumption and a big motivation for RDF Shapes.
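A small sketch of that barrier, in plain Python rather than a real RDF or shapes library. The ex:report subjects are hypothetical, and check_shape is only a toy stand-in for the kind of constraint a shape would declare, not an implementation of any shapes language.

```python
# Objects are tagged with their kind so a consumer can tell them apart.
# The ex:report subjects are hypothetical; dc:creator and the ORCID IRI are the ones mentioned above.
triples = [
    ("ex:report1", "dc:creator", ("literal", "Stian Soiland-Reyes")),
    ("ex:report2", "dc:creator", ("iri", "https://orcid.org/0000-0001-9842-9718")),
]


def object_kinds(predicate, triples):
    """Return the set of object kinds observed for one predicate."""
    return {obj[0] for _, p, obj in triples if p == predicate}


def check_shape(predicate, allowed, triples):
    """Toy constraint check: True when every object kind for the predicate is allowed."""
    return object_kinds(predicate, triples) <= set(allowed)


print(sorted(object_kinds("dc:creator", triples)))
print(check_shape("dc:creator", ["iri"], triples))
```

A consumer that assumed dc:creator always points to an IRI would trip over the first triple. Declaring the expected kinds up front is what lets that be caught before the data is used.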

anthonymoretti commented 5 years ago

Yeah, I probably should have left out the suggestion for data item relations because it's not part of RDF anyway. I don't quite see you on the point about predicate or property being more "neutral" than attribute or relation though.

And yeah, we could go back and forth about whether resource is easy to explain or not. It's just my view, but if somehow the conversation around easier RDF resulted in a new RDF where we have a blank slate then it'd be better to go with terms that already have the desired meaning, so thing or entity. Even the section in the PROV doc you link to starts with the sentence "things we want to describe the provenance of are called entities". It's like things is good enough to use in everyday language, but in technical or academic contexts it's not and a new term has to be used, and I don't see why. Obviously just my opinion.

HughGlaser commented 5 years ago

@stain > We had this discussion in W3C PROV group, and our conclusion was that we needed "entities" to describe "things in the world". An entity is a particular view or expression able to 'capture' the concept of the thing so that it can be further described.

This has some interesting stuff that seems to shed light on things for me. You say an "entity" is a "particular view or expression"? But I think URIs actually are the "concept of the thing" as you put it. So entity would be the wrong word - it isn't what we are after; and would not be consistent with PROV, (My guess for PROV is that entity is used deliberately to make it different from (RDF) Resource, as the intention is to be concerned only with things that do have some more concrete "presence" in the world - and thus entity is quite a good term to use. And resource would have been, but has already been taken by RDF [here the discussion went into infinite regression] :-) )

The RDF resources document you cite immediately uses words like anything and concept to try to explain what a resource is, and almost every post I see about this cannot avoid using the word thing. So if we were to make a change, shouldn't we use one of those? <ignore> I think anything is rather growing on me! Even better than thing, and clearly a technical term if you say "foo is an Anything". But no. On the other hand, maybe we could just go the whole hog the other way and use a Greek or maybe an unused Hebrew letter? </ignore>

I'm not necessarily in favour of moving from Resource - I think it is too well embedded - but I see no benefit to move to Entity. Both terms are too concrete to me: <<The cowardly lion's courage>> is no more an Entity than a Resource for me.