w3c / EasierRDF

Making RDF easy enough for most developers

Allow arbitrary strings as predicates #106

Open HolgerKnublauch opened 7 months ago

HolgerKnublauch commented 7 months ago

THIS IS ME BRAINSTORMING ONLY, so don't kill me.

Currently, RDF requires predicates of a triple to be IRIs. I guess that choice was made so that

  1. ontologies can attach information such as rdfs:range and rdfs:label to the properties themselves
  2. it becomes more likely that predicates are scoped in the context of a namespace and thus don't clash with other namespaces, which means that it is likely that (SPARQL) queries against these properties only find the subjects that we want

But:

ad 1: Global property axioms are not necessary and do not play a role in SHACL, where everything is scoped by shapes and classes. And rdfs:range and rdfs:domain are typically horribly misunderstood.

ad 2: Even with unique identifiers, people from other graphs may reference your predicate in unexpected ways, and your queries still need to filter by subjects.

Even within a single namespace it is quite common that the same URI is used for different purposes. For example, an ex:role property could point from an ex:Agent to an ex:Role or from an ex:Organization to an ex:Role, and both could have different local meanings depending on the context.

So the benefits of URIs as predicates are IMHO overrated.

Proposal: Moving forward, RDF could also allow predicates to be arbitrary strings.

a) That is how most map-based data structures like JSON objects or Python dictionaries operate, meaning that the mapping between RDF and other languages becomes easier. I think property graphs work this way too.

b) Allowing strings would make the syntax more compact. For example one could write

ex:David firstName "David"

c) People don't need to invent artificially "unique" names - their application logic and queries are most likely already checking for the context anyway, e.g.

SELECT ?david
WHERE {
    ?person firstName "David" .
    ?person a ex:Person .
}

is already scoping the use of firstName to instances of Person, making the property uniquely identified at query time. And when mapped to languages like GraphQL or JavaScript, any access to predicates is already scoped to the context object.
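The query-time scoping described above can be illustrated with a small sketch in plain Python. The triples, the ex:Dog resource and the people_named helper are hypothetical, made up for illustration; triples are modelled as plain tuples with a bare string in predicate position.

```python
# Minimal sketch: a bare "firstName" predicate is disambiguated by also
# checking the type of the subject, as the SPARQL query above does.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical data: two subjects share the bare predicate "firstName".
TRIPLES = [
    ("ex:David", "firstName", "David"),
    ("ex:David", RDF_TYPE, "ex:Person"),
    ("ex:Rex", "firstName", "David"),
    ("ex:Rex", RDF_TYPE, "ex:Dog"),
]


def people_named(first_name, triples):
    """Return subjects typed ex:Person whose firstName equals first_name."""
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == "ex:Person"}
    return sorted(
        s for s, p, o in triples
        if p == "firstName" and o == first_name and s in persons
    )


print(people_named("David", TRIPLES))
```

Only ex:David is returned here, because the type check filters out the other subject that happens to use the same bare predicate.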

As this would be an incremental generalization, existing RDF graphs would not be affected. People are not forced to use strings as predicates.

To minimize the overhead for existing triple stores, string-based predicates could be internally converted to URIs such as

urn:rdfpredicate:firstName

after parsing in Turtle or SPARQL. But in the far future, there could also be an RDF that uses no URIs as predicates, with all frequently used predicates mapped to shorter names. Turtle and SPARQL have already started going down this route by introducing 'a' as abbreviation for rdf:type. They could also add 'label' as alias for rdfs:label or 'superClass' as alias for rdfs:subClassOf.
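As a rough sketch of the internal conversion mentioned above, a parser could map each bare predicate string to a URN and back. The urn:rdfpredicate: prefix comes from the example above; the function names and the choice to percent-encode characters such as spaces are assumptions made for illustration only.

```python
from urllib.parse import quote, unquote

# Prefix taken from the example in this comment; it is not a registered namespace.
BARE_PREDICATE_PREFIX = "urn:rdfpredicate:"


def bare_predicate_to_iri(name: str) -> str:
    """Map a bare predicate string to a URN (assumed encoding: percent-encode everything unsafe)."""
    if not name:
        raise ValueError("predicate name must not be empty")
    return BARE_PREDICATE_PREFIX + quote(name, safe="")


def iri_to_bare_predicate(iri: str):
    """Return the original string for a converted URN, or None for any other IRI."""
    if iri.startswith(BARE_PREDICATE_PREFIX):
        return unquote(iri[len(BARE_PREDICATE_PREFIX):])
    return None


print(bare_predicate_to_iri("firstName"))
```

With a round trip like this, a triple store would only ever see IRIs, while serializers could turn the URNs back into the short names.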

Also note that schema.org and wikidata use the same namespace for all predicates, so basically it's the same as if no namespace exists in their worlds.

dbooth-boston commented 7 months ago

Interesting idea. I think there are two ways this could be interpreted: as globally scoped predicates that have minimal semantic commitment; or as some kind of locally scoped predicates. As shown, it looks like you are treating them as globally scoped.

If they are globally scoped, then inference rules that use them must qualify their intended scope, such as by indicating the class of the subject, as you describe.

If they are locally scoped, then we'll need ways to manipulate scopes, such as we have in programming languages. For example, when a library is imported it pulls a set of identifiers into the current scope, or allows an identifier from a foreign scope to be bound to an identifier in the current scope.

I wonder what other pros and cons there might be of treating them as globally scoped vs locally scoped.

amirouche commented 7 months ago

In my experience 'predicates as strings' is more approachable, and less interoperable.

HughGlaser commented 7 months ago

I think the issue is perhaps tied up with the question of literals as subjects too. That is issue https://github.com/w3c/EasierRDF/issues/21 . Note that that issue discusses literals as predicates a bit too. Of course each of these steps takes RDF further away from Linked Data. Even so, personally I quite like such relaxation of the URI requirements.

HolgerKnublauch commented 7 months ago

Yes there is some overlap with #21. For many of the customers that we see, the concept of Linked Data is not relevant as they only operate on controlled enterprise graphs. Also I believe even from a Linked Data perspective, having simpler property names shouldn't be a problem because it is far more relevant that the subjects and objects are URIs than the predicate.

namedgraph commented 7 months ago

I don't understand how typing one less character (property instead of :property) can justify thousands of manhours of specification and implementation work this change would incur on the ecosystem.

amirouche commented 7 months ago

Did you consider a mini-RDF on top of which RDF can be implemented?

HolgerKnublauch commented 7 months ago

@namedgraph: I believe one goal of an EasierRDF project is to align better with what most software people are used to. Backward compatibility is desirable but by definition difficult to achieve forever.

In the case of allowing strings as predicates, there is at least one simple approach, namely to convert them into special URIs, allowing existing infrastructure to be re-used without issues:

urn:rdfpredicate:firstName

With this approach, the only software changes would be to the Turtle and SPARQL parsers, to convert these strings into special URIs.

chiarcos commented 7 months ago

Another way of implementing this would be to automatically wrap every subject (or predicate) string into a blank node with that exact value. I guess this will be very much hated by most here, but it would have a valid RDF 1.1 interpretation.


chiarcos commented 7 months ago

On Sat, 25 Nov 2023 at 12:53, Christian Chiarcos wrote:

Another way of implementing this would be to automatically wrap every subject (or predicate) string into a blank node with that exact value.

I meant "rdf:value".

namedgraph commented 7 months ago

@HolgerKnublauch I disagree. IMO we should make RDF-based software so flexible and powerful (in ways that would be impossible with RDF) so that we can empower the non-software people to work with data in new ways. That is a much broader audience than "software people".

Trying to bring RDF to the general "software people" always ends up in attempts to dumb down RDF, because part of the RDF community seems to think that it's its job to accommodate while the "software people" can't be bothered to put in the effort and learn anything new.

HolgerKnublauch commented 7 months ago

@namedgraph For how many years has the RDF community already tried to convert everyone else, with little success? 20 years now? It remains a niche technology. Maybe success is just around the corner, maybe not.

People coming from other communities just find it alien and too complex. One of the particularly alien concepts is that properties have a global identity. This is basically unknown in any other language. Combine this with the unusual semantics of rdfs:range and rdfs:domain and you can understand why few people want to invest into understanding this stack.

You are saying it would dumb down RDF, but what is actually the value of global identifiers for properties, leaving aside what RDF Schema tried to do: using property definitions to infer the types of subjects and objects without explicitly requiring type triples. What else is there apart from that use case?

HolgerKnublauch commented 7 months ago

Hypothetical syntax that only uses strings in predicate position, while bringing in existing namespace-based predicates:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/ns#> .
@alias a: rdf:type .
@alias label: rdfs:label .

ex:JohnDoe
    a ex:Person ;
    label "John Doe" ;
    firstName "John" ;
    lastName "Doe" ;
    age 42 .
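
A preprocessor for the hypothetical @alias syntax above might resolve predicate tokens roughly as follows. This is only a sketch: the lookup order (aliases first, then prefixed names, then a fallback to the urn:rdfpredicate: form mentioned earlier in the thread) and the function names are assumptions, not part of any specification.

```python
# Prefixes and aliases mirror the declarations in the example above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/ns#",
}
ALIASES = {"a": "rdf:type", "label": "rdfs:label"}


def expand_prefixed(name, prefixes):
    """Expand a prefixed name such as rdfs:label into a full IRI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


def resolve_predicate(token, aliases=ALIASES, prefixes=PREFIXES):
    """Resolve a predicate token: alias, then prefixed name, else bare string."""
    if token in aliases:
        return expand_prefixed(aliases[token], prefixes)
    if ":" in token:
        return expand_prefixed(token, prefixes)
    # Assumed fallback for undeclared bare strings such as firstName.
    return "urn:rdfpredicate:" + token


for token in ("a", "label", "firstName"):
    print(token, "->", resolve_predicate(token))
```

Under these assumptions, a and label resolve to the existing rdf: and rdfs: IRIs, while firstName, lastName and age stay local names without a declared namespace.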

HughGlaser commented 7 months ago

A real aside, perhaps:...

I find it gently amusing that the example predicates being used are exactly the ones I would really need to look up (using Linked Data?) to see if I can find out what the author might mean. I wonder which of firstName and lastName might be a family name, for example?

HolgerKnublauch commented 7 months ago

@HughGlaser not aside at all but important. Like in OWL and SHACL, a property would have its meaning in the context of a class or shape. So in this case, you would look the properties up by following the rdf:type, here ex:Person. It's still linked, self-describing data. This is exactly like you would look up the meaning of fields in a (Java) object or the parameters of a function - you start at the surrounding entity.

dbooth-boston commented 7 months ago

Would all bare predicates then be implicitly scoped to the class of the subject on which they are used? If so, which class if the subject is in multiple classes?

HolgerKnublauch commented 7 months ago

@dbooth-boston The problem of type clashes already exists, for example if you have two rdf:types with two owl:Classes that carry owl:Restrictions with different owl:allValuesFrom on the same property. In SHACL this would mean that all constraints apply.

HughGlaser commented 7 months ago

I find myself wondering if it should not be better as:

"a" ex:Person ; "label" "John Doe" ; "first name" "John" ; etc.,

since the idea is "strings as predicates". And indeed, there might be a difference between something that is explicitly an alias, and things that are not. So it could be useful that 'a' might be different from '"a"', if you can see what I mean with the different quotes. (I don't see much point in doing this if we just use typical camelCase strings without a ':' in front.) I would love to have:

"Billy Bob Brockali" "was born in" "Ballingdon Bottom". "Ballingdon Bottom" "is located in" "Britain".

dbooth-boston commented 7 months ago

[@namedgraph] I don't understand how typing one less character (property instead of :property) can justify thousands of manhours of specification and implementation work this change would incur on the ecosystem.

  1. The impact on users is much more than a single character of extra typing. Using :property forces users to declare a namespace, which forces them to commit to a global URI. We've seen over many years of experience that this alone presents a barrier, because it forces them to go down the unproductive rabbit hole of figuring out what URI allocation strategy to use and -- depending on the strategy chosen -- where and how to host it. See https://github.com/w3c/EasierRDF/issues/12 .
  2. While this one simplification may not be enough to justify the cost of changing tools and standards, I view it as one potential ingredient in a combination of simplifications that -- taken together -- may well be worth adopting.

[@namedgraph] Trying to bring RDF to the general "software people" always ends up in attempts to dumb down RDF, because part of the RDF community seems to think that it's its job to accommodate while the "software people" can't be bothered to put in the effort and learn anything new.

If the RDF community were thriving and growing you might have a valid point in blaming developer laziness. But given that RDF is clearly losing out to easier-to-use competitors, I don't buy that argument.

I think developers are rationally deciding that the effort required to "learn something new" with RDF is not worth the payoff, given the availability of easier "good enough" alternatives, even if the RDF approach may seem more appealing in a theoretical sense.

The goal here is to make RDF -- or a successor built on RDF -- significantly easier to use, while retaining RDF's benefits and as much of the tooling and standards as possible.

TallTed commented 7 months ago

Requiring some things — e.g., RDF Subjects and Predicates — always be HTTP/S URIs means that those HTTP/S URIs can be treated as superkeys, which reach across DBMS schema, because they always denote the same thing. This is what delivers the Linked Data magic, and comprises the Giant Global Graph of our Semantic Webs (yes, intentionally plural). (Concerns like temporality do mean that Named Graphs or similar must be brought to bear, but this is handled with another batch of URIs, not arbitrary strings.)

Letting RDF Subjects and Predicates be arbitrary strings would turn RDF into yet another semantically unjoinable mishmash of schemata, and, if merged without great care, could render the current bunch of Semantic Webs a giant global mudpuddle of incoherency.


As to RDF's "failure" because it hasn't replaced tabular relational DBMS (a/k/a SQL) nor labeled property graphs — "horses for courses" comes to mind.

RDF is VERY well suited to data where the overall data structure is not known at project start, where the "schema" will evolve over time — e.g., "schema last" — and where data is sparse, i.e., where the values of some predicates/attributes may not be known for any given subject/entity but you still want to collect all those values that are known.

Tabular relational DBMS and their relational integrity and other restrictions make them VERY well suited to dense data, i.e., where the values of all predicates/attributes for any given subject/entity are known, and you only want to collect the values of any given predicate/attribute when they are known for all subjects/entities.

Changing a tabular relational DBMS schema once deployed can be a HUGE undertaking, and may require updates to all tools in use against that schema. On the other hand, adding a property/attribute to an RDF graph or data set is typically a trivial undertaking, and tools which operate against that data do not typically require updates specific to the new attribute/property.


The idea of the "special treatment of arbitrary strings in subject or predicate position", coercing them into URIs, has some potential for implementability, though it doesn't solve the problem of "local only definition". I cannot dereference your freshly minted URI, so I cannot confirm whether your intended meaning matches mine. This is, to me, a non-starter, overall.

TallTed commented 7 months ago

[@HughGlaser] I would love to have: "Billy Bob Brockali" "was born in" "Ballingdon Bottom". "Ballingdon Bottom" "is located in" "Britain".

I think you just want more mature tools, that will show you labels instead of raw URIs, while the URIs are in place behind the screen (a/k/a under the covers).

"Billy Bob Brockali" "was born in" "Ballingdon Bottom" . "Ballingdon Bottom" "is located in" "Britain" .

namedgraph commented 7 months ago

@dbooth-boston why do you see it as a competition? RDF will always lose out in the marketing sense because Neo4J alone has received $500M+ in VC funding.

Just stop trying to convert developers to RDF or use mainstream adoption as the success criteria. Many of their problems that do not require data interchange might simply be solved with JSON, or with property graphs for that matter.

The premise of Semantic Web was to deliver a new generation of the Web that is smarter, more automated etc. We haven't really seen that yet, and that's not the fault of the RDF model but of the software development still using legacy architectures. Why not focus our efforts on software that exploits RDF to the max and delivers something previously impossible? We have barely scratched the surface yet.

afs commented 7 months ago

Domain-specific languages would help to make data writing easier in the sense of being more natural to the domain (SHACL-driven?). They could "compile" to Turtle/N-Triples/JSON-LD with little more than guided text processing.

fekaputra commented 7 months ago

I like this idea of being able to register additional shortcuts in addition to the “a” (rdf:type), or even further, to import them from another file (like a JSON-LD context). This way, I can write my Turtle file faster and still be compliant with the current namespace-based predicates.

As a general comment, I think the ability to describe properties (i.e., properties as first-class citizens) is one of the main distinguishing features of RDF, and I would prefer to keep it that way. However, having the option of adding syntactic sugar like Holger proposed would be really nice, and hopefully could attract more people to use RDF.


HughGlaser commented 7 months ago

Some comments:

1) I point out again that firstName "John" ; is not a string as predicate; it is some sort of RDF symbol as predicate. To be a string as predicate, it would be more sensibly rendered as "firstName" "John" ; It seems to me that the discussion centres around the first of those (which means that the thread title is misleading me).

2) If you think (à la Semantic Web) that the purpose of a URI is as an identifier, then the differences between the various forms that have been mentioned (:firstName, firstName, "firstName" and blank nodes etc.) become syntactic sugar that can easily be handled by preprocessors or equivalent making a URI.

However, if you think (à la Linked Data) that the use of a URI means that the consumer expects to be able to resolve it using http(s), then the difference between those forms becomes quite stark - the publisher needs to have access to an appropriate system etc., if the notation implies a URI.

So I suspect that SemWeb people see little point in the discussion, but LD people think it worth engaging with.

3) With respect to the sub-discussion of literals in the subject position: I find the asymmetry of RDF annoying. It makes me represent things in unnatural ways for specific applications, and it is embarrassing to explain to newcomers. And I understand that it is unnecessary.

Cheers Hugh


HolgerKnublauch commented 7 months ago

On firstName vs "firstName", note that JavaScript allows both forms equivalently, assuming the string is a valid identifier.

let obj = {
    firstName: "Hugh",
    "lastName": "Glaser"
}

On the general topic, it is rather obvious that the W3C processes will not allow making such changes because by now there are too many established users and vendors who will expect predicates to continue to be (potentially resolvable) URIs. So any discussion here is rather academic, as input for a future WG that is independent of RDF as we know it. Maybe if we frame these topics accordingly, it will raise fewer concerns by those who will want to preserve the status quo.

namedgraph commented 7 months ago

@dbooth-boston you should enable Discussions :)

dbooth-boston commented 7 months ago

[@namedgraph] @dbooth-boston you should enable Discussions :)

Done: https://github.com/w3c/EasierRDF/discussions/107

TallTed commented 7 months ago

Please be aware that GitHub's "Discussions" are more of a Q&A that appears to have been modeled after the StackOverflow family of sites, than they are a discussion space which calls for threading message trees along the lines of what was once NetNews/Usenet/NNTP ... so what you intended to do may not be doable there, @namedgraph.

redmer commented 6 months ago

Note that quasi aliasing can already be done with prefixes, albeit uncommon and not whilst re-using the prefix.

JSON-LD of course also allows mapping JSON keys (aliases) to other URLs.

Re-using @fekaputra's example:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/ns#>

# aliases as prefixes
PREFIX label: <http://www.w3.org/2000/01/rdf-schema#label> 
PREFIX firstName: <http://example.org/ns#firstName>
PREFIX lastName: <http://example.org/ns#lastName>
PREFIX age: <http://example.org/ns#age>

ex:JohnDoe
    a ex:Person ;
    label: "John Doe" ;
    firstName: "John" ;
    lastName: "Doe" ;
    age: 42 .

ex:JohnDoe
    a ex:Person ;
-    label "John Doe" ;
+    label: "John Doe" ;
-    firstName "John" ;
+    firstName: "John" ;
-    lastName "Doe" ;
+    lastName: "Doe" ;
-    age 42 .
+    age: 42 .

amirouche commented 5 months ago

Oh, sorry, I misread the topic. It would be best to update the topic to mention byte strings. Otherwise, we will talk past each other.