w3c / EasierRDF

Making RDF easy enough for most developers

Idea: Higher-level RDF language #34

dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

"Using RDF is like programming in assembly language. It is tedious, frustrating and error prone. Somehow, we need to move up to a higher, easier, more productive level." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0036.html

"RDF is deceptively simple. You start with a simple idea and end up with a complex mess." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0094.html

"it may be that it is not so much easier RDF that is needed as time working out how these paradigms -- [RDF's logical, FP's algebraic and OO's state-based architectures] -- fit together, as we need people from these paradigms to work together" https://lists.w3.org/Archives/Public/semantic-web/2018Dec/0006.html

Ideas for addressing this issue

IDEA: Higher-level RDF language

"What I'd most like to see is a higher-level RDF language that gets compiled into triples/quads, just as python gets compiled into byte code, such that RDF users never need to actually see or deal with the underlying triples." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0082.html
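As a rough illustration of "compiled into triples", the following Python sketch assumes a hypothetical higher-level form in which an object is simply a nested dictionary. The function name `compile_to_triples`, the `ex:` identifiers and the `_:b0`-style blank node labels are made up for this example; a real design would also have to handle IRIs, datatypes and language tags.

```python
from itertools import count


def compile_to_triples(subject, obj, ids=None):
    """Flatten a nested dict into (subject, predicate, object) triples.

    Hypothetical mapping: a nested dict becomes a fresh blank node
    ("_:b0", "_:b1", ...), a list becomes one triple per element, and
    any other value is emitted as-is (a literal or an identifier).
    """
    ids = count() if ids is None else ids
    triples = []
    for predicate, value in obj.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                node = f"_:b{next(ids)}"
                triples.append((subject, predicate, node))
                # Recurse with the same counter so blank node labels stay unique.
                triples.extend(compile_to_triples(node, v, ids))
            else:
                triples.append((subject, predicate, v))
    return triples


person = {
    "name": "John Smith",
    "address": {"street": "1 Main St", "city": "Boston"},
}
for triple in compile_to_triples("ex:john", person):
    print(triple)
```

The user only ever writes `person`; the blank node `_:b0` linking John to his address exists only in the compiled output.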

IDEA: RDF templates

"While you wait for the easier RDF, please try out Reasonable Ontology Templates (OTTR). OTTR templates are like macros for RDF." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0091.html

"On my wish list are . . . specific templates for certain types like addresses" https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0170.html
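To give a feel for "macros for RDF", here is a small Python sketch of template expansion. It does not follow OTTR's syntax or semantics; the `ADDRESS_TEMPLATE` structure, the `?`-prefixed parameter convention and the `ex:` property names are hypothetical. The point is only that the user supplies arguments and the expansion produces the underlying triples, including a fresh blank node for each address.

```python
from itertools import count

# A hypothetical "Address" template: parameters in, triples out.
# Terms beginning with "?" are parameters; "_:addr" is a template-local
# blank node that is renamed on every expansion.
ADDRESS_TEMPLATE = {
    "params": ["?person", "?street", "?city", "?postcode"],
    "body": [
        ("?person", "ex:address", "_:addr"),
        ("_:addr", "ex:street", "?street"),
        ("_:addr", "ex:city", "?city"),
        ("_:addr", "ex:postcode", "?postcode"),
    ],
}

_fresh = count()


def expand(template, *args):
    """Expand one template instance into plain triples, like a macro."""
    if len(args) != len(template["params"]):
        raise ValueError("wrong number of template arguments")
    binding = dict(zip(template["params"], args))
    suffix = next(_fresh)

    def subst(term):
        if term in binding:
            return binding[term]
        if term.startswith("_:"):
            return f"{term}{suffix}"  # fresh blank node per expansion
        return term

    return [tuple(subst(t) for t in triple) for triple in template["body"]]


for t in expand(ADDRESS_TEMPLATE, "ex:john", "1 Main St", "Boston", "02101"):
    print(t)
```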

william-vw commented 5 years ago

Thanks very much - I wasn't aware of the OTTR effort!

So, the first idea would cover any "semantic extension" that can nevertheless be translated into triples/quads (e.g., by an accompanying transformation to "ordinary" RDF graphs and SPARQL queries)? Taking Hartig et al. as an example, this could perhaps also be a way to "hide" the user-unfriendliness of certain RDF concepts, such as blank nodes (e.g., during reification), lists, etc., at least until they get resolved in a different way(?).
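Lists are a good example of what such a layer could hide. In RDF, a collection is encoded as a chain of blank nodes linked with `rdf:first` and `rdf:rest` and terminated by `rdf:nil`. The Python sketch below, with hypothetical function names and `_:l0`-style blank node labels, shows the encoding a user would otherwise have to write or read by hand, and the decoding a higher-level tool could perform behind the scenes.

```python
from itertools import count

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def encode_list(items, ids):
    """Encode a Python list as an RDF collection (rdf:first/rdf:rest chain).

    Returns (head, triples); the head is rdf:nil for an empty list.
    """
    head = RDF_NIL
    triples = []
    # Build the chain back to front so each cell can point at the next one.
    for item in reversed(items):
        cell = f"_:l{next(ids)}"
        triples.append((cell, RDF_FIRST, item))
        triples.append((cell, RDF_REST, head))
        head = cell
    return head, triples


def decode_list(head, triples):
    """Walk an rdf:first/rdf:rest chain back into a Python list."""
    index = {(s, p): o for s, p, o in triples}
    items = []
    while head != RDF_NIL:
        items.append(index[(head, RDF_FIRST)])
        head = index[(head, RDF_REST)]
    return items


head, triples = encode_list(["ex:a", "ex:b", "ex:c"], count())
print(len(triples))                # 6
print(decode_list(head, triples))  # ['ex:a', 'ex:b', 'ex:c']
```

Three list items already cost six triples and three blank nodes, which is the kind of detail the comment suggests keeping out of users' sight.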

Maybe it would be worthwhile to look into an RDF/SPARQL extension mechanism to facilitate such efforts?

draggett commented 5 years ago

See also issue #51 (Relationship to the Web of Things) and #45 (Property Graphs). These allow for n-ary terms and the potential for alignment on graph query languages.

draggett commented 5 years ago

I have been brainstorming some ideas towards a proposal for a higher-level RDF language, with a view to simplifying applications and acting as an interchange framework between graph databases. It seems appropriate to explore a broad range of such ideas and evaluate which make the most sense in terms of ease of use and generality with respect to embracing existing graph databases. The following focuses on the requirements and ignores the syntax.

A starting point is to consider nodes as objects with named properties. Property values may be other objects or literals such as booleans, numbers and strings. Objects thus form graphs. A collection of objects can be given a name and treated as an object that can be opened to reveal the collection. Property names are scoped to the object they apply to. A graph can define local names for objects, i.e. names scoped to that graph.

Properties are triples with a "subject" (the object), a "predicate" (the property name) and an "object" (the property value). We may want to distinguish regular properties from those that are considered to be object metadata. JSON-LD's contexts provide a precedent for mapping local names to RDF's URIs. A path traversing the graph can be expressed as a sequence of property names.
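A minimal sketch of the last two points, assuming a hypothetical context that maps local property names to full IRIs (in the spirit of a JSON-LD `@context`, but not its actual processing algorithm) and a graph held as a Python set of triples. A path is then just a list of local names, each expanded through the context and followed one hop at a time.

```python
# Hypothetical context: local property names on the left, full IRIs on the right.
CONTEXT = {
    "employedBy": "http://example.org/vocab#employedBy",
    "locatedIn": "http://example.org/vocab#locatedIn",
    "name": "http://example.org/vocab#name",
}

GRAPH = {
    ("ex:john", CONTEXT["employedBy"], "ex:acme"),
    ("ex:acme", CONTEXT["locatedIn"], "ex:boston"),
    ("ex:boston", CONTEXT["name"], "Boston"),
}


def follow_path(graph, start, path, context=CONTEXT):
    """Return the set of nodes reached from `start` by following `path`,
    a sequence of local property names expanded through `context`."""
    frontier = {start}
    for local_name in path:
        predicate = context[local_name]
        frontier = {o for s, p, o in graph if s in frontier and p == predicate}
    return frontier


print(follow_path(GRAPH, "ex:john", ["employedBy", "locatedIn", "name"]))
# {'Boston'}
```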

Applications can be defined in terms of rules that operate on graphs. The following focuses on forward chaining of production rules of the form: if condition then action. Such rules could serve as queries that return a structured result. Production rules could also be used to update graphs, or to invoke actuators, e.g. to change the state of the real world, such as switching on a light.
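The following Python sketch shows naive forward chaining of such rules to a fixpoint. Each rule is assumed to be a pair of plain functions: a condition that yields variable bindings from the current facts, and an action that returns new facts. The lamp and room vocabulary is hypothetical, and recording a `switchOn` command merely stands in for a call to a real actuator.

```python
def forward_chain(facts, rules, max_rounds=100):
    """Fire `if condition then action` rules until no new facts appear.

    Each rule is a (condition, action) pair: condition(facts) yields
    bindings (dicts) and action(binding) returns a set of new facts.
    """
    facts = set(facts)
    for _ in range(max_rounds):
        derived = set()
        for condition, action in rules:
            for binding in condition(facts):
                derived |= action(binding)
        derived -= facts
        if not derived:
            return facts  # fixpoint: nothing new can be derived
        facts |= derived
    raise RuntimeError("no fixpoint within max_rounds")


def lamps_in_dark_rooms(facts):
    for lamp, p, room in facts:
        if p == "locatedIn" and (room, "state", "dark") in facts:
            yield {"lamp": lamp}


def switch_on(binding):
    # Stand-in for invoking an actuator: record the command as a fact.
    return {(binding["lamp"], "command", "switchOn")}


def rooms_being_lit(facts):
    for lamp, p, cmd in facts:
        if p == "command" and cmd == "switchOn":
            for s, p2, room in facts:
                if s == lamp and p2 == "locatedIn":
                    yield {"room": room}


def expect_lit(binding):
    return {(binding["room"], "expectedState", "lit")}


RULES = [(lamps_in_dark_rooms, switch_on), (rooms_being_lit, expect_lit)]

facts = {
    ("ex:lamp1", "locatedIn", "ex:hall"),
    ("ex:hall", "state", "dark"),
    ("ex:lamp2", "locatedIn", "ex:kitchen"),
}
print(sorted(forward_chain(facts, RULES) - facts))
# [('ex:hall', 'expectedState', 'lit'), ('ex:lamp1', 'command', 'switchOn')]
```

The second rule only fires in the second round, because its condition depends on a fact produced by the first rule.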

Rule conditions form logical expressions over terms. Named variables can be used as constraints across conditions, and also to feed values matched by conditions into rule actions. Conditions may involve operators for booleans, numbers and strings, e.g. to compare a numeric value to a literal number. It would be interesting to explore the potential for applying unification as a powerful framework for instantiating variables in rule conditions.
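As a sketch of variables shared across conditions, the code below matches a conjunction of triple patterns against a list of facts, threading bindings from one condition into the next, and then applies a numeric comparison. Because the facts are ground, this is one-sided unification (pattern matching) rather than full unification; the `?` prefix for variables and the sensor data are assumptions made for illustration.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, fact, binding):
    """Try to extend `binding` so that `pattern` equals `fact`; None on failure."""
    new = dict(binding)
    for p_term, f_term in zip(pattern, fact):
        if is_var(p_term):
            if p_term in new and new[p_term] != f_term:
                return None  # variable already bound to a different value
            new[p_term] = f_term
        elif p_term != f_term:
            return None
    return new


def solve(patterns, facts, binding=None):
    """Yield every binding that satisfies all patterns (a conjunction)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = match_pattern(first, fact, binding)
        if extended is not None:
            yield from solve(rest, facts, extended)


FACTS = [
    ("ex:sensor1", "locatedIn", "ex:hall"),
    ("ex:sensor1", "temperature", 17),
    ("ex:sensor2", "locatedIn", "ex:office"),
    ("ex:sensor2", "temperature", 23),
]
# "?s" is shared between the two conditions; the comparison acts as a third.
conditions = [("?s", "locatedIn", "?room"), ("?s", "temperature", "?t")]
cold_rooms = [b["?room"] for b in solve(conditions, FACTS) if b["?t"] < 20]
print(cold_rooms)  # ['ex:hall']
```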

Rule actions can construct structured results that include values obtained from the conditions. In principle, if the conditions have multiple matches, then the rule will have multiple results. What about when multiple rules match the current state of the graphs? A common approach is to define some kind of ordering between competing rules, and either to just execute the highest priority rule, or to execute them all in priority order.
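The two conflict-resolution strategies mentioned above can be sketched as an agenda sorted by priority. In this hypothetical Python version, a larger `priority` number means a more urgent rule, ties keep declaration order (Python's sort is stable, also with `reverse=True`), and `strategy` selects between firing only the top entry and firing every entry in priority order.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Rule:
    name: str
    priority: int  # assumption: higher value = more urgent
    condition: Callable[[set], Iterable[dict]]
    action: Callable[[dict], object]


def build_agenda(rules, facts):
    """Collect every (rule, binding) pair whose condition currently matches."""
    agenda = [(rule, b) for rule in rules for b in rule.condition(facts)]
    # Stable sort: rules with equal priority keep their declaration order.
    agenda.sort(key=lambda item: item[0].priority, reverse=True)
    return agenda


def run_cycle(rules, facts, strategy="all"):
    """Fire only the highest-priority match, or all matches in priority order."""
    agenda = build_agenda(rules, facts)
    if strategy == "highest":
        agenda = agenda[:1]
    return [rule.action(b) for rule, b in agenda]


facts = {("ex:hall", "motion", True), ("ex:hall", "smoke", True)}
rules = [
    Rule("lights", 1,
         lambda f: [{"room": s} for s, p, o in f if p == "motion" and o],
         lambda b: f"switch on lights in {b['room']}"),
    Rule("alarm", 10,
         lambda f: [{"room": s} for s, p, o in f if p == "smoke" and o],
         lambda b: f"sound alarm in {b['room']}"),
]
print(run_cycle(rules, facts, "highest"))  # ['sound alarm in ex:hall']
print(run_cycle(rules, facts, "all"))
# ['sound alarm in ex:hall', 'switch on lights in ex:hall']
```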

A related question is how to deal with a rule that has side effects when the conditions have multiple matches. One approach is to re-evaluate the rule's conditions after each match and corresponding execution of the rule's actions. To obtain a predictable result, this requires some means for ordering the matches.
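Here is a small Python sketch of re-evaluation after each firing, using a hypothetical task-assignment rule whose action has a side effect. Firing all initial matches at once would hand one worker several tasks (and fail when trying to remove an already-removed item), whereas recomputing the matches after every action keeps the state consistent. Taking `min` over the matches is one simple, assumed way to get a predictable order.

```python
def matches(state):
    """All (task, worker) pairs where the task is pending and the worker is idle."""
    return [(t, w) for t in state["pending"] for w in state["idle"]]


def assign(state, task, worker):
    # Side effect: mutates the state, which can invalidate other matches.
    state["pending"].remove(task)
    state["idle"].remove(worker)
    state["assigned"].append((task, worker))


def run_with_reevaluation(state):
    """Fire one match at a time, re-evaluating the condition after each firing."""
    while True:
        found = matches(state)
        if not found:
            return state
        task, worker = min(found)  # deterministic ordering of matches
        assign(state, task, worker)


state = {"pending": {"t1", "t2", "t3"}, "idle": {"alice", "bob"}, "assigned": []}
print(run_with_reevaluation(state)["assigned"])
# [('t1', 'alice'), ('t2', 'bob')]
```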

When constructing complex results, it may be desirable to define rules whose conditions act over the results generated by previously executed rules. Complex queries may thus involve the creation of dynamic graphs.

Rule conditions and actions could involve external functions that are supplied by the application or are predefined for common use cases. When used in conditions, such functions should be side-effect free.
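One possible shape for such functions is a registry of named predicates that conditions refer to by name, as in this Python sketch. The registry, the names `greaterThan`, `lessThan`, `startsWith` and `isComfortable`, and the tuple syntax for conditions are hypothetical. Keeping these functions pure matters because an engine may evaluate a condition many times, or in an unspecified order, while searching for matches.

```python
# Hypothetical registry of predefined, side-effect-free predicates.
BUILTINS = {
    "greaterThan": lambda a, b: a > b,
    "lessThan": lambda a, b: a < b,
    "startsWith": lambda s, prefix: isinstance(s, str) and s.startswith(prefix),
}


def register_builtin(name, fn):
    """Let the application add its own predicate (it should be pure)."""
    BUILTINS[name] = fn


def holds(condition, binding):
    """Evaluate a condition such as ("greaterThan", "?t", 20) under a binding."""
    name, *args = condition
    # Replace variables like "?t" with their bound values; keep constants as-is.
    values = [binding[a] if isinstance(a, str) and a.startswith("?") else a
              for a in args]
    return bool(BUILTINS[name](*values))


register_builtin("isComfortable", lambda t: 19 <= t <= 22)

bindings = [{"?room": "ex:hall", "?t": 17}, {"?room": "ex:office", "?t": 23}]
warm = [b["?room"] for b in bindings if holds(("greaterThan", "?t", 20), b)]
print(warm)  # ['ex:office']
```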

We also need a means to annotate properties, e.g. to state provenance, trust, temporal and spatial metadata and so forth. As a more concrete example, let's assume that John Smith has worked for a succession of employers over his career. We could express this using a property "employedBy" with multiple values that are annotated with the start and stop times for that period of employment.

How can we express this in a simple way for data and for rules? This needs to avoid the potential for confusion with the employer's properties given that the period of employment is associated with the "employedBy" relationship holding between employee and employer, and not the employer alone.

One approach is to state each property separately and to introduce local identifiers (blank nodes) that relate the different statements. The experience with RDF is that this is awkward for people to work with. The Turtle notation provides some syntactic convenience, e.g. square brackets for anonymous blank nodes, but surely we can do much better than that with a higher-level framework, right?
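To compare the two levels, the Python sketch below assumes a hypothetical higher-level form in which each value of `employedBy` carries its own `start` and `stop` annotations, and compiles it into the blank-node pattern described above: one intermediate node per period of employment, so the dates attach to the relationship rather than to the employer. The `ex:employment`, `ex:employer`, `ex:start` and `ex:stop` names are made up, and periods are treated as half-open intervals.

```python
from itertools import count

# Hypothetical higher-level form: each "employedBy" value carries its own
# annotations, kept apart from the employer's own properties.
john = {
    "id": "ex:john",
    "employedBy": [
        {"value": "ex:acme", "start": 2001, "stop": 2008},
        {"value": "ex:globex", "start": 2008, "stop": 2015},
    ],
}


def compile_annotated(obj, ids):
    """Compile annotated values into triples, using one intermediate
    blank node per employment period (an n-ary relation pattern)."""
    triples = []
    for period in obj["employedBy"]:
        node = f"_:e{next(ids)}"
        triples += [
            (obj["id"], "ex:employment", node),
            (node, "ex:employer", period["value"]),
            (node, "ex:start", period["start"]),
            (node, "ex:stop", period["stop"]),
        ]
    return triples


def employer_in(triples, person, year):
    """Query the compiled triples: who employed `person` during `year`?"""
    index = {(s, p): o for s, p, o in triples if p != "ex:employment"}
    result = []
    for s, p, node in triples:
        if s == person and p == "ex:employment":
            if index[(node, "ex:start")] <= year < index[(node, "ex:stop")]:
                result.append(index[(node, "ex:employer")])
    return result


triples = compile_annotated(john, count())
print(len(triples))                         # 8
print(employer_in(triples, "ex:john", 2010))  # ['ex:globex']
```

Two annotated values already expand to eight triples and two blank nodes, which is the bookkeeping a higher-level framework would ideally keep out of sight.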

Further brainstorming is needed to provide some ideas for evaluation. In addition to working from a blank sheet, it is also informative to look at how others have addressed this and other related requirements. My next step is to look at such solutions and see what questions arise. This should include work on use cases as a means to make it easier to compare different approaches.

aabs commented 4 years ago

How different would a higher-level RDF language be, conceptually, from a language like Prolog, i.e. one whose aim is to capture knowledge and the rules it obeys, and to answer questions about that knowledge?

An embedding into existing, already widely adopted languages, as miniKanren does, might reach developers faster than asking them to switch stacks.
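For a sense of what such an embedding looks like, here is a toy, microKanren-flavoured sketch in plain Python: logic variables, unification, and goals combined with `conj` and `disj`. It is not the API of any miniKanren library; the names are hypothetical, it searches depth-first (closer to Prolog than to miniKanren's interleaving search), and it omits the occurs check.

```python
from itertools import count


class Var:
    """A logic variable, compared and hashed by identity."""
    _ids = count()

    def __init__(self):
        self.id = next(Var._ids)

    def __repr__(self):
        return f"_{self.id}"


def walk(term, subst):
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution, or None if a and b cannot be unified."""
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var) and isinstance(b, Var) and a is b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


def eq(a, b):
    def goal(subst):
        s = unify(a, b, subst)
        if s is not None:
            yield s
    return goal


def conj(*goals):
    def goal(subst):
        if not goals:
            yield subst
            return
        for s in goals[0](subst):
            yield from conj(*goals[1:])(s)
    return goal


def disj(*goals):
    def goal(subst):
        for g in goals:
            yield from g(subst)
    return goal


FACTS = [("ex:alice", "parentOf", "ex:bob"), ("ex:bob", "parentOf", "ex:carol")]


def triple(s, p, o):
    return disj(*(eq((s, p, o), fact) for fact in FACTS))


def grandparent(x, z):
    y = Var()  # fresh variable local to this rule
    return conj(triple(x, "parentOf", y), triple(y, "parentOf", z))


def run(goal, *query_vars):
    return [tuple(walk(v, s) for v in query_vars) for s in goal({})]


x, z = Var(), Var()
print(run(grandparent(x, z), x, z))  # [('ex:alice', 'ex:carol')]
```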

dbooth-boston commented 3 months ago

Interesting proposal for a simplified JSON-LD profile: https://lists.w3.org/Archives/Public/public-linked-json/2024Mar/0004.html

It doesn't directly address the idea of a higher-level RDF language, but it seems like a very useful step in thinking about "what could be".

william-vw commented 3 months ago

@draggett In the meantime you have undoubtedly become aware of N3 (as you congratulated us on the spec :-)). For completeness, and for other posters, I will point out the following:

"A collection of objects can be given a name and treated as an object that can be opened to reveal the collection."

One can use graph terms to create groups of triples. There are ways to associate names (and semantics) with graph terms.

"Property names are scoped to the object they apply to. A graph can define local names for objects, i.e. names scoped to that graph."

Blank nodes are scoped locally to a graph term.

"A path traversing the graph can be expressed in terms of a sequence of property names."

Check out resource paths!

Regarding the part on rules, conditions, unification, builtin functions, and chaining, I believe N3 checks all those boxes.

"We also need a means to annotate properties, e.g. to state provenance, trust, temporal and spatial metadata and so forth. We could express this using a property "employedBy" with multiple values that are annotated with the start and stop times for that period of employment."

There are multiple patterns to do this in N3.

"In addition to working from a blank sheet, it is also informative to look at how others have addressed this and other related requirements."

At least this, if nothing else!