w3c / N3

W3C's Notation 3 (N3) Community Group

New shorthands for blank nodes and graph terms #214

Open sulivanShu opened 5 months ago

sulivanShu commented 5 months ago

Resources are concepts that can be broken down into subject, predicate and object.

The subject is the agent, the predicate is the manifestation, and the object is the patient of the concept.

So the statement:

@prefix : <https://example.org/> .

:employ :employ :employ .

can be translated as: “Employers employ employees.”

However, in RDF, the same resource can serve as subject, predicate, or object, in any order.

Thus, the declarations:

{[][]:employ}:work_with{:employ[][]}.
{[]:employ[]}:is_the_work_of{:employ[][]}.

read: "employees work with employers", and "employing is the work of employers".

The notation can be simplified to:

[[]:employ]:work_with{:employ[][]}.
[:employ[]]:is_the_work_of{:employ[][]}.

But it remains heavy and will quickly become unreadable if many resources are nested within each other.

Proposal 1

Introduce the following shorthands:

Shorthand  IRI
s          <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject>
p          <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate>
o          <http://www.w3.org/1999/02/22-rdf-syntax-ns#object>

So the syntax {:employ[][]} can be replaced by {:employ a s}

Proposal 2

Introduce the following syntax :a~:b~:c for {:a :b :c}.

Thus, following the two proposals, the syntax:

{[][]:employ}:work_with{:employ[][]}.
{[]:employ[]}:is_the_work_of{:employ[][]}.

can be replaced by:

:employ~a~o :work_with :employ~a~s.
:employ~a~p :is_the_work_of :employ~a~s.
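
As a rough illustration of Proposal 2, the sketch below rewrites the a~b~c shorthand back into the {a b c} form as a purely textual step. The function name expand_tilde is hypothetical, and the sketch assumes terms are simple prefixed names or keywords without dots, braces or spaces (full IRIs in angle brackets are not handled). It is not part of any N3 tooling.

```python
import re

# Assumption: a term is any run of characters without whitespace, '~', braces or dots.
TERM = r"[^\s~{}.]+"
TILDE_TRIPLE = re.compile(rf"({TERM})~({TERM})~({TERM})")

def expand_tilde(text):
    """Rewrite every a~b~c in the text as the graph term {a b c}."""
    return TILDE_TRIPLE.sub(r"{\1 \2 \3}", text)

print(expand_tilde(":employ~a~o :work_with :employ~a~s."))
```

The s, p and o shorthands from Proposal 1 are left untouched here; expanding them to their rdf: IRIs would be a separate step.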

Interest

This syntax is inspired by synthetic languages, which include all Indo-European languages (English, Spanish, Hindi, etc.) and all agglutinative languages (Tamil, Indonesian, Turkish, Finnish, Korean, Japanese, Swahili, Quechua, etc.). It is therefore intuitive for speakers of all these languages.

This syntax still remains analytical, since the shortcuts "o", "p" and "s" are abbreviations. It is therefore also intuitive to a computer scientist or to a speaker of an analytic language such as Chinese.

sulivanShu commented 5 months ago

On second thought, a more generic solution would be to let users define their own syntactic or semantic extensions through directives named @syntax or @semantic, which take as input an IRI pointing to a resource in EBNF or N3 format.

A syntax extension might look like this:

formula     ::=     subject "~" verb "~" object

And a semantic extension like this:

_:1~_:2~_:3 a {_:1 _:2 _:3} .

This implies that shorthands can be used as subject or object, and therefore section 4.6 of the standard should be revised.

For example:

s a <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> .
p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> .
o a <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> .

By allowing users to define their own extensions, you avoid making N3 itself more complex.

doerthe commented 4 months ago

I normally like discussing syntactic extensions, but I would try to keep the number of such extensions limited. Therefore my question: do you have a particular use case in mind where you plan to use a lot of these triples containing many blank nodes?

sulivanShu commented 4 months ago

Translate natural languages more directly into RDF.

Create controlled natural languages, thesauri and dictionaries in RDF.

This could be of interest for the processing of natural languages by AI or for automatic translators.

RDF exposes the logical structure of the sentences, which is interesting for technical or scientific applications.

But maybe this isn't a trivial extension, and would require more thought.

I'm not an expert at all.

I came to these considerations because I find that the use of natural languages often lacks clarity.

Note that this type of criticism is not new: it was already raised by George Orwell in "Politics and the English Language".

Is it possible to translate natural language into RDF? And if so, what language should I use? I tried using N3, and while it's possible, it's rather complicated.

Maybe N3 is not appropriate, maybe we need to invent a new language.

To easily translate natural languages, N3 lacks at least two features:

The ability to use triples in any position: subject, verb and object.

The possibility of defining your own shortcuts (like "a" for http://www.w3.org/1999/02/22-rdf-syntax-ns#type) which would allow you to import the vocabulary of your own thesaurus, in practice the vocabulary of a controlled natural language.

Without much rigor, this is what it could look like:

The syntax:

triple ::= ( "{" ( resource {2} ) + resource "}" ) 
| ( "[" resource + resource "]" ) 
| ( "(" resource + resource ")" )
resource ::= abbr | iri | triple | "[]" | "." | "<-" resource

The grammar:

{ { _:1 _:2 _:3 } a { _:1 _:2 _:3 } } .
{ [ _:1 _:2 ] a { [] _:2 _:3 } } .
{ ( _:1 _:2 ) a { _:1  . _:2 } } .

{ { _:1 _:2 _:3 _:4 _:5 } a { _:1 _:2 { _:3 _:4 _:5 } } } .
{ [ _:1 _:2 _:3 ] a [ _:1 [ _:2 _:3 ] ] } .
{ ( _:1 _:2 _:3 ) a { _:1  . { _:2 . _:3 } } }

An example:

"The philosopher Socrates was during antiquity a man respected by his peers"

{
    { Socrates a philosoph } # subject
    [ a antic ] # verb
    {
        [ a man ]
        a
        [ 
            <- respect
            [ 
                a (
                    [ pluri peer ]
                    [ rel Socrates ]
                ) 
            ]
        ]
    }  # object
} .

{ { Socrates a philosoph } [ a antic ] { [ a man ] a [ <- respect [ a ( [ pluri peer ] [ rel Socrates ] ) ] ] } }

Triples can be used as subject, verb or object, including anonymous entities. For example, { [] [] [] } means "an entity is something", which can help construct sentences like { [] a-antic [] }: "an entity was during antiquity something".

sulivanShu commented 4 months ago

The more I think about it, the more I think I need a new language better suited to transcribing natural languages.

  1. The basis is the same as for any other RDF language: the triple:
a b c
  1. However, each resource can be assigned a subordinate, using the ";" character:
a;b;c d;e;f g;h;i .
a b c .
d e f .
g h i .
a d g
  1. The same resource can have several subordinates:
a;b;c;d;e f g .
a b c .
a d e .
a f g
  1. A subordinate resource can have a subordinate, through the characters ";;", ";;;", ";;;;" and so on, depending on their hierarchical level:
a;b;c;;d;;;e;;;f;;g h i .
a b c .
c d g .
d e f .
a h i
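
As a rough illustration of the ";" rules above, here is a minimal Python sketch that expands one statement into plain triples. The names parse_term and expand are hypothetical, and the attachment rule (a subordinate at level n attaches to the most recent resource at level n-1) is an assumption inferred from the three examples, not a specification. Commas, "_" and whitespace inside a term are not handled.

```python
import re

def parse_term(term):
    """Return (head, triples) for one whitespace-free term such as 'a;b;c;;d;;e'."""
    # Pair every resource name with the length of the ';' run in front of it.
    tokens = [(len(sep), name) for sep, name in re.findall(r"(;*)([^;]+)", term)]
    head = tokens[0][1]
    latest = {0: head}   # most recent resource seen at each level
    pending = {}         # level -> (owner, predicate) still waiting for an object
    triples = []
    for level, name in tokens[1:]:
        if level in pending:
            owner, predicate = pending.pop(level)
            triples.append((owner, predicate, name))
        else:
            # Assumption: a subordinate attaches to the latest resource one level up.
            pending[level] = (latest[level - 1], name)
        latest[level] = name
    return head, triples

def expand(statement):
    """Expand 'subject verb object .' into subordinate triples plus the main triple."""
    terms = statement.rstrip(" .").split()
    if len(terms) != 3:
        raise ValueError("expected exactly three whitespace-separated terms")
    heads, triples = [], []
    for term in terms:
        head, subordinate = parse_term(term)
        heads.append(head)
        triples.extend(subordinate)
    triples.append(tuple(heads))
    return triples

print(expand("a;b;c;;d;;;e;;;f;;g h i ."))
```

The sketch emits triples in the order they are completed, which can differ from the order listed in the examples; since the result is a set of triples, comparing as sets is the safer check.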
  1. When several subordinate clauses share the same verb, the verb can be factored out using a sub-subordinate and the logical operator AND:
a;b;c;b;d e f .
a;b;c;;and;;d e f
  1. The subordinate-clause introducer ";;and;;" can be abbreviated by the string ",,", the number of commas corresponding to the hierarchical level:
a;b;c;;and;;d e f .
a;b;c,,d e f
  1. Resources have a neutral element, denoted "_", which represents any entity.

Used as a subject it means: "an entity", as a verb: "is" and as an object: "something".

When an indeterminate entity is a determinate resource, this means that this indeterminate entity is this determinate resource (tautology):

_;_;a b c .
a b c
  1. This tautology makes it possible to simplify certain constructions:
_;_;a,,b c d .
a,b c d
  1. The subordinate clauses "," are distributive:
a,b c,d e,f .
a c e .
a c f .
a d e .
a d f .
b c e .
b c f .
b d e .
b d f
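
The distributive reading of "," amounts to a Cartesian product over the three positions. A minimal Python sketch, assuming only single-level commas and no ";" subordinates (the function name distribute is hypothetical):

```python
from itertools import product

def distribute(statement):
    """Expand 'a,b c,d e,f .' into every subject/verb/object combination."""
    terms = statement.rstrip(" .").split()
    if len(terms) != 3:
        raise ValueError("expected exactly three whitespace-separated terms")
    # Each position becomes a list of alternatives; product() enumerates all combinations.
    alternatives = [term.split(",") for term in terms]
    return list(product(*alternatives))

for triple in distribute("a,b c,d e,f ."):
    print(" ".join(triple))
```

With two alternatives in each position this yields the eight triples listed above, in the same order.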
  1. Since the character "." represents the logical operator "and", it should not, in principle, be placed at the end of the last triple of the document, because the user might wish to join several documents using an operator other than "and".

However, this rule is questionable as the "and" operator at the end of the document makes it trivial to join documents into one.

A finite series of triples could also be terminated by a special triple, like _ _ _ (without a dot), which allows one to manipulate documents that are mathematically finite, and not potentially infinite as the final "and" operator suggests.

  1. Combined with line breaks, runs of ";" or "," characters make it possible to see at a glance the hierarchical level of subordinate clauses in complex sentences:
a ;b ;;c ;;d ,,,e ,,,f ,,,g ;;;;h ;;;;i ;;;;j ;;;;k ,,,l ;;m ;;n ,,o ,,p ,q ;;r ;;s ;t u v .
a
;b
;;c
;;d
,,,e
,,,f
,,,g
;;;;h
;;;;i
;;;;j
;;;;k
,,,l
;;m
;;n
,,o
,,p
,q
;;r
;;s
;t
u
v

The structure of a convoluted sentence can thus be transcribed relatively clearly and concisely.

  1. As stated previously, there should be an import mechanism to use resources more directly, without IRIs and without ":" characters, taking care that there are no collisions between the different resources used.

  2. Imported resources should be automatically instantiated, that is, a resource "a" does not correspond directly to the resource "ex:a", but rather to an instance of "ex:a": a _ ex:a.

Indeed, when in natural language I say that a dog has crossed the road, I do not mean that the category "dog" has crossed the road, but that an instance of the category "dog" has crossed the road.

Except in very specific applications (writing dictionaries, etc.), natural language almost always manipulates instances of concepts, and not the concepts themselves.

The import and instantiation mechanisms should certainly be formalized with more rigor.

  1. The rules of N3 which do not conflict with the previous rules apply.

In summary, the example sentence would be translated as follows:

"The philosopher Socrates was during antiquity a man respected by his peers"
_:1;_;Socrates,philosoph;;_;;subject antic man;<-respect;_;;pluri;;peer;;;<-rel;;;_:1 .
_:1;_;Socrates,philosoph;;_;;s antic man;<-respect;_;;pluri;;peer;;;<-rel;;;_:1

In _:1;_;Socrates,philosoph, both Socrates and philosoph are in third position, so they are a priori objects; that is, philosoph would mean "philosophy" and not "philosopher".

To correct this irregularity, we also declare philosoph;;_;;subject, or in abbreviated form: philosoph;;_;;s.

Thus, we are sure to speak of “Socrates, the philosopher” and not of “Socrates, the philosophy”.

The description of the language certainly still lacks rigor, and could even contain errors.

domel commented 4 months ago

It seems to me that the proposed change is so far-reaching that it should be pushed to the level of Turtle. N3's grammar is largely similar to Turtle's; if such large changes were implemented in Turtle, they could certainly also be considered in N3, but not the other way around.