w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

The unnamed/default graph should have a standard name #43

Open dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

At present the unnamed/default graph has no standard name. This means that, when writing code that manipulates graphs, one must special-case the unnamed/default graph. It also violates one of the Axioms of Web Architecture: "Any resource of significance should be given a URI."

I think the unnamed/default graph should have a standard name, such as http://www.w3.org/1999/02/22-rdf-syntax-ns#defaultGraph ( rdf:defaultGraph ). Implied references to the unnamed/default graph in SPARQL, TriG, etc., should be understood as short-hand for this graph name.
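Here is a minimal Python sketch of what the proposal would save client code from doing. It assumes quads are held as (s, p, o, g) tuples in which g is None for the unnamed graph; that representation and the `group_by_graph` helper are hypothetical. Once None is normalized to the proposed rdf:defaultGraph IRI, every graph is keyed and handled the same way, with no special case.

```python
from collections import defaultdict

# IRI proposed in this issue for the default graph (a proposal, not an existing RDF term).
RDF_DEFAULT_GRAPH = "http://www.w3.org/1999/02/22-rdf-syntax-ns#defaultGraph"


def group_by_graph(quads):
    """Group (s, p, o, g) quads by graph name.

    g is None for triples in the unnamed/default graph; it is normalized to
    RDF_DEFAULT_GRAPH so callers never need to special-case that graph.
    """
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[RDF_DEFAULT_GRAPH if g is None else g].add((s, p, o))
    return dict(graphs)


quads = [
    ("ex:s1", "ex:p", "ex:o1", None),     # unnamed/default graph
    ("ex:s2", "ex:p", "ex:o2", "ex:g1"),  # named graph ex:g1
]
for name, triples in sorted(group_by_graph(quads).items()):
    print(name, len(triples))
```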

kasei commented 5 years ago

Does this imply that you think disparate endpoints would be explicitly using the same default graph? That sounds rather strange to me.

One of the underlying setups we attempted to support in the SPARQL 1.1 WG was systems which had an underlying quadstore (where every graph has a name) in which the query engine would use one specific named graph as the default. I've designed systems like this in the past, and been very happy with it. I think an alternative approach to this issue (defined by a future version of SPARQL) might be to reconsider the default graph as a pre-defining of the active graph (as if the query were wrapped in a GRAPH <g> { ... } block). The specific graph being used could be indicated in the service description, allowing it to be referenced explicitly.
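The following is a rough Python sketch of that alternative. It assumes a toy quadstore in which every graph is named; `QuadStore`, `match`, and the graph IRIs are hypothetical. When the graph argument is left out, the pattern is evaluated against the configured default graph. That has the same effect as wrapping the pattern in GRAPH <g> { ... }.

```python
class QuadStore:
    """Toy quadstore in which every graph has a name (illustrative only)."""

    def __init__(self, quads, default_graph):
        self.quads = set(quads)
        # The graph the engine treats as the default, as if the whole query were
        # wrapped in GRAPH <default_graph> { ... }. A service description could
        # advertise this IRI so that clients can also reference it explicitly.
        self.default_graph = default_graph

    def match(self, s=None, p=None, o=None, graph=None):
        """Match a triple pattern (None acts as a variable) within one graph.

        With graph=None the pattern is evaluated against the configured
        default graph, i.e. the pre-defined active graph.
        """
        active = self.default_graph if graph is None else graph
        return {
            (qs, qp, qo)
            for qs, qp, qo, qg in self.quads
            if qg == active
            and (s is None or qs == s)
            and (p is None or qp == p)
            and (o is None or qo == o)
        }


store = QuadStore(
    [("ex:a", "ex:p", "ex:b", "ex:main"), ("ex:c", "ex:p", "ex:d", "ex:other")],
    default_graph="ex:main",
)
# The implicit default graph and an explicit GRAPH <ex:main> give the same answer.
assert store.match(p="ex:p") == store.match(p="ex:p", graph="ex:main")
```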

dbooth-boston commented 5 years ago

Does this imply that you think disparate endpoints would be explicitly using the same default graph?

It would be the same as when different SPARQL endpoints use the same IRI as a named graph name. A query would not magically cause all SPARQL endpoints to return results from all SPARQL endpoints that use that graph name.

kasei commented 5 years ago

No, but it might cause some surprising results if you start dealing with metadata or provenance data about graphs (where statements made about another endpoint's default graph now apply to all default graphs everywhere).

kasei commented 5 years ago

Alternatively, for endpoints that are using the SPARQL Protocol, a default graph IRI could be constructed based on the service endpoint URL.

rnavarropiris commented 5 years ago

As the Dedicated Unnamed Default Graph is not referenceable, it is not possible to join it with other graphs. In other words, if a query specifies any other graph (whether using FROM or FROM NAMED) the default dataset of the service will be overwritten (13.2 Specifying RDF Datasets) and therefore the Dedicated Unnamed Default Graph is not accessible anymore.

by @depressiveRobot, article here

I completely agree with @dbooth-boston: the default graph should be referenceable, so that it could also be used in a dataset definition.

Furthermore, the query dataset could then be defined as the union default graph, since otherwise there would be no way to retrieve the list of existing graphs in the quad store with a query e.g. in the form

SELECT DISTINCT ?g { GRAPH ?g {?s ?p ?o}}
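Below is a minimal Python sketch of the dataset-formation rule described above (SPARQL 1.1 Query, section 13.2). It uses a dict-based model and assumes the service's own default dataset exposes all of its named graphs; in practice that part is implementation-dependent. The sketch demonstrates two points. A single FROM NAMED clause drops the unnamed default graph. The GRAPH ?g listing never includes the default graph.

```python
def form_dataset(service_default, service_graphs, from_iris=(), from_named_iris=()):
    """Return (default_graph, named_graphs) for one query.

    service_default: set of triples in the service's unnamed default graph.
    service_graphs: dict mapping graph IRI -> set of triples.
    Any FROM or FROM NAMED clause replaces the service's default dataset entirely.
    """
    if not from_iris and not from_named_iris:
        return set(service_default), {g: set(t) for g, t in service_graphs.items()}
    # Default graph = merge of the FROM graphs (a plain set union here, ignoring
    # blank-node renaming); with only FROM NAMED clauses it is empty.
    default = set()
    for iri in from_iris:
        default |= service_graphs.get(iri, set())
    named = {iri: set(service_graphs.get(iri, set())) for iri in from_named_iris}
    return default, named


def list_graphs(named):
    """SELECT DISTINCT ?g { GRAPH ?g { ?s ?p ?o } }: non-empty named graphs only."""
    return sorted(g for g, triples in named.items() if triples)


service_default = {("ex:s", "ex:p", "ex:o")}
service_graphs = {"ex:g1": {("ex:a", "ex:p", "ex:b")}, "ex:g2": {("ex:c", "ex:p", "ex:d")}}

default, named = form_dataset(service_default, service_graphs, from_named_iris=["ex:g1"])
print(default)             # set(): the service's unnamed default graph is gone
print(list_graphs(named))  # ['ex:g1']
```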
cygri commented 5 years ago

One option here is to follow SPARQL Update and support:

FROM DEFAULT

and

GRAPH DEFAULT { ... }

It solves some of the problems/inconveniences (inability to access original default graph if dataset is specified with FROM / FROM NAMED; inability to switch back to default graph inside a GRAPH clause), but does not solve others (listing all graphs in the dataset).
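This toy evaluator illustrates GRAPH DEFAULT. It uses a tiny nested-pattern representation; the tuple shapes and the `DEFAULT` sentinel are hypothetical. The active graph is just a parameter, so an inner GRAPH DEFAULT can switch back to the default graph from inside an outer GRAPH clause.

```python
DEFAULT = object()  # sentinel standing in for the DEFAULT keyword


def evaluate(pattern, dataset, active=DEFAULT):
    """Evaluate a nested pattern against {"default": set, "named": dict}.

    ("triples", predicate) returns (s, o) pairs with that predicate in the
    active graph; ("graph", name, inner) switches the active graph to name,
    where name may be DEFAULT to return to the default graph.
    """
    kind = pattern[0]
    if kind == "graph":
        _, name, inner = pattern
        return evaluate(inner, dataset, name)
    if kind == "triples":
        if active is DEFAULT:
            triples = dataset["default"]
        else:
            triples = dataset["named"].get(active, set())
        return {(s, o) for s, p, o in triples if p == pattern[1]}
    raise ValueError(f"unknown pattern kind: {kind!r}")


dataset = {
    "default": {("ex:a", "ex:p", "ex:b")},
    "named": {"ex:g": {("ex:c", "ex:p", "ex:d")}},
}
# GRAPH <ex:g> { GRAPH DEFAULT { ?s ex:p ?o } }
nested = ("graph", "ex:g", ("graph", DEFAULT, ("triples", "ex:p")))
print(evaluate(nested, dataset))  # {('ex:a', 'ex:b')}
```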

dydra commented 5 years ago

No, but it might cause some surprising results if you start dealing with metadata or provenance data about graphs (where statements made about another endpoint's default graph now apply to all default graphs everywhere).

that expectation could be framed by the choice of iri form. as an alternative to a keyword such as "DEFAULT", one could use a urn or a default indirect graph identifier.

afs commented 5 years ago

Removing "SPARQL: " on transferred issue.

jaw111 commented 5 years ago

As well as supporting DEFAULT as an 'alias', it'd be great to see NAMED and ALL supported in the FROM clause as a way to address the union of all named graphs and the union of the default graph plus all named graphs. Currently those typically have a 'special' name (URI) that is implementation-specific.

Examples:

SELECT *
FROM ALL
WHERE { ?s ?p ?o }

SELECT *
FROM ALL NAMED # to avoid confusion with FROM NAMED <uri> syntax
WHERE { ?s ?p ?o }

Edit: opened #59 for this topic, as it is a separate (but related) issue.
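This sketch computes the graphs that the proposed FROM ALL and FROM ALL NAMED would produce. It models the default graph as a set of triples and the named graphs as a dict from IRI to set. RDF merge is approximated by set union, ignoring blank-node renaming. The function name and mode strings are hypothetical.

```python
def union_default_graph(default, named, mode):
    """Default graph for the proposed FROM ALL / FROM ALL NAMED (hypothetical syntax).

    mode "ALL": merge of the default graph and every named graph.
    mode "ALL NAMED": merge of the named graphs only.
    """
    merged = set()
    for triples in named.values():
        merged |= triples
    if mode == "ALL":
        merged |= default
    elif mode != "ALL NAMED":
        raise ValueError(f"unsupported mode: {mode!r}")
    return merged


default = {("ex:a", "ex:p", "ex:b")}
named = {"ex:g1": {("ex:c", "ex:p", "ex:d")}, "ex:g2": {("ex:e", "ex:p", "ex:f")}}
print(len(union_default_graph(default, named, "ALL")))        # 3
print(len(union_default_graph(default, named, "ALL NAMED")))  # 2
```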

jindrichmynarz commented 5 years ago

RDF4J has a special constant sesame:NIL that refers to the null named graph. It would be nice to have a more standard identifier.

cygri commented 5 years ago

A slightly cheeky option would be to use <about:default-graph> as the IRI of the default graph. Note that's not a prefixed name; it's an IRI using the about: scheme. The RFC defining the scheme states:

This document describes the "about" URI scheme, which is widely used by Web browsers and some other applications to designate access to their internal resources, such as settings, application information, hidden built-in functionality, and so on.

That seems close enough to cover the case of the default graph. Currently, the scheme is used in web browsers as the URL for special pages like about:blank and about:config. There is an IANA registry for the blank/config part, but browser vendors generally don't seem to bother with registration.

lisp commented 5 years ago

a concern about using a registered scheme is that there would be a temptation to use it as the name for a concrete graph.

cygri commented 5 years ago

@lisp I don't understand what point you are trying to make.

lisp commented 5 years ago

how would it work out if a quad import were to include the following,

<http://example.org/s>  <http://example.org/p> "o" <about:all-graphs> .

?

cygri commented 5 years ago

@lisp I proposed the IRI <about:default-graph> as a name for the default graph. I don't see how your question is related to that proposal.

kasei commented 5 years ago

@lisp I proposed the IRI <about:default-graph> as a name for the default graph. I don't see how your question is related to that proposal.

I think the concern here is what an endpoint should do if about:default-graph was found as a graph name in real-world data. Should it just be hidden by the endpoint's own use of that graph name as special? If an existing quad store had such a graph name, could a SPARQL Update processor do anything with the actual named graph, as opposed to the default graph it (also) referenced?

This might not be a problem for systems that have a single (named) graph that is identified internally as the default graph for SPARQL purposes, but SPARQL also supports systems where the default graph isn't just a normal graph internally. For example, the default graph can also act as a union of some or all named graphs. It is hard to see what the correct behavior would be for these systems if there's a collision between a graph name and a special name used to identify the default graph.

cygri commented 5 years ago

I think the concern here is what an endpoint should do if about:default-graph was found as a graph name in real-world data.

You mean loading an N-Quads file that contains triples in a graph named <about:default-graph>? That IRI names the default graph. So the triples should go into the default graph.
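The loader below is a simplified illustration of what that rule would mean in practice. It reads quads line by line, and its term regex covers only IRIs, blank nodes, and plain, tagged, or typed literals, so it is a sketch rather than a conformant N-Quads parser. A line labelled <about:default-graph> ends up in the same place as an unlabelled line.

```python
import re

ABOUT_DEFAULT_GRAPH = "<about:default-graph>"  # IRI proposed in this thread

# Simplified N-Quads term matcher: IRIs, blank nodes, and literals with an
# optional language tag or datatype. Not a conformant N-Quads parser.
TERM = re.compile(r'<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?')


def load_nquads(lines):
    """Load N-Quads lines into {"default": set, "named": dict}.

    A quad whose graph label is <about:default-graph> goes into the default
    graph, exactly like a line with no graph label.
    """
    dataset = {"default": set(), "named": {}}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms = TERM.findall(line)
        if len(terms) == 3:
            graph = None
        elif len(terms) == 4:
            graph = terms[3]
        else:
            raise ValueError(f"cannot parse line: {line}")
        triple = tuple(terms[:3])
        if graph is None or graph == ABOUT_DEFAULT_GRAPH:
            dataset["default"].add(triple)
        else:
            dataset["named"].setdefault(graph, set()).add(triple)
    return dataset
```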

If an existing quad store had such a graph name,

You mean someone used <about:default-graph> as a graph name in their SPARQL 1.1 graph store? Well, in that case, the vendor will receive a support request from a very confused customer who just upgraded their graph store software and now the data from one of their graphs is gone. The response from the vendor will be that the customer should have known better than to use that graph name. And also that they should have read the upgrade instructions where it was clearly mentioned that any existing graph named <about:default-graph> must be removed before upgrading.

SPARQL also supports systems where the default graph isn't just a normal graph internally. For example, the default graph can also act as a union of some or all named graphs.

Such systems should treat the named graph <about:default-graph> exactly like they currently treat the default graph.

lisp commented 5 years ago

Such systems should treat the named graph <about:default-graph> exactly like they currently treat the default graph.

the broader proposal is that there be some standard syntactical elements which designate all three of the distinguished cases. the example with a term analogous to the proposed "default" term demonstrates the problem(s) which would ensue from using an otherwise legitimate iri.

we currently do something analogous to the proposed iri. it is not a good idea, due to the situation described above. we do it now precisely because it requires no change to sparql syntax. given the latitude to consider alternatives, one which is less likely to confuse is much to be recommended.

cygri commented 5 years ago

@lisp You have an interesting way of expressing yourself. Would you please humour me and say that again in simple English?

jindrichmynarz commented 5 years ago

Since any IRI can be used to identify a named graph, SPARQL 1.2 would have to decide what to do when IRIs reserved for default graph or union graph are found in user data, such as when loading quads containing the reserved IRIs. The IRI reserved for default graph can be effectively ignored, but the union graph IRI (e.g., <about:all-graphs>) doesn't have a straightforward interpretation.

These decisions can be avoided if default graph and union graph are not identified via IRIs but via dedicated keywords, such as in SPARQL 1.1 Update. The cost of this approach is breaking changes to SPARQL syntax.

cygri commented 5 years ago

Thank you, @jindrichmynarz. Note that there was no suggestion to introduce IRIs for anything but the default graph. Pointing out problems with introducing an IRI for something else doesn't demonstrate problems with introducing an IRI for the default graph.

Allowing the DEFAULT keyword in more places is a partial solution. For example, it would allow “switching back” to the default graph deep in a nested query, which is currently impossible:

GRAPH ?g { ... GRAPH DEFAULT { ... } ... }

However, it doesn't solve other aspects. For example, take a parameterised query (see #57) where the target graph is supposed to be a parameter. This currently requires elaborate special casing in the query to support the default graph as target, and GRAPH DEFAULT doesn't help.
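The Python sketch below shows the special casing. It assumes the target graph is spliced into a query template as an IRI; the helper names are hypothetical. Without a name for the default graph, the template must branch. With one, a single template covers every graph. The unbranched default-graph form is also only equivalent at the top level of a query, not when it sits inside another GRAPH clause.

```python
def build_query(target_graph):
    """Build a query over a target graph given as a parameter (today's situation).

    target_graph is None for the unnamed default graph, which has no IRI and
    therefore needs its own template.
    """
    pattern = "?s ?p ?o"
    if target_graph is None:
        return f"SELECT * WHERE {{ {pattern} }}"
    return f"SELECT * WHERE {{ GRAPH <{target_graph}> {{ {pattern} }} }}"


def build_query_uniform(target_graph_iri):
    """The same query if the default graph had a standard IRI: one template for all."""
    return f"SELECT * WHERE {{ GRAPH <{target_graph_iri}> {{ ?s ?p ?o }} }}"


print(build_query(None))
print(build_query_uniform("about:default-graph"))
```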

afs commented 5 years ago

For me, the baseline choice is keywords; other proposals have to offer some advantage overall.

I prefer using keywords because of the issues around the use of URIs: not just in quads, but also because they aren't naming the same graph across datasets. Taking a special prefixed name is also a possibility (and it isn't defined by PREFIX; if it is, it isn't special), but that looks more like an unusual way to write a keyword.

I can see wanting to say "the default graph is <uri>" (the URI it actually is, not a placeholder), but that does not make all default graphs that <uri>.

Nearby: default-graph-uri in the protocol.

cygri commented 5 years ago

Using IRIs to refer to local resources is fine, and is done all the time—<file://...>, <http://localhost...>, <about:config> in web browsers.

The purpose of <about:default-graph> is to allow query writers to refer to the local default graph. Why is it a problem that it refers to default graphs with different contents in different datasets?

afs commented 5 years ago

It's not a problem in the sense of breakage, but rather of confusion when the URI is used in data, as mentioned up-thread.

I don't see a strong connection to <about:config>, because the URL bar has various capabilities. Nor do I think that, because some systems use poor URIs, we ought to as well; rather, I see that as a factor against when weighing the balance of options, because of "cool URIs".

A similar oddity is created with GRAPH ?g { } evaluating to the named graphs of the dataset.

To combine with templating, an indirection through a keyword DEFAULT could actually be helpful. The query text says DEFAULT and the execution setup says "DEFAULT is <uri>", and it also applies to the query outside GRAPH. But this isn't needed at all: the protocol does this with default-graph-uri, for example, as it is about the formation of the dataset.

Making GRAPH ?g focus on the default graph can be done when the default graph also has a regular URI name in the collection of all graphs available to be queried. default-graph-uri and FROM both take the URI of an actual graph.

This then fits with the UNION feature #59.

cygri commented 5 years ago

I do acknowledge the problem that <about:default-graph> would have to appear in the result of GRAPH ?g {}, and we probably wouldn't want that.

You mention a DEFAULT keyword. Where would that appear in the query? Do you mean as GRAPH DEFAULT {...} in a graph pattern? I said above that this wouldn't help with parameterised queries where the target graph is a parameter.

Specifying the dataset via FROM/FROM NAMED or their protocol counterparts is not really an option when working with a system that relies heavily on named graphs. In our product, when working with a named graph ?userGraph containing some user data, queries often do things like:

BIND (tq:graphWithImports(?userGraph) AS ?dataGraph)
BIND (tq:metadataGraph(?userGraph) AS ?metadataGraph)

which produces IRIs of virtual or system-managed graphs, and the query then casually jumps back and forth between those graphs using GRAPH ?dataGraph {} and GRAPH ?metadataGraph {}. If we used FROM/FROM NAMED or *-graph-uri, it would “wipe out” all these other graphs from the dataset. So we don't use these forms.

Maybe you are getting at something like this?

PREFIX my: <...>
DEFAULT GRAPH IS ALSO my:DefaultGraph
SELECT ... {
    ...
    GRAPH my:DefaultGraph { ... }
    ...
}

where DEFAULT GRAPH IS ALSO makes the existing default graph available as an additional named graph in the dataset, with an IRI chosen by the query author? That seems like it could be a solution, although it is rather byzantine.
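This sketch shows the dataset effect of the hypothetical DEFAULT GRAPH IS ALSO declaration, using a dict-based dataset model (an assumption, not any engine's API). The alias points at the same triple set as the default graph, so nothing is copied, and the other named graphs stay in the dataset. Rejecting an alias that already names a graph is one possible design choice; an engine could instead let the alias shadow that graph.

```python
def default_graph_is_also(dataset, alias_iri):
    """Model the hypothetical `DEFAULT GRAPH IS ALSO <alias>` declaration.

    Returns a new dataset whose named graphs additionally include alias_iri,
    bound to the very same triple set as the default graph (not a copy), so
    GRAPH <alias> { ... } sees exactly the default graph.
    """
    if alias_iri in dataset["named"]:
        raise ValueError(f"{alias_iri} already names a graph in this dataset")
    named = dict(dataset["named"])  # keep every existing named graph
    named[alias_iri] = dataset["default"]
    return {"default": dataset["default"], "named": named}


ds = {"default": {("ex:a", "ex:p", "ex:b")}, "named": {"ex:userGraph": set()}}
aliased = default_graph_is_also(ds, "my:DefaultGraph")
print(sorted(aliased["named"]))  # ['ex:userGraph', 'my:DefaultGraph']
```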

dbooth-boston commented 5 years ago

For me, the baseline choice is keywords; other proposals have to offer some advantage overall.

A clear advantage of a URI over a keyword is that a URI allows all graphs to be identified uniformly, using the same syntax, rather than having to special-case the default graph.

[<about:default-graph> is not] naming the same graph across datasets.

Yes, but that is also true of the DEFAULT keyword (or in quads, the lack of a graph URI): it is not naming the same graph across datasets. That is intentional, just as it is for certain URI schemes, such as the about: and file: schemes.

To quote from RFC 3986:

URIs have a global scope and are interpreted consistently regardless of context, though the result of that interpretation may be in relation to the end-user's context. For example, "http://localhost/" has the same interpretation for every user of that reference, even though the network interface corresponding to "localhost" may be different for each end-user: interpretation is independent of access. However, an action made on the basis of that reference will take place in relation to the end-user's context, which implies that an action intended to refer to a globally unique thing must use a URI that distinguishes that resource from all other things. URIs that identify in relation to the end-user's local context should only be used when the context itself is a defining aspect of the resource, such as when an on-line help manual refers to a file on the end-user's file system (e.g., "file:///etc/hosts").

Identifying the default graph "in relation to the end-user's local context" is exactly the desired behavior in this case, and that is what a URI like <about:default-graph> offers.

In summary, I do not see any semantic benefit in using a keyword instead of a URI, but I do see a downside, because of the special casing that it requires.

On the other hand, maybe "DEFAULT" would be convenient as syntactic sugar for the URI, just as Turtle allows "a" as syntactic sugar for rdf:type.
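Below is a naive sketch of DEFAULT as syntactic sugar. It assumes the keyword is expanded to the <about:default-graph> IRI before parsing, much as a Turtle parser reads `a` as rdf:type. The tokenizer is deliberately crude. It skips IRIs, double-quoted strings, and variables. It does not handle comments, single-quoted strings, prefixed names such as my:DEFAULT, or a bare `<` comparison operator. It would also rewrite SPARQL Update forms like CLEAR DEFAULT, where an IRI needs the GRAPH keyword.

```python
import re

DEFAULT_GRAPH_IRI = "<about:default-graph>"  # IRI proposed earlier in this thread

# Crude scanner: IRIs, double-quoted strings and variables are matched (and kept)
# first, so a DEFAULT inside them is not rewritten; the bare keyword is matched
# case-insensitively, as SPARQL keywords are.
TOKEN = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"|[?$]\w+|\bDEFAULT\b', re.IGNORECASE)


def expand_default(query):
    """Rewrite the DEFAULT keyword into the default-graph IRI."""
    def replace(match):
        token = match.group(0)
        return DEFAULT_GRAPH_IRI if token.upper() == "DEFAULT" else token
    return TOKEN.sub(replace, query)


print(expand_default("SELECT * { GRAPH ?g { GRAPH DEFAULT { ?s ?p ?o } } }"))
```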

JervenBolleman commented 5 years ago

We can also think of a standard, location-derived IRI for these graph names.

Assuming a public endpoint, e.g. https://sparql.rhea-db.org/sparql, we could then have https://sparql.rhea-db.org/sparql/?graph=union as the UNION graph and https://sparql.rhea-db.org/sparql/?graph=default for the default graph.

The idea here is that, within the IRI space of a SPARQL protocol endpoint, it is very unlikely that these IRIs will already have been minted and be in use.
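This small sketch derives those IRIs from the endpoint URL with Python's urllib.parse. The `graph` query parameter and the trailing slash follow the example above; they are not part of the SPARQL Protocol.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def special_graph_iri(endpoint, which):
    """Derive a special graph IRI from a SPARQL endpoint URL (endpoint + "/?graph=...")."""
    if which not in ("default", "union"):
        raise ValueError(f"unknown special graph: {which!r}")
    scheme, netloc, path, _query, _fragment = urlsplit(endpoint)
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((scheme, netloc, path, urlencode({"graph": which}), ""))


print(special_graph_iri("https://sparql.rhea-db.org/sparql", "default"))
# https://sparql.rhea-db.org/sparql/?graph=default
```

With this scheme, a service reachable at two endpoint URLs gets two different IRIs for the same default graph.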

namedgraph commented 5 years ago

Why not use ?default from GSP Indirect Graph Identification?

kasei commented 5 years ago

Why not use ?default from GSP Indirect Graph Identification?

I think using an IRI with ?default in the query string might cause problems for deployments where there are multiple endpoints for a single service. At that point, you'd have multiple IRIs all being used to identify the single default graph, and the underlying SPARQL engine might not even be aware of which (if any) of those IRIs represented valid endpoints.

afs commented 5 years ago

A keyword makes more sense for the GRAPH DEFAULT use case, used inside a query to switch back to the default graph. Using a specific URI, of whatever form, creates a need to explain its local meaning and why it is not in the list of named graphs.

For the protocol parameterization use case in #57, a URI is more convenient.

In summary - both. DEFAULT as surface syntax, and a URI for parameterization from outside the query syntax.

abrokenjester commented 4 years ago

RDF4J has a special constant sesame:NIL that refers to the null named graph. It would be nice to have a more standard identifier.

I'm a bit late to the party, thanks for mentioning this though (~we've actually renamed it to rdf4j:nil though both work~ update is on the to do list - see https://github.com/eclipse/rdf4j/issues/2401). I should also point out that in addition RDF4J also accepts the DEFAULT keyword. I'm kinda with @afs here that there's room for both a keyword and an IRI.

As an aside: we named our IRI constant sesame:nil rather than sesame:default because we wanted to be very explicit about the fact that it references the unnamed ("nil") graph, that is, all statements for which the backing database has no named graph information available. The term "default graph" is a more flexible concept that, depending on database implementation defaults, can contain only those 'ungraphed' statements, or can contain some union of everything available in the store (including statements from all named graphs). (Edit: I should be clearer here that what is implementation-dependent is how the default/implicit dataset is defined: its default graph can be configured in multiple ways.)

That being said I'm not against using the term 'default' for what we're trying to do here. I just wanted to call out that we'll need to be clear that it means default graph in a non-ambiguous way.
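The distinction drawn in this comment can be made concrete with a hedged Python sketch of the two configurations. `NIL`, `default_graph`, and the mode names are illustrative and do not correspond to RDF4J's API.

```python
NIL = None  # context of statements that carry no named-graph information


def default_graph(statements, mode):
    """Default graph of a store's implicit dataset under two possible configurations.

    statements: set of (s, p, o, context) where context is NIL or a graph IRI.
    mode "nil": only the 'ungraphed' statements (the unnamed graph).
    mode "union": every statement in the store, named graphs included.
    """
    if mode == "nil":
        return {(s, p, o) for s, p, o, c in statements if c is NIL}
    if mode == "union":
        return {(s, p, o) for s, p, o, _c in statements}
    raise ValueError(f"unknown mode: {mode!r}")


statements = {("ex:a", "ex:p", "ex:b", NIL), ("ex:c", "ex:p", "ex:d", "ex:g1")}
print(len(default_graph(statements, "nil")), len(default_graph(statements, "union")))  # 1 2
```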

tiffoknee commented 4 years ago

@jeenbroekstra - what's the prefix for rdf4j:nil? I used sesame but it points to a dead site that sells coupons, and that seems wrong..

abrokenjester commented 4 years ago

@tiffoknee ah, apologies, I was mistaken when I said we'd renamed it. It's still sesame:nil (full URI is http://www.openrdf.org/schema/sesame#nil). There's an open ticket for us to change it to a more up-to-date name and URI (see https://github.com/eclipse/rdf4j/issues/2401).

tiffoknee commented 4 years ago

Ah ok - thanks. I found that (as a noob I expect I do inexplicable things) if I do "select * from default" it flags up a syntax error but does actually work.. Probably this is bad? I don't know.

abrokenjester commented 4 years ago

Ah ok - thanks. I found that (as a noob I expect I do inexplicable things) if I do "select * from default" it flags up a syntax error but does actually work.. Probably this is bad? I don't know.

That sounds like a minor bug in the SPARQL editor in the RDF4J workbench. Thanks for pointing this out, issue logged as https://github.com/eclipse/rdf4j/issues/2421. I suggest we take further discussion of this and other RDF4J-specific problems to the RDF4J mailing list and/or issue tracker.

afs commented 4 years ago

Apache Jena URIs:

urn:x-arq:DefaultGraph
urn:x-arq:UnionGraph

These are accessed through functions in the SPARQL engine, so changing or adding them should not be too disruptive.

A mild advantage of "urn:" is that it is not HTTP-dereferenceable.

lisp commented 4 years ago

dydra recognizes the following uris :

urn:dydra:default
urn:dydra:named
urn:dydra:all
urn:dydra:none

with the intended behaviour :

although the last two cases are not obvious, on one hand, i do not recall any occasion to explain the distinction, but, on the other, i have not checked if the situation arises in any actual repository and/or inline in any query.

abrokenjester commented 4 years ago

A mild advantage of "urn:" is that it is not HTTP-dereferenceable.

I had a look at that recently when considering an updated IRI for RDF4J for this purpose, but I believe the idea of using urn:x-something for experimental / unregistered namespaces has been deprecated now, so I am not sure we are in a position to sanction an "official" URN for this purpose, unless we can use a registered IANA namespace for it. Does W3C perhaps have one that we can use?

kasei commented 4 years ago

I believe the idea of using urn:x-something for experimental / unregistered namespaces has been deprecated now, so I am not sure we are in a position to sanction an "official" URN for this purpose

A tag URI would serve the same purpose and could be placed under a w3 authority.
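This sketch shows what minting such a name might look like, using the tag URI shape from RFC 4151 (tag:authority,date:specific). The URI below, with w3.org as authority, is hypothetical; no such name has been agreed or registered. The check is simplified and does not accept e-mail-address authorities.

```python
import re

# Simplified tag URI check (RFC 4151): DNS-name authority, then a date of the
# form YYYY, YYYY-MM or YYYY-MM-DD, then a non-empty specific part.
TAG_URI = re.compile(r"^tag:[A-Za-z0-9.-]+,\d{4}(?:-\d{2}(?:-\d{2})?)?:\S+$")


def make_tag_uri(authority, date, specific):
    """Build a tag URI and reject anything outside the simplified shape."""
    uri = f"tag:{authority},{date}:{specific}"
    if not TAG_URI.match(uri):
        raise ValueError(f"not a well-formed tag URI: {uri}")
    return uri


# Hypothetical name for the default graph under a W3C-controlled authority.
print(make_tag_uri("w3.org", "2020", "sparql/default-graph"))
```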

afs commented 3 years ago

My understanding of why x-*, for URNs and HTTP headers, is deprecated is that the transition from "experimental" to "agreed" is painful. Instead, the style is "just do it" and register when agreed.