w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

IMPORT NAMESPACES #187

Open kurtcagle opened 1 year ago

kurtcagle commented 1 year ago

Why?

I may have missed this in the SPARQL 1.2 suggestions, but it has been on my wish list for a while.

Namespace management in SPARQL is a pain. Different systems handle namespaces in different ways: some supply user or system namespaces when a query declares none, while others require every namespace to be declared. When you are potentially dealing with dozens of namespaces, namespace declarations become a frequent source of problems, especially when different people have to enter them by hand.

Previous work

Jena currently requires namespace declarations regardless; AllegroGraph defines user and system namespace declarations; others tend to be all over the place.

Proposed solution

Create a new directive in SPARQL called IMPORT NAMESPACES that would either link to an RDF file containing prefix declarations or, if one of the default keywords (SYSTEM or USER) was used instead, load the namespaces from a system-defined namespaces file.

It would have the following form:

IMPORT NAMESPACES ( <URL> | SYSTEM | USER ) NOCACHE?
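To make the form above concrete, here is a minimal Python sketch of how an engine might recognize the directive. It assumes exactly the shape shown (one IRI or one of the SYSTEM/USER keywords, then an optional NOCACHE) and case-insensitive keywords, as elsewhere in SPARQL. The names `parse_import` and `_IMPORT_RE` are hypothetical and not part of any existing SPARQL implementation; Python is used only to illustrate engine-side behaviour.

```python
import re
from typing import Tuple

# Hypothetical grammar for the proposed directive:
#   IMPORT NAMESPACES ( <IRI> | SYSTEM | USER ) NOCACHE?
# Matched case-insensitively, as SPARQL keywords are.
_IMPORT_RE = re.compile(
    r"^\s*IMPORT\s+NAMESPACES\s+(?:<([^<>\s]*)>|(SYSTEM|USER))(\s+NOCACHE)?\s*$",
    re.IGNORECASE,
)


def parse_import(line: str) -> Tuple[str, str, bool]:
    """Split one IMPORT NAMESPACES line into (kind, value, nocache).

    kind is "iri" for an explicit <IRI>; otherwise it is "keyword" and the
    value is normalised to "SYSTEM" or "USER".
    """
    m = _IMPORT_RE.match(line)
    if m is None:
        raise ValueError(f"not a valid IMPORT NAMESPACES directive: {line!r}")
    iri, keyword, nocache = m.groups()
    if iri is not None:
        return ("iri", iri, nocache is not None)
    return ("keyword", keyword.upper(), nocache is not None)


print(parse_import("IMPORT NAMESPACES <file:///c:/path/to/namespaces.ttl>"))
# ('iri', 'file:///c:/path/to/namespaces.ttl', False)
print(parse_import("IMPORT NAMESPACES system NOCACHE"))
# ('keyword', 'SYSTEM', True)
```

A bare `IMPORT NAMESPACES` with neither an IRI nor a keyword is rejected with `ValueError`, matching the rule that one of the three sources must be given.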

This would import the namespace and prefix declarations, then keep a cached (and optimized) version of this CONTEXT object in memory. If the NOCACHE directive is used, the namespaces would be reloaded before the query runs (which might be useful if the namespaces themselves are under development).
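One way to picture this caching behaviour is a per-source cache that NOCACHE bypasses. The Python below is a sketch under assumptions: `NamespaceCache` and its `loader` callable are hypothetical, `fake_loader` stands in for actually fetching and parsing a Turtle file, and the "optimized CONTEXT object" is reduced to a plain dict.

```python
from typing import Callable, Dict

PrefixMap = Dict[str, str]


class NamespaceCache:
    """Keep one parsed prefix map per import source (IRI, SYSTEM or USER).

    `loader` is a hypothetical callable that fetches and parses a source into
    a {prefix: namespace} dict; how it does so is left open here.
    """

    def __init__(self, loader: Callable[[str], PrefixMap]) -> None:
        self._loader = loader
        self._cache: Dict[str, PrefixMap] = {}

    def get(self, source: str, nocache: bool = False) -> PrefixMap:
        # NOCACHE forces a fresh load (e.g. while the namespace file is being
        # edited); otherwise the first load of a source is reused.
        if nocache or source not in self._cache:
            self._cache[source] = dict(self._loader(source))
        return dict(self._cache[source])  # copy, so callers cannot alter the cache


calls = []


def fake_loader(source: str) -> PrefixMap:
    calls.append(source)
    return {"schema": "https://schema.org/"}


cache = NamespaceCache(fake_loader)
cache.get("https://example.org/ns.ttl")
cache.get("https://example.org/ns.ttl")                # served from the cache
cache.get("https://example.org/ns.ttl", nocache=True)  # reloaded
print(len(calls))  # 2
```

Keying the cache by source means a query mixing a file import and `SYSTEM` would keep the two sets separate until they are merged for that query.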

The query could contain more than one such import statement. If the same prefix and namespace appear in multiple declarations, the duplicates would be ignored. If a prefix is the same but its namespaces differ, the engine should throw an exception. A namespace can have more than one distinct prefix.
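These merge rules can be sketched in a few lines of Python. The sketch assumes each import has already been reduced to a `{prefix: namespace}` dict; `merge_imports` and `PrefixConflictError` are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable

PrefixMap = Dict[str, str]


class PrefixConflictError(Exception):
    """Raised when one prefix is bound to two different namespaces."""


def merge_imports(imports: Iterable[PrefixMap]) -> PrefixMap:
    """Merge several imported prefix maps using the duplicate/conflict rules."""
    merged: PrefixMap = {}
    for prefix_map in imports:
        for prefix, namespace in prefix_map.items():
            existing = merged.get(prefix)
            if existing is None:
                merged[prefix] = namespace  # first time this prefix is seen
            elif existing != namespace:
                raise PrefixConflictError(
                    f"prefix {prefix!r} is bound to both <{existing}> and <{namespace}>"
                )
            # else: same prefix, same namespace -> duplicate, ignored
    # Keys are prefixes, so one namespace may still appear under several prefixes.
    return merged


a = {"schema": "https://schema.org/", "sdo": "https://schema.org/"}
b = {"schema": "https://schema.org/", "dc": "http://purl.org/dc/terms/"}
print(merge_imports([a, b]))
# {'schema': 'https://schema.org/', 'sdo': 'https://schema.org/', 'dc': 'http://purl.org/dc/terms/'}
```

Because the result is keyed by prefix, `schema` and `sdo` can both point at `https://schema.org/`, while a second, different binding for `schema` raises.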

IMPORT NAMESPACES must precede PREFIX statements.

IMPORT NAMESPACES <file:///c:/path/to/namespaces.ttl>
IMPORT NAMESPACES <https://myCommonNamespaces.ttl>
PREFIX someNS: <http://somenamespace.com/not/in/others#>

SELECT ?car ?vin where {
     ?car schema:vin ?vin.
} order by ?vin 

or

IMPORT NAMESPACES system

SELECT ?car ?vin where {
     ?car schema:vin ?vin.
} order by ?vin 

or

IMPORT NAMESPACES user

SELECT ?car ?vin where {
     ?car schema:vin ?vin.
} order by ?vin 

The assumption on behavior is that collisions would be resolved as if the imported namespaces had been included inline.

If the IMPORT NAMESPACES path is omitted, then either SYSTEM or USER must be given. This indicates that whatever is currently defined as the system or user namespace set should be loaded. If neither of these is defined, this would throw an error.
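A minimal Python sketch of this keyword resolution follows. It assumes the store exposes its current system and user sets as dicts (with `None` meaning "not defined"); that interface, and the name `resolve_keyword`, are hypothetical.

```python
from typing import Dict, Optional

PrefixMap = Dict[str, str]


def resolve_keyword(keyword: str,
                    system_ns: Optional[PrefixMap],
                    user_ns: Optional[PrefixMap]) -> PrefixMap:
    """Return the namespace set named by SYSTEM or USER.

    system_ns / user_ns stand in for whatever sets the store currently
    defines; None means "not defined".
    """
    sets = {"SYSTEM": system_ns, "USER": user_ns}
    key = keyword.upper()
    if key not in sets:
        raise ValueError("without a path, IMPORT NAMESPACES needs SYSTEM or USER")
    selected = sets[key]
    if selected is None:
        raise LookupError(f"no {key} namespace set is currently defined")
    return dict(selected)


print(resolve_keyword("system", {"schema": "https://schema.org/"}, None))
# {'schema': 'https://schema.org/'}
```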

If no IMPORT NAMESPACES expression is included, then only namespaces declared with PREFIX are added. PREFIX-declared namespaces would override any imported namespaces, or throw an exception if that is the default behavior of the database.
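The interplay between imported and inline declarations can be sketched as follows. This is an illustration under assumptions: `effective_prefixes` is a hypothetical name, the imports are assumed to be already merged into one dict, and the `strict` flag stands in for "a database whose default behavior is to throw".

```python
from typing import Dict, Optional

PrefixMap = Dict[str, str]


def effective_prefixes(imported: Optional[PrefixMap],
                       inline: PrefixMap,
                       strict: bool = False) -> PrefixMap:
    """Combine imported namespaces with the query's own PREFIX declarations.

    imported is None when the query has no IMPORT NAMESPACES, so only the
    PREFIX declarations apply, as in SPARQL today. With strict=False a PREFIX
    overrides an imported binding; strict=True models a database whose
    default is to reject such a rebinding instead.
    """
    result: PrefixMap = dict(imported or {})
    for prefix, namespace in inline.items():
        old = result.get(prefix)
        if strict and old is not None and old != namespace:
            raise ValueError(f"PREFIX {prefix}: conflicts with imported <{old}>")
        result[prefix] = namespace
    return result


imported = {"schema": "https://schema.org/", "ex": "http://example.org/"}
inline = {"ex": "http://example.com/other#"}
print(effective_prefixes(imported, inline)["ex"])  # http://example.com/other#
print(effective_prefixes(None, inline))            # {'ex': 'http://example.com/other#'}
```

Passing `None` for `imported` reproduces the backward-compatible case: the result is exactly the query's own PREFIX declarations.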

The system would load the namespaces from the Turtle file and would otherwise ignore the contents of the file. Once loaded, the
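Reading only the prefix declarations from a Turtle file while ignoring its triples might look like the Python sketch below. It is a deliberately simplified, regex-based stand-in (the name `extract_prefixes` is hypothetical); a production loader would more likely run a real Turtle parser and keep only the prefixes it collects.

```python
import re
from typing import Dict

# Matches Turtle "@prefix p: <iri> ." and SPARQL-style "PREFIX p: <iri>".
# A simplification: only declarations at the start of a line are found, and
# declarations inside multi-line string literals are not excluded.
_DECL_RE = re.compile(
    r"^[ \t]*(?:@prefix|(?i:PREFIX))[ \t]+([A-Za-z][\w.-]*)?:[ \t]*<([^<>\s]*)>",
    re.MULTILINE,
)


def extract_prefixes(turtle_text: str) -> Dict[str, str]:
    """Collect prefix declarations; triples and everything else are ignored."""
    prefixes: Dict[str, str] = {}
    for m in _DECL_RE.finditer(turtle_text):
        prefixes[m.group(1) or ""] = m.group(2)  # "" is the default prefix ":"
    return prefixes


SAMPLE = """\
@prefix schema: <https://schema.org/> .
PREFIX dc: <http://purl.org/dc/terms/>
@prefix : <http://example.org/> .
# @prefix commented: <http://example.org/ignored#> .
:car1 schema:vin "1HGCM82633A004352" .
"""

print(extract_prefixes(SAMPLE))
# {'schema': 'https://schema.org/', 'dc': 'http://purl.org/dc/terms/', '': 'http://example.org/'}
```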

Considerations for backward compatibility

The default behavior when IMPORT NAMESPACES is not included is to include only PREFIX-declared namespaces, which should normally be the case already.

namedgraph commented 1 year ago

I don't think the query string should depend on external data.

kurtcagle commented 1 year ago

A variant to this was brought up previously: https://github.com/w3c/sparql-12/issues/70. I'll update this if I find any other mentions.

MichaelSullivanArchitect commented 1 year ago

[like] Michael J. Sullivan reacted to your message:


From: Kurt Cagle
Sent: Tuesday, May 23, 2023 10:30:22 PM
To: w3c/sparql-12
Subject: [w3c/sparql-12] IMPORT NAMESPACES (Issue #187)


VladimirAlexiev commented 2 months ago

@kurtcagle many databases (including GraphDB) automatically add newly encountered prefixes to their namespaces, so there's no need for a new command such as

IMPORT NAMESPACES <file:///c:/path/to/namespaces.ttl>

If that file has only prefix declarations, LOAD will do just the same.

ericprud commented 2 months ago

If the query uses prefixes that are declared differently in LOADed documents, query behavior will depend on the order in which they were loaded (which could be a bit painful to debug). That reassignment scenario may seem unlikely, but I've seen a fair amount of confusion regarding the wd* prefixes in Wikidata.
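The order dependence described here is easy to reproduce. In the hypothetical Python sketch below, `load_in_order` is not any store's API and `doc_b` is an invented, mis-declared file. It captures prefixes from documents in LOAD order under either a "never override" or a "last one wins" policy; under both policies, swapping the LOAD order changes what `wd:` expands to.

```python
from typing import Dict, List

PrefixMap = Dict[str, str]


def load_in_order(docs: List[PrefixMap], override: bool) -> PrefixMap:
    """Capture prefix declarations from documents in LOAD order.

    override=False: the first binding of a prefix is kept (never override).
    override=True:  each new binding replaces the previous one (last wins).
    """
    store: PrefixMap = {}
    for doc in docs:
        for prefix, namespace in doc.items():
            if override or prefix not in store:
                store[prefix] = namespace
    return store


doc_a = {"wd": "http://www.wikidata.org/entity/"}
doc_b = {"wd": "http://www.wikidata.org/prop/direct/"}  # hypothetical mis-declaration

# Same two files, opposite LOAD order, different meaning for wd:
print(load_in_order([doc_a, doc_b], override=False)["wd"])  # http://www.wikidata.org/entity/
print(load_in_order([doc_b, doc_a], override=False)["wd"])  # http://www.wikidata.org/prop/direct/
```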

VladimirAlexiev commented 2 months ago

@namedgraph

I don't think the query string should depend on external data.

Agreed! Query editors can automatically add prefixes, but query runners should not.
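The editor/runner split can be sketched as an editor-side helper. Before the query is sent, the helper prepends PREFIX lines for prefixed names that the query uses but does not declare, taking the namespaces from the repository's known prefixes, so the runner still receives a self-contained query. `add_missing_prefixes` and the `registry` dict are hypothetical. The regular expressions are rough approximations of SPARQL's prefixed-name grammar and do not skip string literals or comments.

```python
import re
from typing import Dict

# Rough pattern for a prefixed name such as schema:vin. The look-around
# assertions avoid matching scheme names and path segments inside IRIs such
# as <http://...>; string literals and comments are not handled.
_PNAME_RE = re.compile(r"(?<![\w<:/#])([A-Za-z][\w-]*):(?=[A-Za-z_])")
# Prefixes the query already declares with PREFIX.
_DECLARED_RE = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w.-]*)?:", re.IGNORECASE | re.MULTILINE)


def add_missing_prefixes(query: str, registry: Dict[str, str]) -> str:
    """Prepend PREFIX lines for used-but-undeclared prefixes found in registry."""
    declared = {m.group(1) or "" for m in _DECLARED_RE.finditer(query)}
    used = set(_PNAME_RE.findall(query))
    missing = sorted(p for p in used - declared if p in registry)
    header = "".join(f"PREFIX {p}: <{registry[p]}>\n" for p in missing)
    return header + query


registry = {"schema": "https://schema.org/", "dc": "http://purl.org/dc/terms/"}
query = "SELECT ?car ?vin WHERE { ?car schema:vin ?vin . } ORDER BY ?vin"
print(add_missing_prefixes(query, registry))
```

Prefixes that are unknown to the registry are left alone, so the query runner still reports them as undeclared instead of silently guessing a namespace.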

@ericprud

That reassignment scenario may seem unlikely ... query behavior will depend on their order

The repos I know don't override prefixes when they see a new definition.

Query editor behavior should depend on the defined namespaces / captured prefixes, so the user will immediately see which namespaces his/her repository uses. If a user wants to redefine a namespace in the repository, they have to use a custom repository function (namespace management), unless we define a new command DELETE PREFIX...