w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

Use labels instead of URIs #159

Open mielvds opened 2 years ago

mielvds commented 2 years ago

Disclaimer: This is probably a sketchy idea and it only serves the purpose of a better UX (but I think SPARQL could use some of that). I don't really have a good solution in mind and I also admit that there might not be anything we can properly do. But if that gets a syntax discussion going, I'm happy.

Why?

Because writing queries using opaque URIs is hard, but more importantly: not popular. Wikidata queries are the perfect example:

    SELECT ?country (MAX(?population) AS ?maxPopulation) WHERE {
      ?city wdt:P31/wdt:P279* wd:Q515 .
      ?city wdt:P1082 ?population .
      ?city wdt:P17 ?country .
    }
    GROUP BY ?country

Previous work

Previous work would include basically any other query language: SQL, MongoQL or even Cypher where you can just use labels (yes yes, they don't have globally unique identifiers and all that).

Proposed solution

LABEL rdfs:label@en
SELECT * {
?city [country] ?country .
}

would translate to

SELECT * {
?city ?p ?country .
?p rdfs:label "country"@en .
}
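
As a rough illustration of that translation, here is a minimal Python sketch of a preprocessor that rewrites the sugar into plain SPARQL. The function name `desugar_labels` and the generated variable names (`?_label0`, ...) are made up for this example, and it assumes the query contains no blank-node property lists, since it treats every `[...]` as a label. It carries the language tag from the LABEL line onto the generated literal.

```python
import re

# The LABEL header and the [label] notation are the proposal's syntax, not standard SPARQL.
LABEL_HEADER = re.compile(r"^\s*LABEL\s+(\S+?)(?:@([A-Za-z-]+))?\s*$", re.MULTILINE)
BRACKET_LABEL = re.compile(r"\[([^\[\]]+)\]")


def desugar_labels(query: str) -> str:
    """Rewrite `[label]` tokens into variables constrained by the LABEL predicate."""
    header = LABEL_HEADER.search(query)
    if header is None:
        return query  # nothing to desugar
    predicate, lang = header.group(1), header.group(2)
    # Drop the LABEL line; the rest is ordinary SPARQL plus bracketed labels.
    body = query[:header.start()] + query[header.end():]
    extra = []

    def replace(match):
        # One fresh variable per bracketed label (hypothetical naming scheme).
        var = f"?_label{len(extra)}"
        literal = '"' + match.group(1) + '"' + (f"@{lang}" if lang else "")
        extra.append(f"  {var} {predicate} {literal} .")
        return var

    body = BRACKET_LABEL.sub(replace, body)
    # Append the generated label patterns just before the final closing brace.
    end = body.rindex("}")
    return (body[:end] + "\n".join(extra) + "\n" + body[end:]).strip()
```

Note that this does nothing about ambiguity: if several properties carry the same label, the rewritten query simply matches all of them.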

Considerations for backward compatibility

I would stick to syntactic sugar for 1.1.

VladimirAlexiev commented 2 years ago

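@mielvds I see your point but the proposal is half cooked:

* there may be many things labeled "country"
* brackets are used in CURIEs for a similar purpose, but brackets in SPARQL are blank nodes, so you can't use them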

I'd rather use locally defined names, something like

ALIAS country wdt:P17
ALIAS population wdt:P1082
SELECT * {
  ?city [country] ?country; 
     [population] ?population
}

But in WD, each Pnnnn is represented with a coordinated bunch of props in 6 namespaces. Eg to get population at point in time:

SELECT * {
  ?city p:P1082 [ps:P1082 ?population; pq:P585 ?time]
}

So with the "alias" approach I'd have to go like this:

ALIAS population_direct wdt:P1082
ALIAS population_stmt p:P1082
ALIAS population_main ps:P1082
ALIAS pointInTime_qualifier pq:P585
SELECT * {
  ?city [population_stmt] [
    [population_main] ?population;
    [pointInTime_qualifier] ?time
  ]
}

It doesn't seem better to me.
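
For what it's worth, the ALIAS idea needs nothing more than textual substitution. A minimal Python sketch, assuming the ALIAS lines sit at the top of the query and that an alias is a single identifier in brackets (which keeps it apart from blank-node brackets, since those contain whitespace). The function name `expand_aliases` is hypothetical.

```python
import re

# ALIAS is the syntax floated in this thread, not part of SPARQL.
ALIAS_LINE = re.compile(r"^\s*ALIAS\s+(\w+)\s+(\S+)\s*$", re.MULTILINE)
ALIAS_USE = re.compile(r"\[(\w+)\]")  # a lone identifier in brackets


def expand_aliases(query: str) -> str:
    """Replace each `[alias]` with the prefixed name declared by its ALIAS line."""
    aliases = dict(ALIAS_LINE.findall(query))
    body = ALIAS_LINE.sub("", query)  # strip the declarations

    def replace(match):
        name = match.group(1)
        if name not in aliases:
            raise KeyError(f"undefined alias: {name}")
        return aliases[name]

    return ALIAS_USE.sub(replace, body).strip()
```

The output is ordinary SPARQL, so an endpoint never sees the aliases.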


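BTW writing WD queries is surprisingly non-painful because

* there's autocompletion "wdt:country" -> wdt:P17 and "wd:Bulgaria" -> "wd:Q219". Since the ranking is very good, it works very well
* there's readout on hover of both Pnnn and Qnnn
* if you're writing shapes, there's a SHEX editor that displays dynamic comments as a readout, eg
  wdt:P17 wd:Q219 # country: Bulgaria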

mielvds commented 2 years ago

@VladimirAlexiev thanks for your thoughts!

@mielvds I see your point but the proposal is half cooked:

like I said: sketchy ;)

* there may be many things labeled "country"

absolutely

* brackets are used in CURIEs for a similar purpose, but brackets in SPARQL are blank nodes, so you can't use them

It was just for illustration, but correct, it would have to be something else

I'd rather use locally defined names, something like

ALIAS country wdt:P17
ALIAS population wdt:P1082
SELECT * {
  ?city [country] ?country; 
     [population] ?population
}

But in WD, each Pnnnn is represented with a coordinated bunch of props in 6 namespaces. Eg to get population at point in time:

SELECT * {
  ?city p:P1082 [ps:P1082 ?population; pq:P585 ?time]
}

So with the "alias" approach I'd have to go like this:

ALIAS population_direct wdt:P1082
ALIAS population_stmt p:P1082
ALIAS population_main ps:P1082
ALIAS pointInTime_qualifier pq:P585
SELECT * {
  ?city [population_stmt] [
    [population_main] ?population;
    [pointInTime_qualifier] ?time
  ]
}

It doesn't seem better to me.

Yeah that wouldn't help much. But this does trigger the idea of being able to publish a public alias config, very much like the JSON-LD config.

http://example.org/aliases

{
  "population_direct": "wdt:P1082",
  "population_stmt": "p:P1082",
  "population_main": "ps:P1082",
  "pointInTime_qualifier": "pq:P585"
}

CONTEXT <http://example.org/aliases>
SELECT * {
  ?city [population_stmt] [
    [population_main] ?population;
    [pointInTime_qualifier] ?time
  ]
}

But I'm probably taking it too far :)
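
To make the idea a bit more concrete, here is a small Python sketch of how a client could apply such a published alias file before sending the query. Everything in it is an assumption for illustration: the CONTEXT keyword is only the syntax floated here, `apply_context` is a made-up name, and `fetch` stands in for whatever retrieves the JSON text behind the URL (an HTTP client, a cache, a local file).

```python
import json
import re

CONTEXT_LINE = re.compile(r"^\s*CONTEXT\s+<([^>]+)>\s*$", re.MULTILINE)
ALIAS_USE = re.compile(r"\[(\w+)\]")  # a lone identifier in brackets


def apply_context(query: str, fetch) -> str:
    """Expand `[alias]` tokens using the JSON alias file named by the CONTEXT line.

    `fetch` is any callable that maps a URL to the JSON text of the alias file.
    """
    header = CONTEXT_LINE.search(query)
    if header is None:
        return query  # no CONTEXT line, leave the query untouched
    aliases = json.loads(fetch(header.group(1)))
    body = query[:header.start()] + query[header.end():]
    # Raises KeyError if the query uses an alias the file does not define.
    return ALIAS_USE.sub(lambda m: aliases[m.group(1)], body).strip()
```

Unlike a JSON-LD context this only maps names to prefixed names, so the query still needs its own PREFIX declarations.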

BTW writing WD queries is surprisingly non-painful because

* there's autocompletion "wdt:country" -> wdt:P17 and "wd:Bulgaria" -> "wd:Q219". Since the ranking is very good, it works very well

* there's readout on hover of both Pnnn and Qnnn

Sure that helps, but it's only available in the WD editor.

* if you're writing shapes, there's a SHEX editor that displays dynamic comments as a readout, eg
  wdt:P17 wd:Q219 # country: Bulgaria

Now that principle could perhaps be more of a first-class citizen in SPARQL

jmkeil commented 2 years ago

Instead of extending the syntax for aliases, one can just use prefixes for that.

PREFIX name: <http://www.w3.org/2000/01/rdf-schema#label>

SELECT ?resource
WHERE { ?resource name: "the label". }

Not saying that this should be considered good practice. But there are cases where this can ease reading and writing:

PREFIX key: <http://example.org/parameterName>
PREFIX value: <http://example.org/parameterValue>

SELECT ?resource
WHERE { ?resource <http://example.org/parameters> [key: "a"; value: 1], [key: "b"; value: 2] . }

afs commented 2 years ago

My understanding is that Wikidata purposely decouples the appearance of the URI from its natural language form because that form depends on the user. (Is this style found elsewhere?)

It is at the point of writing the query (UI etc.) that the mapping to abstract URIs happens, because it is context/user sensitive.

It takes a number of components, not just SPARQL, to have an end-user application.

jmkeil commented 2 years ago

Is this style found elsewhere?

Yes. The OBO Foundry uses this style, too. Apart from being harder to read and write without UI support, it has several advantages for the re-use and maintenance of large multi-lingual datasets (see 1c in [1]).

VladimirAlexiev commented 2 years ago

To elaborate on @jmkeil :

ericprud commented 2 years ago

ShEx.js has a backtick extension which uses a LABEL directive to specify how to resolve backticked labels, e.g.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex1: <http://ex1.example/>
PREFIX ex2: <http://ex2.example/>

LABEL [ rdfs:label skos:label ]
<S> {
  ex1:`protein name` LITERAL;
  ex2:`protein type` [ `signaling` `regulatory` `transport` ];
  `protein width` `ucum microns`
}

The LABEL directive specifies an ordered list of predicates which identify the node to substitute into the schema in place of the backticked label. Try it by clicking on the protein record button in this manifest and selecting some passing data.

ShExC has "LABEL [rdfs:label]" and later on "`transport`" (I'm in escaping hell here). The metadata graph has: ex1:Signaling rdfs:label "signaling" so of course the resulting term is ex1:Signaling.

Prefixing a backtick (e.g. "ex1:`protein name`") restricts to those terms which include that prefix's namespace URL.
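
The lookup described above can be sketched in a few lines of Python. This is not the ShEx.js code, just a toy model under stated assumptions: the metadata graph is a list of (subject, predicate, label) tuples with full IRIs, the first label predicate that yields a match wins, and an ambiguous label is treated as an error. The name `resolve_label` is hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def resolve_label(label, triples, label_predicates, namespace=None):
    """Return the IRI of the term carrying `label`.

    Predicates are tried in the order given by the LABEL directive; `namespace`
    models a prefixed backtick and keeps only terms inside that namespace.
    """
    for predicate in label_predicates:
        matches = sorted({
            s for s, p, o in triples
            if p == predicate and o == label
            and (namespace is None or s.startswith(namespace))
        })
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            # Assumption: ambiguity is an error rather than "pick the first".
            raise ValueError(f"ambiguous label {label!r}: {matches}")
    raise KeyError(f"no term labeled {label!r}")


# Toy metadata graph; the second "protein type" entry is invented to show the prefix filter.
metadata = [
    ("http://ex1.example/Signaling", RDFS_LABEL, "signaling"),
    ("http://ex2.example/protType", RDFS_LABEL, "protein type"),
    ("http://ex1.example/protType", RDFS_LABEL, "protein type"),
]

signaling = resolve_label("signaling", metadata, [RDFS_LABEL])
prot_type = resolve_label("protein type", metadata, [RDFS_LABEL],
                          namespace="http://ex2.example/")
```

Here the bare label "signaling" resolves on its own, while "protein type" needs the prefix restriction to pick a single term.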

Because ShEx is defined in terms of a JSON structure (ShExJ), this isn't really part of the ShEx language, more of a parser trick. (If you comment out (#) or delete the Query Map and click validate, you'll see "predicate": "http://ex2.example/protType", which means the backtick information is lost.) This is probably not an issue for SPARQL but would require consideration for something like SPIN.

This feature hasn't seen much use in Wikidata, probably because the community that's zealous about numeric entity identifiers is the same community that's maintaining the schemas. That said, it's easy to imagine other folks who work with Wikidata, OBO, SNOMED, etc. wanting to favor {read,type}ability over internationalization.