w3c / EasierRDF

Making RDF easy enough for most developers

Allow all URI characters in prefixed names #89

Open dbooth-boston opened 2 years ago

dbooth-boston commented 2 years ago

In Turtle and SPARQL, prefixed names like fhir:patient have a very limited syntax after the prefix. This means that prefix definitions can only be used to shorten URIs in very limited ways that conform to the syntax rules for local names.

For example, suppose I have these URIs:

<http://example.org/Encounter/f201>
<http://example.org/Patient/f201> 
<http://example.org/Practitioner/d444> 
<http://example.org/Practitioner/f201> 

Since they all have the http://example.org/ part in common as a prefix, it would be nice if I could define a prefix ex: like this to shorten them all:

@prefix ex: <http://example.org/> .

ex:Encounter/f201
ex:Patient/f201
ex:Practitioner/d444
ex:Practitioner/f201

But those prefixed names are not allowed in Turtle or SPARQL because slash ("/") is not allowed in a local name. It would be helpful if a prefixed name would allow any valid URI syntax after the prefix, so that URIs could be shortened more flexibly.

One might claim that the above URIs should have been designed differently, to avoid using a slash in that part of the URI, but often we do not control how the URIs are designed: they are given to us and we must deal with them as they are.
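A minimal sketch of the workaround that exists today: Turtle 1.1 and SPARQL 1.1 accept a backslash-escaped slash in a local name, so `ex:Encounter\/f201` is legal, at some cost in readability. The helper name `to_prefixed_name` and the constant `ESCAPABLE` are hypothetical, chosen only for this illustration, and the sketch does not handle characters outside the escapable set.

```python
# Characters that Turtle 1.1 / SPARQL 1.1 allow after a backslash in a local
# name (the PN_LOCAL_ESC production). "_" is left out because it never needs
# escaping.
ESCAPABLE = set("~.-!$&'()*+,;=/?#@%")


def to_prefixed_name(iri, prefixes):
    """Shorten iri using the longest matching namespace in prefixes.

    Hypothetical helper: returns the IRI in angle brackets when no prefix
    matches. Characters that are neither name characters nor escapable
    (spaces, for example) are not handled by this sketch.
    """
    matches = [(ns, p) for p, ns in prefixes.items() if iri.startswith(ns)]
    if not matches:
        return "<" + iri + ">"
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    local = "".join("\\" + ch if ch in ESCAPABLE else ch for ch in iri[len(ns):])
    return prefix + ":" + local


prefixes = {"ex": "http://example.org/"}
shortened = to_prefixed_name("http://example.org/Encounter/f201", prefixes)
print(shortened)
```

The escaped form keeps a single `ex:` prefix, but it is arguably harder to read than the full URI, which is part of why the issue asks for an unescaped slash.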

afs commented 2 years ago

This is a clash in SPARQL. / is in use elsewhere in property paths.
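To make the clash concrete, here is a rough Python sketch. The two regular expressions are simplified stand-ins for the real grammar (ASCII only, no escapes) and are assumptions for illustration, not the actual SPARQL productions. They show how a greedy tokenizer would stop reading a sequence path as two names once "/" is allowed in the local part.

```python
import re

# Simplified stand-ins for the real grammar (ASCII only, no escapes).
# Local names may contain ":" in both Turtle and SPARQL.
TODAY = re.compile(r"[A-Za-z][\w-]*:[\w:-]*")
# Hypothetical extension: "/" is also allowed in the local part.
WITH_SLASH = re.compile(r"[A-Za-z][\w-]*:[\w:/-]*")

path = "ex:knows/ex:name"  # a SPARQL sequence path: ex:knows, then ex:name

tokens_today = TODAY.findall(path)        # two prefixed names
tokens_extended = WITH_SLASH.findall(path)  # one long prefixed name

print(tokens_today)
print(tokens_extended)
```

Under the extended pattern the whole path becomes a single token, so existing queries that use `/` between prefixed names would change meaning.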

Some of the other characters are possible.

But maybe have delimiters for extended prefix names c.f. CURIEs?

(The other concern is making it too easy to create URIs that break RDF/XML. YMMV.)

TomConlin commented 2 years ago

Although I have often wanted bare top level "site" prefixes for those cases where an item is attributed with "came from waves hand vaguely ..." example.org I do not see this as the best way forward.

The local-ID portion of a curie in the wild has more than enough variability as-is without including what can more succinctly be viewed as the prefix "type" or path refinement and is already accommodated in the prefix generation process as a leading uri fragment.

Also if the number of distinct prefixes matters at all in comparison with repeating the url path fragment with every data item then I urge consideration of datasets with more rows.

tl;dr qualify the prefix not the item-identifier as then anyone can qualify differently and it is no longer an identifier.

dbooth-boston commented 2 years ago

@TomConlin , am I understanding properly? Are you suggesting that CURIEs (which neither Turtle nor SPARQL currently support) would be an adequate solution to the problem?

TomConlin commented 2 years ago

I am recommending that, when partitioning strings, your URI characters portion stay with the prefix, so as not to break existing uses of CURIEs nor impede the extension of SPARQL/Turtle to include them.
Or adoption of some practical format which would expect ntriples/quads/... as CURIEs.

dbooth-boston commented 2 years ago

@TomConlin , sorry for my continued confusion, but could you perhaps give an example? Specifically, how would you propose to solve the problem of having the following URIs -- which you do not control, so you cannot change them -- and you want to define some kind of prefix/CURIE/whatever to shorten the references to them. How do you propose to shorten them?

<http://example.org/Encounter/f201>
<http://example.org/Patient/f201> 
<http://example.org/Practitioner/d444> 
<http://example.org/Practitioner/f201>

TomConlin commented 2 years ago

Does this help?

@prefix exEncounter:   <http://example.org/Encounter/>  .
@prefix exPatient:         <http://example.org/Patient/>  .
@prefix exPractitioner:  <http://example.org/Practitioner/>  . 

exPatient:f201 exEncounter:f201 exPractitioner:d444;  exPractitioner:f201   .

Took some liberties to rearrange the list into a plausible statement.

dbooth-boston commented 2 years ago

Yes, thanks for the clarification. Indeed that is an option, since that is exactly what we currently have to do in Turtle and SPARQL. But it causes namespace proliferation, so the point of the issue is to come up with a way to define a single prefix for those URIs, to avoid that namespace proliferation.

pchampin commented 2 years ago

Could we solve this the other way around, i.e. by making it easier to handle and reuse big lists of prefixes? What if we allowed a set of prefix definitions to be "imported" by URL, in the same way as JSON-LD contexts? Something like

@prefixes <http://example.org/prefixes.ttl> .
# assuming that URL contains  the list of prefixes given by @TomConlin above

exPatient:f201 exEncounter:f201 exPractitioner:d444;  exPractitioner:f201   .

It would not matter so much that we are using a lot of prefixes if the burden of declaring them is offloaded to a single resource.
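A rough sketch of how such an import could be handled by a pre-processing step, assuming the hypothetical `@prefixes` directive above. Nothing here is part of Turtle: `inline_prefixes`, `SHARED` and `DIRECTIVE` are illustrative names, and the dictionary lookup stands in for fetching the URL.

```python
import re

# Stand-in for fetching a URL: maps the URL of a shared prefix file to its text.
# A real implementation would need an HTTP client, caching and error handling.
SHARED = {
    "http://example.org/prefixes.ttl": (
        "@prefix exEncounter: <http://example.org/Encounter/> .\n"
        "@prefix exPatient: <http://example.org/Patient/> .\n"
        "@prefix exPractitioner: <http://example.org/Practitioner/> .\n"
    )
}

# The "@prefixes" directive is hypothetical; it is not part of Turtle.
DIRECTIVE = re.compile(r"^@prefixes[ \t]+<([^>]*)>[ \t]*\.[ \t]*$", re.MULTILINE)


def inline_prefixes(text, fetch=SHARED.__getitem__):
    """Replace each @prefixes line with the contents of the file it names."""
    return DIRECTIVE.sub(lambda m: fetch(m.group(1)).rstrip("\n"), text)


doc = (
    "@prefixes <http://example.org/prefixes.ttl> .\n"
    "exPatient:f201 exEncounter:f201 exPractitioner:d444 .\n"
)
expanded = inline_prefixes(doc)
print(expanded)
```

The output is ordinary Turtle with the three `@prefix` declarations in place of the directive, so an unmodified parser could consume it.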

TomConlin commented 2 years ago

Proliferates prefixes? Yes, it does. Although that is very true, it is a relative thing: when I worked with hundreds of millions of statements there were still only hundreds of prefixes, and that number could have been reduced with less baroque modeling.

As I see it, the trade-off is to perturb hundreds of millions of local identifiers or hundreds of prefixes.

I would still like to see a way to get at the base uri maybe something like the not well considered following ...

@prefix ex:   <http://example.org/>  .
@prefix Encounter:    ex:Encounter/ .
@prefix Patient:         ex:Patient/ .
@prefix Practitioner:  ex:Practitioner/ . 

Patient:f201 Encounter:f201 Practitioner:d444;  Practitioner:f201   .

which of course is not any type of valid @prefix syntax I know of, but does allow access to and reuse of the base or root URL.
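A minimal Python sketch of how the chained declarations above might be resolved, assuming each prefix may refer only to prefixes declared before it. The function name `resolve_prefixes` and the input format are hypothetical; the chained syntax itself is not valid Turtle.

```python
def resolve_prefixes(declarations):
    """Resolve (name, target) pairs in order.

    A target is either "<iri>" or "prefix:rest", where prefix was declared
    earlier. This chained form is hypothetical; it is not valid Turtle.
    """
    resolved = {}
    for name, target in declarations:
        if target.startswith("<") and target.endswith(">"):
            resolved[name] = target[1:-1]
        else:
            prefix, _, rest = target.partition(":")
            resolved[name] = resolved[prefix] + rest  # KeyError if undeclared
    return resolved


namespaces = resolve_prefixes([
    ("ex", "<http://example.org/>"),
    ("Encounter", "ex:Encounter/"),
    ("Patient", "ex:Patient/"),
    ("Practitioner", "ex:Practitioner/"),
])
print(namespaces["Patient"])
```

Each derived namespace expands to a full URI, so the root `http://example.org/` is written only once.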

HughGlaser commented 2 years ago

Hi Tom, Don't give an inch! :-) You want

@prefix ex: <http://example.org/> .

ex:Encounter/f201
ex:Patient/f201
ex:Practitioner/d444
ex:Practitioner/f201

and that is what you should have.

I still clearly remember trying to work out what I had done wrong, when that didn't work, and my WTF? when I found it wasn't me, it was the language.

@prefix ex: <http://example.org/> .
@prefix Encounter: ex:Encounter/ .
@prefix Patient: ex:Patient/ .
@prefix Practitioner: ex:Practitioner/ .

is not what is easy. It might as well have the full URI there.

I want a single prefix for each dataset, and then the different paths following them, in the URIs.

Prefixes look like macros, and should behave as such as far as possible, if our target users are to avoid confusion, never mind be helped.

OK, this may not be possible because it breaks too many other things. But let's not forget what we really want, and that anything else is a compromise.

Cheers Hugh



dbooth-boston commented 2 years ago

What if we allowed a set of prefix definitions to be "imported" by URL, in the same way as JSON-LD contexts?

That seems worth considering. I think that also raises the question of whether a general-purpose "include" or "import" capability should be added to RDF. RDF serializations currently lack such capability, because a design principle was to make each RDF file be completely self-contained. OWL has an import statement, but I don't know how well suited it would be for RDF that does not otherwise use OWL, and it doesn't work at the syntactic level that would be needed for prefix definitions.

BTW, this issue is related to issue #13 (Namespace proliferation) and issue #12 (IRI allocation) .

afs commented 2 years ago

Re:

PREFIX exEncounter:    ex:Encounter/

Technical points: having an extended set of characters at this point is a little tricky for tokenizing. Not impossible but it is not a simple matter of allowing [136s] PrefixedName as well as URIs.

That would be:

PREFIX exEncounter:    ex:Encounter\/

Is that adequate? Does having a less than perfect solution in the PREFIX to give better appearance in the data provide a practical tradeoff?

We have CURIEs with delimited syntax form: '[' curie ']' (yes, it reuses a delimiter pair - can't have everything).


Imports: You can do this today by concatenating Turtle files :-) It is a "local data management" issue.

TallTed commented 2 years ago

@TomConlin wrote

Patient:f201 Encounter:f201 Practitioner:d444; Practitioner:f201 .

That's invalid Turtle, and very distracting, especially since it's been copied several times. The semicolon should be a comma.

Errors like this are part of why I include lots of extra whitespace in my examples, and use lots of spaces (no tabs!) to indent things into recognizable subject predicate object columns.


@HughGlaser wrote a terrible email reply, which doesn't show up right and loses much of its meaning, so I won't bother replicating it here, because GitHub doesn't handle Markdown in email comments.

Even worse, since there are no codefences (and they wouldn't work if they were present), @HughGlaser pinged the @prefix user of GitHub, as @TomConlin did earlier in https://github.com/w3c/EasierRDF/issues/89#issuecomment-942110883

I'm quite sure that user @prefix does not care about this discussion. Please take care to always wrap any @entity in some kind of codefence, or separate that @ from the rest of the entity name (e.g., @ name or `@name`), whenever they come up in GitHub discussions!

dbooth-boston commented 2 years ago

Alas, I tried to edit @HughGlaser 's post, to correct the formatting, but GitHub would not let me do so, because it originated as an email reply and Markdown is therefore disabled in it. :( I guess the lesson here is: Don't use GitHub email replies for anything but the simplest of plain text responses, because the formatting cannot later be corrected.

namedgraph commented 2 years ago

Is this one of the pressing issues in the RDF ecosystem?..

HughGlaser commented 2 years ago

@TallTed wrote

@HughGlaser [wrote a terrible email reply]

Is that a comment on the content, or just the formatting?


Off topic: My comments on the formatting.

I find you can't attach images in email replies either, it seems - so I deleted my reply that had one.

I did nothing but cut, paste and add simple plain text to an email I received from the mailing list. It hadn't crossed my mind that a simple reply to an email would get screwed up - I guess we all know better now. I am pretty pissed off that the system is so bad that I would generate this level of distraction.

David Booth, thanks ever so for taking the trouble to try to fix my post - it looks like you succeeded pretty well, I see. Your opening comment in the thread is still the best description, I think.

dbooth-boston commented 2 years ago

Is this one of the pressing issues in the RDF ecosystem?..

I don't think I'd personally put it in the top three, but if one were to design a new higher-level RDF serialization -- hint hint -- then it definitely should be considered. I think it is important to remove every bit of unnecessary complexity that we can from RDF usage, because complexity always seems to multiply.