Switch from URI to IRI terminology in WebID spec

csarven commented 2 years ago

Originally recorded/tracked in https://www.w3.org/2005/Incubator/webid/track/issues/71

Considerations (requirements, advisory..) to include in the WebID 1.0 spec.

Use in the context of the HTTP protocol and data
Normalization
Security

Other considerations:

Existing deployments and upgradeability

pmcb55 commented 12 months ago

Hi @csarven/all,

So I completely agree with this. It seems this was originally raised over 11 years ago(!), and in trying to follow all the mailing list threads since then, I don't think I've come across a single objection from anyone in all that time.

The very original comment from ~~@pchampin~~ @Antoine-Zimmermann (here) still makes perfect sense to me today, i.e.: "all current W3C specs in the Semantic Web activity now use IRI, you should replace URI with IRI everywhere, I think;"

So what needs to happen to close this issue, and then do the search-and-replace...?

If it helps, here's my formal +1 :smile: !

csarven commented 12 months ago

Pat, do you mean besides:

Considerations (requirements, advisory..) to include in the WebID 1.0 spec.

Use in the context of the HTTP protocol and data

Normalization

Security

Other considerations:

Existing deployments and upgradeability

?

You're welcome to create a PR addressing those and other considerations (that people have) so that we can move towards "closing this issue".

(And, no, I don't believe something like this is a simple matter of "search-and-replace", although life would indeed be a lot simpler if that's all we had to do :))

TallTed commented 12 months ago

This is an ongoing cyclical discussion.

URL, URI, and IRI are roughly equivalent in most readers' minds. Yes, there are technical distinctions, but surprisingly many people who encounter IRI think it's a typo for URI. URL has managed to replace URI as the superclass of all identifiers, even though the L for location is obviously a subset of identifier (and URN has almost completely fallen to the wayside).

Note, for instance, What is Happening with "International URLs", dated 2015-08-25, which cites the then current URL spec.

The now current URL Living Standard — Last Updated 8 December 2023 says —

Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

I'm in several WGs at the moment. All of them have followed the above guidance for human-focused documentation, though all have also recognized the historical and technical differences between URL, URI, and IRI.

I am more than hesitant to suggest that we should treat issues logged 10 years ago as if they were recent, and even more hesitant to accept a comment of similar age ("Comments on current editor's draft (Web 1.0, 3rd October 2012)") as if it were currently accurate, especially as it's being mis-cited above (it's not from nor even cited by @pchampin; it's from @Antoine-Zimmermann and dated 23 Oct 2012 18:53:02 +0200).

jacoscaz commented 12 months ago

The now current URL Living Standard — Last Updated 8 December 2023 says —

Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

Very much +1 on this. Furthermore, within the context of WebID, URL is actually what makes the most sense. WebIDs must dereference to profile documents and therefore must not only be identifiers (IRI / URI) but also locators (URL). Based on my personal experience, the "internationalized" characteristic of IRIs is pretty much assumed to apply to URLs, too, whereas the distinction between identifier and locator is much more likely to produce issues.

pmcb55 commented 11 months ago

Hiya @TallTed,

This is an ongoing cyclical discussion.

Yep, exactly - and my strong fear is that it'll continue to cycle ad infinitum, unless we once-and-for-all resolve it by being completely unambiguous and explicit. That's precisely what the RDF Recommendation does, in section 3.2 IRIs. And doesn't that spec still remain the very foundation for everything we're discussing here...?

Surely all the arguments being made here (i.e., 'URI/IRI are confusing', 'most people conflate them anyway', 'there's only the one algorithm', etc.) should apply to RDF itself too, right? So logically, should the upcoming RDF 1.2 work adopt this 'let's just use URL' position too...?

URL, URI, and IRI are roughly equivalent in most readers' minds. Yes, there are technical distinctions, but surprisingly many people who encounter IRI think it's a typo for URI. URL has managed to replace URI as the superclass of all identifiers, even though the L for location is obviously a subset of identifier (and URN has almost completely fallen to the wayside).

Yep, there sure are "technical distinctions", and this work is a technical specification - so aren't those technical distinctions important (see this "Java doesn't support IRI?")?

Yes, it's unfortunate that people might wrongly conflate, misunderstand, or mix up these terms, but I think it would be a big mistake to attempt to paper over these distinctions by misusing the term URL in this specification when we all know we really, technically, mean IRI (again, as per the RDF spec).

Note, for instance, What is Happening with "International URLs", dated 2015-08-25, which cites the then current URL spec.

The now current URL Living Standard — Last Updated 8 December 2023 says —

Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

Yep sure - but those WhatWG folks aren't building on the foundation of RDF (are they?). So I don't think they have any strong need to be consistent with RDF's foundations (i.e., IRIs).

I'm in several WGs at the moment. All of them have followed the above guidance for human-focused documentation, though all have also recognized the historical and technical differences between URL, URI, and IRI.

Ok, but are those WGs founded on (or strongly related to) RDF I wonder? If so, then if they ignore the clear, explicit example of RDF itself (i.e., it's IRIs - full-stop, no discussion), then aren't they simply adding to the confusion by using anything other than IRIs (since RDF mandates IRIs - full-stop, no discussion)...?

And isn't this WebID work intended to become a formal technical specification (and so it shouldn't be watered-down technically to appease looser, non-normative needs of 'human-focused documentation', should it)?

I am more than hesitant to suggest that we should treat issues logged 10 years ago as if they were recent, and even more hesitant to accept a comment of similar age ("Comments on current editor's draft (Web 1.0, 3rd October 2012)") as if it were currently accurate, especially as it's being mis-cited above (it's not from nor even cited by @pchampin; it's from @Antoine-Zimmermann and dated 23 Oct 2012 18:53:02 +0200).

Yeah sorry - I've edited/fixed the attribution above. But I don't understand why this suggestion 'being old' would be relevant. The sentence, "all current W3C specs in the Semantic Web activity now use IRI, you should replace URI with IRI everywhere, I think;" seems perfectly correct to me - i.e., it's perfectly applicable right now, today, no matter who says it, or when it was 'originally said'.

So in summary, I see the consistent use of IRI as the only way to prevent the annoying ongoing confusion over URL/URI/IRI, and to stop the cyclic discussion. In other words, this all goes away if all Semantic Web related technical specifications simply followed the example of the foundational RDF Recommendation itself. (Note, they don't have to repeat the Note in section 3.2 IRIs, just reference it).

If you insist on persisting with URL, then I feel you're only continuing the confusion and cyclic discussion (since it obviously misaligns with the extremely clear RDF spec (so you'll always have to explain/justify that misalignment)). And I don't think anybody is going to propose making this consistent the other-way-around - i.e., by changing the RDF Recommendation to use URL (although RDF 1.2 is in the works, so now's your chance to suggest it :smile: !)).

So flip to IRI and be consistent with RDF... ...or stick with URL and continually explain and justify the misalignment with RDF.

pmcb55 commented 11 months ago

Hiya @jacoscaz

The now current URL Living Standard — Last Updated 8 December 2023 says —

Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

Very much +1 on this. Furthermore, within the context of WebID, URL is actually what makes the most sense. WebIDs must dereference to profile documents and therefore must not only be identifiers (IRI / URI) but also locators (URL).

Yeah, the fact that WebIDs are really IRLs(!) is indeed a wee bit niggly. But for me, it's still fine to consider them first-and-foremost identifiers. Yep, I do get your point, that they do have to be dereference-able to be 'true WebIDs', but I don't think that justifies breaking from RDF's mandate for IRIs (and I certainly don't want anyone to coin 'IRL' as a new acronym).

So yep, you make a valid point, but for me it's not strong enough to justify diverging from the RDF foundation of IRI.

Based on my personal experience, the "internationalized" characteristic of IRIs is pretty much assumed to apply to URLs, too,

Maybe I'll just point again to "Java doesn't support IRI?" :smile: !

TallTed commented 11 months ago

once-and-for-all resolve it

Nothing is ever "once-and-for-all resolved" in the world of SDOs. It is only resolved for this edition of this spec.

Note that here you pointed me to --

RDF 1.1 Concepts and Abstract Syntax W3C Recommendation 25 February 2014

-- and earlier you pointed me to even older things, and that's part of what got me spinning.

It is probably worth pointing also (if not instead) to that section in the latest Editor's Draft of RDF 1.2 Concepts, which mostly matches that from RDF 1.1. It is also worth noting that we (the RDF-star WG, which is working on all of RDF 1.2 and SPARQL 1.2, and it's a lot) had substantial discussion about this, much (but not all) of which is captured here.

I had forgotten that we did conclude that IRI was the right term for purposes of RDF, though not necessarily for purposes of Web browsing (which the WHATWG is far more focused on).

So anyway. Yeah, go ahead with IRI for WebID.

jacoscaz commented 11 months ago

Yes, it's unfortunate that people might wrongly conflate, misunderstand, or mix up these terms, but I think it would be a big mistake to attempt to paper over these distinctions by misusing the term URL in this specification when we all know we really, technically, mean IRI (again, as per the RDF spec).

The way I see it, it's exactly the opposite. By using IRI we'd be papering over the distinction between identifiers and locators in a manner that's guaranteed to be technically confusing. Nobody can really mean IRI if they're describing what must be a locator.

I do agree that deviating from the RDF spec would be unfortunate but, and I acknowledge that there's a lot of subjectivity here, I think doing so would nonetheless deliver a technically clearer and more understandable spec.

Both IRI and URL require a compromise:

going with IRI we would compromise on making it clear that a WebID is a locator in favor of making it clear that the spec supports an extended set of characters
going with URL we would compromise on making it clear that the spec supports an extended set of characters in favor of making it clear that a WebID must dereference

Today, most technical people I work with don't even question whether URLs can be internationalized or not. They just assume that such is the case and get mildly annoyed when it isn't. On the contrary, all of the technical people I work with are very well aware of the difference between identifiers and locators. IMHO, by going with IRI we'd be making the wrong compromise.

TallTed commented 11 months ago

A WebID which Identifies me does not Locate me. Clearly, a WebID is not a Locator, Uniform or otherwise. It is an Internationalized Resource Identifier (which class includes all Uniform Resource Identifiers; all URIs are IRIs) which identifies an Agent. Dereferencing that Identifier leads to a WebID Profile Document, possibly through a 303 or other redirect, the target of which may be (probably is) a Uniform Resource Locator.

jacoscaz commented 11 months ago

A WebID which Identifies me does not Locate me

Agreed. But, for all practical intents and purposes, a WebID identifies the Agent while locating the Profile Document. This is not 100% sound from a theoretical point of view but reflects what happens in practice: if a WebID can be passed as the primary url argument/parameter of an HTTP client library in order to obtain a Profile Document, then that WebID is, by all practical definitions, a locator.

That said, I should clarify that, although I would prefer URL over IRI, I would personally have no significant issue with any of these. Yeah, I think one would be better than the other but there are other hills for me to die on. I'm mostly here because I'm very interested in learning about the perspectives of others.

I think we have +3 for IRI and +1 for URL, at the moment.

jacoscaz commented 11 months ago

Tagging a few of the most active group participants to encourage the conversation - @woutermont @jonassmedegaard @melvincarvalho @kidehen @acoburn

jonassmedegaard commented 11 months ago

I would find it super confusing if WebID were to use "URL-but-a superset-to-include-more-characters".

I much prefer that we continue to define the use of "IRI (or arguably only a subset of IRIs) that is used for locating".

melvincarvalho commented 11 months ago

We've been using URIs for over a decade and it's been fine. URLs seem fine too. Noting, as others have, that the original referenced issue is dated 2012. AFAIK different specs say different things on this. I really don't know about IRIs and whether they introduce new attack vectors.

kidehen commented 11 months ago

Agreed. But, for all practical intents and purposes, a WebID identifies the Agent while locating the Profile Document.

That's confusing. A WebID is an HTTP based identifier that unambiguously names an Agent.

HTTP IRIs and URIs (both Pointers) satisfy the condition expressed in the statement above. A URL (an Address) doesn't.

HTTP is what makes a WebID a "deceptively simple" Name.

A Name is a combination of denotation and connotation delivered by realm-specific indirection.

The connotation->denotation switch is crucial, and it happens via indirection.

Trying to ignore these facts simply makes matters confusing, as @TallTed has already explained in detail.

csarven commented 11 months ago

I'm still in the review process of https://github.com/solid/specification/pull/575 (by @woutermont, who did a great job). If anyone wants to get a sense of the kinds of things that need to be detailed for a protocol, they may find the PR and discussion there useful. Some of the findings or solutions there (and elsewhere, e.g., rdf12-concepts) are important here as well, and we can probably minimise repetition. I'd like to see/do a diff between the PR and rdf12-concepts.

That's all orthogonal to getting a better sense of implementation experience, or even commitment to implement.

There are separate but related considerations in the WebID spec. At varying degrees, the WebID spec details the identifier, the protocol, and data. So, we need proper roundtripping ( see e.g., table in https://github.com/solid/specification/issues/347#issuecomment-1237167849 )

As I understand things at this time, it is not so much about URI vs IRI or URL or Whatever. Going with IRI or (WHATWG) URL instead of URI doesn't resolve anything out of the box. The important detail is the function that maps one to another so that implementations can interop, which is at the core of resolving these issues and PRs. IMO.

jonassmedegaard commented 11 months ago

@melvincarvalho I think it would help the conversation if you clarified whether by "we" you mean all of us or some smaller group you are part of that simply hasn't happened to have any need for involving data at internationalized domains.

jacoscaz commented 11 months ago

Feels like there's a strong consensus for using IRI - very good! Consensus is the key!

Antoine-Zimmermann commented 11 months ago

As I have been mentioned in this thread, I will give my opinion.

First, the remark I made back in 2012 is still valid and appropriate: the main Semantic Web standards all use IRI as the identification model: OWL 2, SPARQL 1.1, RDF 1.1, SHACL. Some standards that are based on RDF are more sloppy and inconsistent in how they use URI and IRI but for these standards, the notion of identification is not at the center of their models. For WebID, identification is very much the central focus of the specification. So, this is one point arguing for consistency and homogeneity across specifications.

Second, IRIs are for internationalised identification. Constraining characters to those used by the English language is rather arbitrary and pushes away many civilisations that want an understandable Web for them. Indeed, the W3C Group Draft Note on Internationalization Best Practices for Spec Developers has gone through many revisions but consistently said, since 2016: "Specifications that define resource identifiers MUST permit the use of non-ASCII characters." And below the sentence is written: "Model is defined in terms of IRIs; Protocol with URI."

Third, IRIs correspond better to what WebID is specifying than living standard URLs. URIs, IRIs, and even RFC URLs are not even living standard URLs. In the living standard, URLs are structs, not character strings. An RFC URL apparently corresponds to a serialized URL. An IRI apparently corresponds to a valid URL string.

As a side note, in my opinion, the URL living standard is creating confusion rather than solving it. It is horrible to read, very complicated and disconcerting. It reinvents its own terminology rather than building on top of the existing ones. It may be creating more problems than it is trying to solve. Also, on the point of being able to locate information, which is needed for WebID: some IRIs dereference to a location with information, and some don't. But URLs are not a solution in this regard: some URLs dereference to a location with information, and some don't.

So, I reiterate my claim that WebID should use IRIs as its base model for identification, not URIs nor URLs.

Antoine-Zimmermann commented 11 months ago

@csarven

As I understand things at this time, it is not so much about URI vs IRI or URL or Whatever. Going with IRI or (WHATWG) URL instead of URI doesn't resolve anything out of the box. The important detail is the function that maps one to another so that implementations can interop, which is at the core of resolving these issues and PRs. IMO.

You're right. The change from URI to IRI is only a small conceptual fix so as to allow people for whom the Latin alphabet means nothing to use their own meaningful symbols. It adds another layer to the mapping from identifier to data, namely that Unicode must be translated to ASCII before resolving the mapping.
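To make that extra layer concrete, here is a minimal Python sketch of an IRI-to-URI mapping in the spirit of RFC 3987 section 3.1. The function name `iri_to_uri` and the example IRI are illustrative assumptions, not anything taken from the WebID drafts. The host is converted with Python's built-in `idna` codec (which implements the older IDNA 2003 rules), and userinfo is ignored for brevity.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding: reserved URI characters
# plus "%" so that escapes already present in the IRI are not double-encoded.
_SAFE = "%/:@!$&'()*+,;=?"


def iri_to_uri(iri: str) -> str:
    """Map an HTTP(S) IRI to an ASCII-only URI (hypothetical helper)."""
    parts = urlsplit(iri)

    # Host: convert Unicode labels to their ASCII ("xn--") form.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Path, query, fragment: percent-encode the UTF-8 bytes of non-ASCII characters.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


# Example IRI with a non-ASCII host label and path segment (made up for illustration).
example = iri_to_uri("https://b\u00fccher.example/jos\u00e9#me")
```

An IRI that is already pure ASCII passes through unchanged, which is the sense in which the mapping is backward compatible with existing URI-based WebIDs.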

melvincarvalho commented 11 months ago

-1 to IRIs for me, for now

Not in this version, as we already have a lot to take on, and this would add a whole new world of complexity, serialization problems, equivalence problems, security problems. Test suites will be that much harder. There will be a whole range of edge cases and phishing attacks.
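As a rough illustration of the equivalence and look-alike concerns, the Python sketch below compares IRIs that render the same on screen. The example IRIs and the helper name `same_after_nfc` are assumptions made up for this illustration. Note that RDF 1.1 compares IRIs by simple string comparison, so the first pair would count as two different identifiers there unless a spec adds its own normalization rules.

```python
import unicodedata

# Same visible text, different code points:
composed = "https://example.org/jos\u00e9#me"      # "é" as a single code point (U+00E9)
decomposed = "https://example.org/jose\u0301#me"   # "e" followed by a combining acute accent (U+0301)

# Look-alike characters from different scripts:
latin = "https://example.org/alice#me"             # Latin "a" (U+0061)
lookalike = "https://example.org/\u0430lice#me"    # Cyrillic "а" (U+0430)


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after Unicode NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


plain_equal = composed == decomposed                  # raw string comparison
nfc_equal = same_after_nfc(composed, decomposed)      # comparison after NFC
lookalike_equal = same_after_nfc(latin, lookalike)    # NFC does not merge scripts
```

Normalization can reconcile the composed and decomposed forms, but it does nothing for cross-script look-alikes, which would need a separate policy.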

The scope is too broad, and I would prefer to focus on the main work item of a subset spec and a superset spec, which will be hard enough to get over the line.

Also there should be a survey by those advocating this of the social web, and how much of the social web are already using IRIs today, and interop with them. If there's an overwhelming adoption that should help the case. But still I think it should go into a WebID v next.

Also special characters belong in a foaf:name, not in a URI.

Edit: given that this issue has not moved forward in over 11 years, I think it would be an unnecessary risk to move it into the critical path

Antoine-Zimmermann commented 11 months ago

@melvincarvalho, your position makes complete sense, and I sympathise with these arguments.

However, saying that "special characters belong in foaf:name, not in a URI" is somewhat ~disingenuous~ injudicious since indeed, special characters do not belong to URIs by definition of the RFC. URIs are ASCII strings. But your claim, I think, should be read as "in a foaf:name, not in an identifier". Web standards disagree with you all over the place: RDF 1.1, OWL 2, SPARQL 1.1, SHACL all are based on IRIs with a wide range of Unicode strings. Even the WHATWG living standard URL allows a vast range of Unicode strings. And obviously, the W3C Internationalization Working Group and Interest Group disagree.

[EDIT: there was an imprudent use of the inappropriate term "disingenuous"]

woutermont commented 11 months ago

Hi all! Seems that I have missed a lot of back-and-forth here. Most of what has to be said is nicely summarized by @Antoine-Zimmermann in his comment. I'm going to add some in-depth points, but here's the TLDR:

On the one hand, IRI is the way to go for semantic specs (e.g. RDF et al.), and this poses no big problems: the URI<->IRI mapping takes care of HTTP-compatibility and upgradeability (backward compatibility), and well-defined normalization can take care of security issues and compatibility with WHATWG URL. On the other hand, WHATWG URL (sadly) has become the way to go for specs using browser-related dereferenceable identifiers (e.g. HTTP). These specs show a great deal of overlap, but their goals are actually orthogonal to each other. Therefore, nothing prevents us from specifying a WebID as both an IRI and a URL with the HTTP scheme.

Let's start with the most important distinction to keep in mind in this discussion:

URI and IRI are specs that define the syntax of these identifiers, i.e. their (extendable) formal language. They are elegant, versioned standards with concrete and workable grammars. The semantics of each identifier is left to the extending schemes. For example, HTTP defines URIs that stand for resources managed by the HTTP protocol.
WHATWG URL is ... none of that. It is a best-effort to align the historical deviations from URI/IRI by popular browser implementations, as a base for the WHATWG Fetch spec. It is convoluted, cumbersome to work with, has no declarative grammar, and follows a 'live' (versionless) evolution. Moreover, all decisions are made by a handful of people representing the oligopoly of major browsers, and external proposals are notoriously often denied. But this is not the issue. The spec is what it is, and it is followed by numerous libraries and applications. However, the reason why it cannot and should not ever replace URI/IRI is that it is precisely NOT an extendable syntax with scheme-dependent semantics: regardless of how broad it proclaims to be, the only schemes WHATWG URL is really concerned with are those that are "fetchable", and the only semantics it is concerned with is the procedural one dictated by the WHATWG Fetch spec.

This difference is important in deciding whether to use URI, IRI and/or URL. For example, it would not make sense for RDF to use URLs, rather than IRIs, since RDF needs identifiers that can stand for anything, and does not (and should not) concern itself with their "fetchability", and thus neither with the convoluted procedures that come with that minority of URL schemes. I'm glad that @TallTed has also realised that:

[@TallTed:] [The RDF-star WG] had substantial discussion about this, [and] did conclude that IRI was the right term for purposes of RDF, though not necessarily for purposes of Web browsing (which the WHATWG is far more focused on).

I therefore fully agree with @pmcb55: conflating URI/IRI with URL disregards this fundamental difference in purpose between RDF and Fetch.

Keeping that distinction clearly in mind, though, it is also important to stress that this is not an "either/or" situation. If it is the core of a WebID to dereference to its corresponding Document, it should probably be a URL. If it is a characteristic of a WebID that it can also be used as a node in RDF, it should probably also be an IRI. These things do not contradict each other, and I personally think they are both important aspects of a WebID. Moreover, one does not need complex terminology to define this (as @jonassmedegaard seems to fear): we can literally say that a WebID is both an IRI, and a URL, with the HTTP scheme.

Note that we do not necessarily need to say much more about this. We can add some stricter normalization (e.g. as I proposed for the Solid spec) for security, and we can stress that all IRIs dereferencing to the same Document denote the same Agent (which is otherwise not entirely clear in an RDF context). But apart from that, this is a lot of fuss for something that will hardly have an impact (except for allowing additional characters, of course): I honestly have not seen a single non-URL URI/IRI actually being used; has anyone?
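For a sense of what "stricter normalization" might involve, here is a minimal Python sketch of syntax-based normalization along the lines of RFC 3986 section 6.2. It is an assumption for illustration only and is not the normalization proposed in the Solid PR. The name `normalize_http_iri` is hypothetical, and the sketch deliberately ignores userinfo, IPv6 literals, dot-segments and percent-encoding case.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_http_iri(iri: str) -> str:
    """Lowercase scheme and host, drop the default port, use "/" for an empty path."""
    parts = urlsplit(iri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""          # urlsplit already lowercases the host
    port = parts.port

    # Keep the port only when it differs from the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"

    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


# Two spellings of the same (made-up) WebID collapse to one string.
a = normalize_http_iri("HTTPS://Example.ORG:443/alice#me")
b = normalize_http_iri("https://example.org/alice#me")
```

With a rule like this fixed in the spec, implementations could compare normalized strings instead of each inventing its own notion of "the same WebID".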

csarven commented 11 months ago

Current definition: WebID uses HTTP URI; WebID Profile Document uses HTTP URL.

More specifically, WebIDs without a fragment identifier need to be dereferenceable, i.e., an HTTP URL (as per the 303 requirement).
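The difference between the two cases can be sketched in a few lines of Python. The helper name `document_iri_for` and the example WebIDs are assumptions for illustration. For a WebID with a fragment, the document IRI can be computed locally by dropping the fragment; for one without, only an actual HTTP request (and its 303 response) can tell, so the sketch returns `None`.

```python
from typing import Optional
from urllib.parse import urldefrag


def document_iri_for(webid: str) -> Optional[str]:
    """Return the document IRI implied by a hash-based WebID, else None."""
    document, fragment = urldefrag(webid)
    if fragment:
        # Hash-based WebID: the part before "#" is what an HTTP client requests.
        return document
    # No fragment: the profile document location is only known after
    # dereferencing the WebID and following the redirect.
    return None


hash_based = document_iri_for("https://example.org/alice#me")
slash_based = document_iri_for("https://example.org/alice")
```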

Specification Orthogonality and Variability becomes relevant:

For specifications to build on the WebID specification, it is useful to not impose tight coupling of Identification with Interaction - I mean behaviour, and not splitting specifications as a way of resolving that concern. Implementations can then use WebID for identification without having to also implement the interaction. Certainly the latter paves the way to Linked Data, and not something that's necessarily isolated to RDF or wherever else a WebID identifier may appear or is used. WebID being an HTTP URI in and of itself hints at a potentially dereferenceable resource.

So, the consideration to change from URI to IRI in the specification doesn't have much (or any?) impact on WebIDs with fragment identifiers, but rather impacts WebIDs without fragment identifiers and WebID Profile Documents.

I'd advise against merging the concepts for identification and interaction where IRI/URL need to be expressed, regardless of the potential adoption of IRI in the specification.

melvincarvalho commented 11 months ago

However, saying that "special characters belong in foaf:name, not in a URI" is somewhat disingenuous

Please retract this statement, I don't find this acceptable @Antoine-Zimmermann

Antoine-Zimmermann commented 11 months ago

Melvin, I am sincerely sorry for the inappropriate term. Alas, I had a misunderstanding of the meaning of the term and did not check its definition. I had no intention to mean that you have secret and deceitful motives. Perhaps the words that I was looking for were "somewhat injudicious". I hope this term "injudicious" is not too strongly negative for you and I meant to tone the phrase down by the use of "somewhat".

In all honesty, I regard your opinion as very valuable and I find it sane that you openly expressed your resentment towards this term. For this reason I deeply apologise.

jacoscaz commented 11 months ago

/chair hat on

To all: let's try to give the benefit of the doubt before asking for retractions or assuming the worst in people. Doing so is not conducive to a peaceful working environment. If a specific formulation stands out as too harsh, perhaps let's kindly point it out first, asking for clarification. I, myself, was totally unaware of the negative aspects of the word disingenuous, which I had only ever seen used as a synonym of unaware or uninformed. Clearly, I was uninformed about its other meanings! And, in a more general sense, I'm uninformed about most things in life, really.

Insofar as possible, let's assume good faith on everyone's part.

melvincarvalho commented 11 months ago

@jacoscaz I appreciate your intention to foster a positive dialogue in our recent exchange with @Antoine-Zimmermann . However, I must express my concern regarding your message's implications.

Using "disingenuous" in our discussions is serious, as it conveys dishonesty or a lack of integrity, which is why I sought a retraction from @Antoine-Zimmermann . While your message aimed to promote understanding, it seemed to undermine the gravity of such accusations, which are critical in a professional setting like W3C where respect and integrity are essential.

I urge you to recognize that professional decorum is vital for a respectful working environment. Equating terms like "disingenuous" with being uninformed can diminish their impact and the group's atmosphere.

I request a reconsideration of your statement to better reflect the balance between good faith communication and maintaining professional standards.

Thank you for considering this perspective. I remain committed to a constructive and respectful dialogue for the benefit of our work.

jacoscaz commented 11 months ago

/chair hat on

it seemed to undermine the gravity of such accusations

I do not mean to sidetrack the conversation, so I'm only going to reply to this topic once. Antoine extended their apologies after recognizing that they were uninformed about the complete meaning of the word disingenuous, as they should have. I'm not saying this should not have happened. Also, I'm not saying that the accusation of being disingenuous is a shallow one. On the contrary, now that I myself have learnt more about the word, I recognize that it is quite grave.

What I am saying, though, is that there are different ways to react when one feels accused of something. Another way could have been: "Antoine, please clarify what you mean by disingenuous? It comes across as a strong accusation and feels like a personal attack", which accepts that a person might have made a mistake in good faith. English is not my first language and I'm 100% sure I have made much worse mistakes. I'm pretty sure I have made this one, too, as it's very easy as an Italian to pick up disingenuous without fully understanding its implications. Antoine likely speaks French as their first language - they might have made the same mistake.

To sum it up... When there's a reasonable chance of a misunderstanding, which is often the case, ask first. That's it.

jacoscaz commented 10 months ago

/chair hat off

It always strikes me when a little bit of lateral thinking provides options that, in retrospect, should have been obvious.

I'd advise against merging the concepts for identification and interaction where IRI/URL need to be expressed, regardless of the potential adoption of IRI in the specification.

@csarven I think I understand what this means but could you make an example of what you would not merge and how you'd rephrase it?

we can literally say that a WebID is both an IRI, and a URL, with the HTTP scheme.

@woutermont so far, this would be my favorite option as it's the most explicit of them all and leaves little unaddressed.

But apart from that, this is a lot of fuss for something that will hardly have an impact (except for allowing additional characters, of course): I honestly have not seen a single non-URL URI/IRI actually being used; has anyone?

I agree. For me there's no blocker here, just preferences. I used to prefer URL but now I think my first option would be IRI+URL, then URL, then IRI. I'm also interested in Sarven's point.

melvincarvalho commented 10 months ago

I'm only going to reply to this topic once ...

@jacoscaz thank you for this wonderfully thoughtful response. All the more considerate, given that English is not your first language.

First of all, most of the points you raise had already occurred out of band. I had a wonderful conversation with Antoine before you said anything on this thread. It was amicable and professional, and indeed we were able to discuss several nuances on URIs vs IRIs, where I think we both learnt something.

On the term used I did ask it to be retracted from the public record, for reasons that I think are now understood. I did not insist on an apology but Antoine graciously offered one, and immediately corrected the record. That is over and above what I had asked from him, and I can say he was professional and gracious throughout. Not only did we successfully deescalate the situation, but I think our working relationship was actually improved, as a result.

I do appreciate you offering your wisdom here. But in truth you need not have intervened, as the situation was already resolved. Antoine retracted the term and replied with a kind comment, which I then upvoted. Thank you for caring about the community and taking time out of your day; that shows that you take pride in your work, and illustrates the care you have towards the community.

In this case, however, it was not necessary, but you didn't know as you were not privy to our private communications. The community comes out stronger for it, and becomes a more respectful and thoughtful place.

IMHO a textbook example of escalation management and conflict resolution.

I thought I would just clarify this. Back to topic.

jacoscaz commented 10 months ago

/chair hat on

Therefore, nothing prevents us from specifying a WebID as both an IRI and a URL with the HTTP scheme.

Any strong objections to this approach?

RubenVerborgh commented 10 months ago

Hi @jacoscaz, all,

After reading through this thread, I agree with the intent of the above approach. But while I agree with the intent, it might not work as written in isolation (i.e., without the important context provided by @woutermont and others).

IRIs are, by definition, a superset of URIs with a wider character set (RFC 3987) than the US-ASCII in URLs, so the short statement creates a contradiction in trying to capture the discussion.
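To make the character-set difference concrete, here is a minimal Python sketch of mapping an IRI to an ASCII-only URI, loosely following the approach in RFC 3987 section 3.1 (percent-encode non-ASCII characters as UTF-8, convert the host with IDNA). The function name `iri_to_uri` and the example IRIs are assumptions for illustration only; the sketch drops userinfo and relies on Python's built-in `idna` codec, so it is not a complete or normative implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding. "%" is included so that
# existing percent-escapes are not encoded a second time.
_SAFE = "/?%:@!$&'()*+,;="


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (illustrative sketch, not normative)."""
    parts = urlsplit(iri)

    # Host: convert non-ASCII labels to their ASCII form via the IDNA codec.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"

    # Path, query, fragment: percent-encode non-ASCII characters as UTF-8.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    # An ASCII-only identifier is already a URI and passes through unchanged.
    print(iri_to_uri("https://example.org/profile/card#me"))
    # A non-ASCII path segment is percent-encoded.
    print(iri_to_uri("https://example.org/people/josé#me"))
```

An identifier that contains only ASCII characters is unchanged by this mapping, which is why existing HTTP URI WebIDs would remain valid if the specification switched to IRI terminology.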

Suggestion: can we separate the normative and non-normative language, to still capture everything but without that contradiction? Normative: A WebID is an IRI. Non-normative: A WebID can be generally considered a WHATWG URL with a HTTP(S)/file:/ftp:/mailto:/[Browser-Supported] scheme, although implementers should prepare for the entire IRI space.

jacoscaz commented 10 months ago

/chair hat on

Hello @RubenVerborgh ! Nice to see you around here!

/chair hat off

That seems reasonable. I'm unsure as to whether it would create confusion in those who are not aware of IRI vs. URI vs. URL, etc., but I think that some confusion in their case is inevitable. I like it!

hzbarcea commented 10 months ago

My $0.02.

To access data one will need, at some point, an address, meaning a location, meaning a URL. No way around it (unless you have a copy of the data already). URLs are a subset of URIs. So if the standard requires a URI/IRI (personally I am neutral on that), then the standard MUST say something about (at least the existence and necessity of) a resolution mechanism from a non-URL URI, which pretty much means a URN, to a URL. Using a URL as a URI pretty much implies the lazy identity transformation as a resolution mechanism. But that should be more of the exception than the norm.
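As a rough illustration of that point, the following Python sketch treats resolution as a lookup keyed by scheme, with the identity transformation for HTTP(S) identifiers. The function `resolve_to_url`, the `resolvers` mapping, and the `urn:example:` identifier are hypothetical names for this example and do not come from any WebID specification.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit

# Hypothetical type: a resolver maps an identifier to a dereferenceable URL.
Resolver = Callable[[str], str]


def resolve_to_url(identifier: str,
                   resolvers: Optional[Dict[str, Resolver]] = None) -> str:
    """Return a URL at which the identified resource can be fetched."""
    scheme = urlsplit(identifier).scheme.lower()

    # An HTTP(S) identifier is already a location: identity transformation.
    if scheme in ("http", "https"):
        return identifier

    # Any other scheme needs an explicit, scheme-specific resolver.
    resolver = (resolvers or {}).get(scheme)
    if resolver is None:
        raise LookupError(f"no resolver registered for scheme {scheme!r}")
    return resolver(identifier)


if __name__ == "__main__":
    print(resolve_to_url("https://example.org/profile/card#me"))
    # A made-up resolver that maps a URN to an HTTP URL.
    table = {"urn": lambda _id: "https://example.org/alice"}
    print(resolve_to_url("urn:example:alice", table))
```

Under this framing, an HTTP(S)-only WebID never needs a resolver table, while any other scheme would require one to be specified.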

TallTed commented 10 months ago

Non-normative: A WebID can be generally considered a WHATWG URL with a HTTP(S)/file:/ftp:/mailto:/[Browser-Supported] scheme, although implementers should prepare for the entire IRI space.

Note that one of the significant blocks to consensus on the 2014 ED was the requirement that WebID URIs be in the HTTP(S) scheme.

Since at least 2014, OpenLink Software has been using NetID to refer to a superclass of WebID, with NetID supporting URIs of many schemes. For more, see A Quick Note on WebID history, particularly these historical perspective links therefrom —

NetID and NetID-TLS Presentation (circa 2014)
NetID Entry in W3C RWW wiki
Tweets about NetID, starting in 2014

Bottom line here, the "Non-normative" snippet above describes a NetID, not a WebID, unless we do some more radical reworking of the existing spec of WebID.

jacoscaz commented 10 months ago

/chair hat on

To all, let's keep this discussion focused on URI vs. IRI vs. URL . WebID is HTTP(S) only, for now, but we can discuss in a separate issue if anyone wants to do so.

kidehen commented 10 months ago

If a WebID has been established to be an HTTP URI that names an Agent.... Isn't it logical to presume that IRI introduction simply implies:

A WebID is an HTTP IRI that names an Agent...

Have I overlooked something here?

SeeAlso:

CoPilot (i.e., ChatGPT lookup)

melvincarvalho commented 10 months ago

Non-normative: A WebID can be generally considered a WHATWG URL with a HTTP(S)/file:/ftp:/mailto:/[Browser-Supported] scheme, although implementers should prepare for the entire IRI space.

Hi @RubenVerborgh. Nice to see you here & thanks for chiming in.

I think you've hit on a critical thing. @timbl has repeatedly said that the file and http spaces (and ftp) are part of the web. It's a subtle point, but one with immense power. Regarding mailto: I see it as an important, massively deployed scheme, but somewhat different in terms of ability to dereference it.

There is lots of merit and utility to this broader world view, that is not communicated in WebID-* type specifications.

NetID Entry in W3C RWW wiki

Even though I co-authored this entry while chair of the RWW group, NetID is not a web standard, it's IMHO an OpenLink initiative.

It also has a name clash with several other projects, are you sure it's not trademarked?

https://enid.foundation/en/

melvincarvalho commented 10 months ago

Have I overlooked something here?

Yes, if there's an intention to change or expand the well-established definition from its original HTTP URI (and personally I don't think now is the time to do that), that opens the consideration that WebID could be generally considered a WHATWG URL.

kidehen commented 10 months ago

Yes, if there's an intention to change or expand the well-established definition from its original HTTP URI (and personally I don't think now is the time to do that), that opens the consideration that WebID could be generally considered a WHATWG URL.

There is no such intention.

It would be helpful if you could articulate how an HTTP IRI and an HTTP URI differ, in a manner that extends beyond what I articulated in my response.

kidehen commented 10 months ago

NetID is not a web standard, it's IMHO an OpenLink initiative.

Who said (or implied) that it was a Web Standard?

@TallTed was clarifying the context around the use of "NetID" as the term we (OpenLink Software) employ in our products for Agent Naming, which encompasses more than just HTTP URIs.

Expanding Agent Naming to include identifiers beyond HTTP URIs necessitates the creation of lookup handlers for each protocol scheme. That's the cost an implementer incurs when seeking to integrate other schemes into the World Wide Web.

kidehen commented 10 months ago

@timbl has repeatedly said that the file and http spaces (and ftp) are part of the web.

Where and when did @timbl say that? Note, there's a significant difference between "they can be part of the World Wide Web" or "integrated into the World Wide Web" and "they are part of the World Wide Web".

melvincarvalho commented 10 months ago

actually file: space is also the web

Timbl said this to us in 2017

It's recorded in the gitter archive, solid chat, April 11 2017

He's said it, I'm sure, in other places, but if you doubt that, ask him. Be aware that I've been in 100+ meetings with timbl and 100+ chats, and followed mailing list conversations and so on. If you doubt that the file space is part of the web, ask him. If you don't believe it or don't get it, I don't have the time to try to convince you, sorry.

melvincarvalho commented 10 months ago

if there's an intention to change or expand the well-established definition from its original HTTP URI

There is no such intention.

It might help if you clarified this.

Are you saying there is no intention to change or expand the definition of WebID from its original as an HTTP URI?

Will the definition change or will it not change?

melvincarvalho commented 10 months ago

Originally recorded/tracked in https://www.w3.org/2005/Incubator/webid/track/issues/71

Considerations (requirements, advisory..) to include in the WebID 1.0 spec.

@csarven would you consider perhaps holding off on this change until the next minor upgrade, when we have a stable spec with new stable definitions? I think the URI vs IRI conversation would be easier at that point.

jacoscaz commented 9 months ago

/chair hat on

It's fairly important for the conversation to remain focused and not expand towards concerns that are orthogonal to the topic that this issue focuses on. Regardless of the scheme / protocol, this issue deals with the difference between IRI, URI, URL (in the W3C sense) and URL (in the WHATWG sense).

Whether a non-HTTP(S) IRI / URI / URL can be a WebID or not is a different topic, which I kindly ask everyone to address in a separate issue.

jacoscaz commented 9 months ago

/chair hat on

Does anyone object, strongly or not, to @RubenVerborgh 's proposal in https://github.com/w3c/WebID/issues/10#issuecomment-1932441361 , which I quote below with a slight edit to limit the scope of this issue to the URI vs. IRI vs. URL debate?

Suggestion: can we separate the normative and non-normative language, to still capture everything but without that contradiction? Normative: A WebID is an IRI. Non-normative: A WebID can be generally considered a WHATWG URL with a HTTP(S) scheme, although implementers should prepare for the entire IRI space.

jacoscaz commented 9 months ago

/chair hat on

Note that this could be easily addressed by using the normative and non-normative language proposed by @RubenVerborgh and referring to https://www.w3.org/TR/rdf12-concepts/#section-IRIs for anything not covered here. We should try not to duplicate things that are already covered by more foundational specs.
