w3c / WebID

https://www.w3.org/groups/cg/webid
MIT License

Serialization: some formats or no formats at all? #61

Closed jacoscaz closed 5 months ago

jacoscaz commented 7 months ago

/chair hat on

The conversation that started at https://lists.w3.org/Archives/Public/public-webid/2024Feb/0005.html, which I'll generally refer to as Martynas's proposal (@namedgraph) to drop any kind of MUST on serialization formats, following the principle of orthogonality, indicates that there's significant interest in this approach.

As chair, I need to:

  1. Understand how much consensus there is around this proposal.
  2. Compare the consensus with the MUST on JSON-LD and Turtle for publishers, which is currently the option with the largest consensus when it comes to serialization formats.
  3. Foster technical discussion around these two proposals so that each "side" understands the other and each one of us gets to have a chance at changing their mind or further solidify their position.
  4. Give everyone enough time to make their case, particularly to those that struggle to follow the conversation due to the volume of notifications and discussion threads that this group can produce.

To this end, I kindly ask you to share, in this issue, which of the two you believe to be the best way forward and to ground your view in technical arguments and examples. Some of you have already done this elsewhere, in which case I would kindly ask you to copy/paste here (and add as much as you want to it).

Limited to this issue I'm going to try something new and be rather ruthless as to how I manage the conversation. At least initially, I need each one of you to only comment once and to make your case independently of others. Do not respond to others, just make your case in one, single comment. I will stop moderating (hiding / deleting) comments that do not follow this rule once reasonably sure that everyone will have had the chance to put their argument forward. Let's say... Two weeks. I will stop moderating comments in two weeks from now, on 2024-02-23.

Unorthodox approach, for sure, but I can clearly see this specific conversation spiraling out of control in no time at all.

melvincarvalho commented 7 months ago

rather ruthless ... Unorthodox approach ... Do not respond to others

I appreciate the structured approach you’re proposing to navigate the discussions surrounding Martynas's proposal and the existing consensus on serialization formats. It's clear that managing a balanced and focused conversation in such a vibrant community presents significant challenges.

However, I'm concerned that the suggested method, particularly the limitation on discussion and the moderation strategy, might inadvertently stifle the open exchange of ideas that is fundamental to the W3C's ethos. By restricting participants to a single comment without the opportunity for interaction, we risk overlooking the dynamic nature of consensus-building, where understanding evolves through dialogue.

While I understand the intention behind this approach is to maintain order and clarity, it feels somewhat at odds with the principle of fostering an inclusive and comprehensive debate. The approach, as outlined, might inadvertently act as a form of hard forking the project, that has been stable for many years, by curtailing the discussion, potentially leading to decisions that don't fully reflect the community's collective wisdom.

I respectfully suggest reconsidering this strategy in favor of one that allows for more fluid interaction, ensuring all voices are heard and considered more organically. Perhaps a compromise could be found that still addresses the concerns of discussion manageability without significantly constraining dialogue.

Thank you for considering this perspective.

rubensworks commented 7 months ago

IMO the WebID spec should be orthogonal to what RDF serialization is used.

The initial HTML spec writers could have required images to be either JPG or BMP to make it easier for browser implementers to handle and render images, but this would have inhibited innovations such as PNG and WebP. Similarly, WebID should not enforce JSON-LD or Turtle.

jacoscaz commented 7 months ago

/chair hat off

Personally, I see the value in both approaches. Moreover, practically speaking, I believe they lead to very similar scenarios.

Due to the greater convenience of it, whether perceived or real, the transition towards JSON-LD can't be stopped, just as there was no stopping the much greater transition to JSON (even in cases in which XML would have actually been better, ironically). Ultimately, all specifications and implementations of WebID will be, at any point in time, subject to developer pressure to adopt whatever format is thought to be the most convenient and has achieved widespread usage.

As of today, even if WebID were to be completely agnostic WRT serialization formats, the current state of the world (see here and here) indicates that Turtle and JSON-LD would nonetheless be present in the vast majority of implementations. Turtle looks at the past, JSON-LD looks at the present.

I always find myself thinking of the following point by @kidehen, in this case quoted from https://lists.w3.org/Archives/Public/public-webid/2024Feb/0051.html :

The crux of the issue is that specifications thrive as retrospective standardization of existing market trends rather than prescriptive mandates aimed at shaping emerging or yet-to-be-established markets.

With this in mind, whether to have the MUST on Turtle and JSON-LD for publishers or not seems less consequential than I once thought. Ultimately, the answer is highly correlated to the relative importance of broad and practical interoperability and orthogonality, two things that to me are pretty much equally important. Slightly higher priority to the former leads to a MUST that will need to be updated over time to match where developer pressure is leading the market. Slightly higher priority to the latter leads to no MUST at all.

From a testing standpoint I can't see any difference. I do think we'll have to craft a "WebID Validator", at some point, but such a tool can support as many formats as we want it to.

From a breaking change standpoint, both approaches entail a discontinuity with the past. Again, not much difference between the two.

Ultimately... I think I might prefer no MUSTs at all (maybe with a SHOULD on JSON-LD). But it's really close.

kidehen commented 7 months ago

From a testing standpoint I can't see any difference. I do think we'll have to craft a "WebID Validator", at some point, but such a tool can support as many formats as we want it to.

Yes!

We (LOD Community) had to deal with the same issue regarding the Linked Open Data (LOD) Cloud and the requirements for Linked Data Principles adherence, which led to the Vapour Validator.

A similar tool would be needed for validating:

  1. WebID (an HTTP URI that names an Agent unambiguously) by performing HTTP URI de-reference and lookup of a term (or terms) that infers a foaf:Agent instance

  2. WebID-TLS Authentication Protocol

  3. Other WebID-{whatever} Protocols, as they emerge
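The first check in the list above could be sketched roughly as follows. This is a minimal illustration, not an existing validator: the function name `names_agent`, the tuple-based triple representation, and the example WebID are all hypothetical, and the sketch assumes the profile document has already been dereferenced and parsed by some RDF library. It only looks at explicit rdf:type statements, so it would miss an Agent that is inferred from other terms.

```python
# Hypothetical sketch of the "names an Agent" check a WebID validator might run.
# Triples are plain (subject, predicate, object) tuples; fetching and parsing
# the profile document are out of scope here.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# foaf:Agent plus the classes the FOAF vocabulary declares as its subclasses.
AGENT_CLASSES = {
    FOAF + "Agent",
    FOAF + "Person",
    FOAF + "Organization",
    FOAF + "Group",
}


def names_agent(webid, triples):
    """Return True if the triples type `webid` as foaf:Agent or a subclass."""
    return any(
        s == webid and p == RDF_TYPE and o in AGENT_CLASSES
        for s, p, o in triples
    )


# Illustrative data only: a made-up WebID and its parsed profile triples.
WEBID = "https://example.org/alice#me"
PROFILE = [
    (WEBID, RDF_TYPE, FOAF + "Person"),
    (WEBID, FOAF + "name", "Alice"),
]

if __name__ == "__main__":
    print(names_agent(WEBID, PROFILE))
```

Because the check operates on parsed triples, it is the same regardless of which serialization the profile document was published in.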

namedgraph commented 7 months ago

So I need to go with dropping any media type requirements :) I vote for removing mentions of any specific media types and following the W3C specification orthogonality principle.

jonassmedegaard commented 7 months ago

I favor "dropping any kind of MUST on serialization formats following the principle of orthogonality" over "MUST on JSON-LD and Turtle for publishers".

I see no need for the WebID spec to guarantee interoperability - that is a feature for extension protocols like WebID-TLS and Solid OIDC to (optionally!) provide.

TallTed commented 7 months ago

Today, I'm OK with mandating "an RDF document", because today there are libraries which can be incorporated into client and server software with relative ease, which libraries did not exist in 2014. Interop in 2014 required mandate of lowest common denominator serialization, that being Turtle. Interop today can be achieved by generically mandating any serialization of RDF, which both client and server software can translate to any other RDF serialization (or graph store) they may prefer.


The initial HTML spec writers could have required images to be either JPG or BMP to make it easier for browser implementers to handle and render images, but this would have inhibited innovations such as PNG and WebP. Similarly, WebID should not enforce JSON-LD or Turtle.

I think it important to note that HTML has always mandated browsers to "fail elegantly" when encountering media types and HTML tags that are unfamiliar or entirely not understood -- so unknown image formats might be downloaded to the browser's host filesystem, or unknown image formats might be displayed as a generic "image" icon in the rendered web page, among other options, to await further action by the user (e.g., opening the downloaded file with a dedicated image processor that understands more formats than the browser).

Note that this elegant failure of HTML relies, at some point, on human action.

In contrast to HTML, WebID Profile Documents have always been intended to be machine-processable, optimally in addition to being human-processable. Circa 2014, there was insufficient software to provide interoperable machine processability across many, if not most, if not all, possible RDF serializations. Circa 2023, things have changed, and RDF serialization translation can generally be assumed (though it is still not guaranteed), so it seems viable to me to go with —

kidehen commented 7 months ago

Correct!

A spec of this kind is simply supposed to provide guidance which can be embellished via examples. Anyway, at this juncture, I continue to encourage @jacoscaz to organize a vote to determine consensus.

Ultimately, I sense we can only compromise to move forward, which I will live with -- if need be.

kidehen commented 7 months ago

  • WebID Profile Documents MUST be RDF-based

  • WebID Profile Documents SHOULD be made available at least as Turtle and JSON-LD when these are requested by a consumer via Accept: header

  • WebID Profile Documents SHOULD additionally be made available in any other RDF serialization requested by a consumer via Accept: header

  • WebID Profile Documents MAY additionally be made available in any other RDF serialization requested by a consumer via Accept: header

Yes, each of those items being only loosely associated with an HTTP URI that names an Agent, unambiguously (i.e., "a WebID").

Others: It is very important to understand (and accept) that "RDF" is a Data Definition Language that's loosely associated with expression notations, data serialization formats, and concrete syntaxes.

We MUST get away from the legacy problem of conflating RDF with any one of the aforementioned, which has been an issue since its inception -- exemplified by its unfortunate tight coupling with RDF/XML.
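The Accept-based items quoted at the top of this comment can be illustrated with a small content negotiation sketch. Everything here is an assumption made for illustration: the `SUPPORTED` list, the function names, and the choice to treat only `*/*` as a wildcard. It ignores parts of HTTP content negotiation such as `type/*` ranges and specificity precedence, so it is not a complete implementation of the HTTP rules.

```python
# Hypothetical sketch: pick an RDF serialization for a WebID Profile Document
# based on the consumer's Accept header. All names are illustrative only.

# Assumption: this publisher can produce these two serializations.
SUPPORTED = ["text/turtle", "application/ld+json"]


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep the header's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header, supported=SUPPORTED):
    """Return the first supported media type the consumer accepts, else None."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return supported[0]
    return None


if __name__ == "__main__":
    print(negotiate("application/ld+json, text/turtle;q=0.8"))
```

Under a SHOULD rather than a MUST, a `None` result for a consumer asking only for, say, `application/rdf+xml` would still be conformant; what the publisher does in that case is left open by the items above.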

kidehen commented 7 months ago

vote for removing mentions of any specific media types and following the W3C specification orthogonality principle.

Your intentions are clear, but I encourage letting @jacoscaz organize a formal vote :)

jacoscaz commented 7 months ago

/chair hat on

I will comment off-topic and then auto-hide my own comment: @kidehen a vote might not even be necessary, let's see how this goes. I want more people to chime in before moving forward. But yeah, if we can't manage to find common ground with this I'm going to call for a formal vote.

jacoscaz commented 6 months ago

  • WebID Profile Documents MUST be RDF-based
  • WebID Profile Documents SHOULD be made available at least as Turtle and JSON-LD when these are requested by a consumer via Accept: header
  • WebID Profile Documents SHOULD additionally be made available in any other RDF serialization requested by a consumer via Accept: header
  • WebID Profile Documents MAY additionally be made available in any other RDF serialization requested by a consumer via Accept: header

@TallTed would you be comfortable with moving forward in two separate steps, first dropping all kinds of requirements (MUST, SHOULD and MAY) on formats and then discussing and introducing optional requirements (SHOULD and MAY) in separate issues and PRs?

EDIT: hah, I ironically forgot my own rule of not responding to others until 2024-02-23 . Self-flagging as off-topic.

kidehen commented 6 months ago

For the record, once again.

RDF specs shouldn't be hardwired to any notation, serialization format, or concrete syntax.

We will always support not having MUST applied to any notation, serialization format, or concrete syntax to the WebID Identity and Discovery Specification.

RubenVerborgh commented 6 months ago

Conclusion, if you want actual usage:

  1. Mandate one format (others remain allowed anyway).
  2. If that format is JSON-LD, also mandate a specific frame.

TallTed commented 6 months ago

  2. If that format is JSON-LD, also mandate a specific frame.

I think this is an argument for the one mandated format to be (to remain) Turtle, because that's a full specification of that mandate; there's no additional frame to worry about.

JSON-LD remains available via conneg and easily integrated server-side and client-side RDF trans-serialization libraries.
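To make the framing concern concrete, here is a small sketch of how the same statement can take different JSON shapes in JSON-LD. The example WebID, the `naive_name` helper, and the idea of a consumer that reads the document as plain JSON are assumptions for illustration; a consumer that parses the document with an RDF library would see the same triple in both cases.

```python
import json

# Two JSON-LD documents describing the same single statement:
# <https://example.org/alice#me> foaf:name "Alice".
# The WebID and the data are made up for illustration.
compacted = json.loads("""
{
  "@context": {"name": "http://xmlns.com/foaf/0.1/name"},
  "@id": "https://example.org/alice#me",
  "name": "Alice"
}
""")

expanded = json.loads("""
[
  {
    "@id": "https://example.org/alice#me",
    "http://xmlns.com/foaf/0.1/name": [{"@value": "Alice"}]
  }
]
""")


def naive_name(doc):
    """Read the name the way a plain-JSON consumer might, without an RDF library."""
    if isinstance(doc, dict):
        return doc.get("name")
    return None


if __name__ == "__main__":
    print(naive_name(compacted))
    print(naive_name(expanded))
```

The helper finds the name in the compacted document but not in the expanded one, even though both carry the same RDF. A mandated frame would pin down one shape for plain-JSON consumers, whereas Turtle consumers need an RDF parser either way.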

namedgraph commented 6 months ago

Honestly I'm confused by OpenLink's position.

First @kidehen seems to support my proposal:

We will always support not having MUST applied to any notation, serialization format, or concrete syntax

Then just a few comments later @TallTed writes:

I think this is an argument for the one mandated format to be (to remain) Turtle

Aren't these the opposite suggestions? Which is it?

TallTed commented 6 months ago

@namedgraph — I have not been speaking here as or for OpenLink. I have been and will continue speaking for and as myself, unless I say otherwise. This is the way most if not all people operate in most if not all of the W3C groups in which I participate.

jacoscaz commented 6 months ago

/chair hat on

As established in the opening comment, strict moderation stops today and this issue is now open to normal discussion and debate, though there are a few people whose feedback is still missing.

After spending a lot of time going through the comments in this thread, the ones in #3 , multiple threads of conversations on the mailing list and half a dozen private conversations with some of you, my preliminary conclusion is that the best way forward is to weaken the current MUST on Turtle into a SHOULD on Turtle:

As for the rationale behind this:

  1. The largest consensus that I can see lies with dropping all requirements (as in MUST) on serialization formats, favoring orthogonality. This is even larger than it may come across, as a few members actually favor this option in the long term while preferring a more conservative stance in the short term due to interoperability concerns.
  2. The second largest consensus that I can see lies with one MUST on a format if requested via the Accept: header, doesn't really matter which specific format, favoring interoperability. This is also informed by the view that most users and developers will interact with WebID parsing and serialization only indirectly.
  3. While a breaking change is needed, this is arguably the least breaking change that can be made while still opening up the spec to JSON-LD and, indeed, any other format.
  4. In addition to point 3., Turtle is arguably the least complex and onerous format to support from a technical standpoint. If a soft requirement (SHOULD) must be made for interoperability, a simpler format increases the chances that implementations will actually stick to such a requirement.

melvincarvalho commented 6 months ago

and half a dozen private conversations with some of you

IMHO, consensus should be exclusively built on transparent, public discussions to align with W3C's commitment to clear and open decision-making processes. This becomes even more pertinent when we face contentious breaking changes and the prospect of hard forks.

jacoscaz commented 6 months ago

/chair hat on

consensus should be exclusively built on transparent, public discussions

@melvincarvalho and all,

Agreed! Which is why #66 favors orthogonality / neutrality over interoperability.

Nonetheless, as I've explained in https://lists.w3.org/Archives/Public/public-webid/2024Feb/0067.html , some of the dynamics in this group tend to push interested and invested parties away, discouraging participation. It is imperative for the chair - whoever they might be - to counteract the group's tendency towards insularity or WebID will become irrelevant. To that end, I maintain private conversations with those who do not believe participating in this group to be a productive endeavor, hoping that the group as a whole can and will prove them wrong.

Just as an example, @rubensworks has recently pointed this out in https://lists.w3.org/Archives/Public/public-webid/2024Feb/0047.html , with @kidehen rightfully suggesting to reassess consensus shortly thereafter in https://lists.w3.org/Archives/Public/public-webid/2024Feb/0052.html in light of the growing participation from other voices . In turn, this got me to create this very issue, from which #66 was born.