w3c / WebID

https://www.w3.org/groups/cg/webid
MIT License

proving that conneg is a breaking change #58

Closed melvincarvalho closed 5 months ago

melvincarvalho commented 6 months ago

There are two WebID specs: WebID 1.0 and WebID 2.0.

WebID 1.0 mandates a single serialization, say Turtle.

WebID 2.0 mandates conneg, with Turtle and JSON-LD.

Alice has a WebID, alice.ttl, which has been her WebID for 10 years. She uses it with all her Solid apps.

One day apps switch to WebID 2.0 and the newer JSON-LD becomes the preferred serialization.

Alice's WebID is now broken. Furthermore, it's hosted on her own home page using Apache, and she has no ability to fix her WebID. This breaking change makes all of Alice's work over the last 10 years potentially broken.

QED

woutermont commented 6 months ago

Really, @melvincarvalho? This is not an issue. @jacoscaz, I suggest moving this to Discussions, which we specifically opened for things like this.

I also think that we already had this discussion good and well in issue #3 regarding the serialisation formats. Before repeating ourselves here, I would thus suggest you (re)read the past months of that issue. In particular, I explained at length across multiple comments (here, here, here, here and here) why switching to Turtle+JSON-LD using conneg would not break anything the WebID spec aims to support. In an effort to be constructive, I will try my best to summarize those points here as clearly and succinctly as I can (parts of the following text are copied from my aforementioned comments).

  1. The vast majority of WebID users will be non-technical people, managing their WebID through the GUIs of WebID providers. This includes Alice and Bob, our main fictitious users.

  2. All existing WebID providers (Inrupt, Use.ID, OpenLink, solidweb.me, redpencil.io, inrupt.net, solidcommunity.net ...) already provide Turtle+JSON-LD through conneg. (If I missed some, please tell me.) This is no wonder, because conneg is part of all major web frameworks, and no work at all to add to a server.

  3. Given the above two points, conneg does not break anything for the vast majority of WebID users, and is thus perfectly in line with the main targets for a successful WebID ecosystem.

  4. Now, if you really want to be nitpicky, there are indeed a few people who currently host a static WebID Document. They do so simply because they can: it has up till now been the simplest way to host a WebID. However, people that are skilled enough to host a WebID Document themselves are easily skilled enough to set up conneg over it. I've timed it: it takes about 5 minutes using a free CloudFlare account. I can guarantee you that the majority of these people will do so the instant we add conneg to the spec. Most importantly, however, these people are not the main target for the spec. They are not the ones for which applications will be built and an ecosystem will grow; they are not the ones to which we must cater.
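The negotiation step the frameworks perform can be sketched in a few lines of Python. This is a simplified illustration, not any provider's actual code: real servers implement the full RFC 9110 rules (wildcards like text/*, parameters, tie-breaking), while this handles only media types and q-values.

```python
def negotiate(accept_header, available=("text/turtle", "application/ld+json")):
    """Pick the best available media type from an HTTP Accept header.

    Simplified sketch of server-side conneg: parses media types and
    q-values only; '*/*' falls back to the server's default format.
    """
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip()
        quality = 1.0
        for field in fields[1:]:
            field = field.strip()
            if field.startswith("q="):
                quality = float(field[2:])
        prefs.append((quality, media_type))
    # Highest quality wins; Python's sort is stable, so ties keep
    # the order in which the client listed the types.
    for quality, media_type in sorted(prefs, key=lambda p: -p[0]):
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None  # nothing acceptable: the server would answer 406
```

A server would call this with the request's Accept header and serialize its WebID graph in whichever format comes back.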

melvincarvalho commented 6 months ago

Thanks for this, replies inline. I don't think Discussions are universally liked, so perhaps let's keep it as an issue for a bit:

  • The vast majority of WebID users will be non-technical people, managing their WebID through the GUIs of WebID providers. This includes Alice and Bob, our main fictitious users.

Doesn't really speak to breaking changes.

  • All existing WebID providers (Inrupt, Use.ID, OpenLink, solidweb.me, redpencil.io, inrupt.net, solidcommunity.net ...) already provide Turtle+JSON-LD through conneg. (If I missed some, please tell me.) This is no wonder, because conneg is part of all major web frameworks, and no work at all to add to a server.

Not the example I gave of alice.ttl -- there's more than one such WebID around. So this is not a disproof.

We don't even have a way to test that conneg works on the servers above. Trust me, I've spent hours trying to design such a test with the "solid team". They don't even understand conneg.

  • Given the above two points, conneg does not break anything for the vast majority of WebID users, and is thus perfectly in line with the main targets for a successful WebID ecosystem.

Again, the proof stands.

  • Now, if you really want to be nitpicky, there are indeed a few people who currently host a static WebID Document. They do so simply because they can: it has up till now been the simplest way to host a WebID. However, people that are skilled enough to host a WebID Document themselves are easily skilled enough to set up conneg over it. I've timed it: it takes about 5 minutes using a free CloudFlare account. I can guarantee you that the majority of these people will do so the instant we add conneg to the spec. Most importantly, however, these people are not the main target for the spec. They are not the ones for which applications will be built and an ecosystem will grow; they are not the ones to which we must cater.

I think you are underestimating the implementation burden here. And how do you keep the different serializations in sync?

I think you are agreeing with me that it's a breaking change, and you are trying to argue it's a minor change. That is a subjective analysis that I don't agree with.

But I thank you very much for taking the time to write this all out. I do hope we can make progress.

melvincarvalho commented 6 months ago

How would this look for a compromise:

  1. If you feel the changes are minor, then the new spec should be named with version 1.1 OR 2.0, but NOT 1.0. The CG can later come to a consensus on which.

  2. Work on extension profiles can continue in the group, unblocking new work, and reducing dependencies on an all-or-nothing solution

Why don't we just agree it is a breaking change, make that very clear through versioning, and then unblock everyone in the group that wants to make progress. Sound reasonable?

webr3 commented 6 months ago

Wading in, unhelpfully as usual.

It could be suggested that there are thousands of apps and networks and software and agent classes that would benefit from WebID - and that somewhere close to 0 of them will implement rdf/turtle/conneg (see lack of existence as proof).

JSON-LD stands a chance, only because it's JSON that happens to be LD-compatible -- and we all know that, if adopted widely, most of the time it'll be invalid JSON-LD but valid JSON.

So this underlies everything: keeping WebID tied to Turtle and conneg literally prohibits the adoption of the concept.

The MUST Turtle AND JSON-LD is simply an appease-the-beast move to facilitate moving towards an end goal of people actually having WebIDs, and the benefits en masse.

I personally have a very strong preference for no rdf, and if it must, then json-ld is viable.

I only agree to saying must both, because it'll basically force old implementations to be future compatible, and I work on the assumption that types like turtle will simply not be implemented by most future adopters. It's a different realm of the web, one that is far larger.

If wide adoption happens, and there's a V3, I'd bet my legs and arms that a review of implementations would show <0.1% support for Turtle, so it'd be nixed.

Just as <0.1% of every in-use app, service and site supports it.

woutermont commented 6 months ago

I'm not agreeing at all. What I am trying to point out is that this is de facto a minor change (which is by definition not breaking), since none of the stakeholders at which this spec is aimed would be affected.

WebID is meant to provide an entry hook into the digital lives of people. To succeed in that, it will need the support of applications to become an ecosystem. That can only be achieved by being flexible on the server side.

The spec is NOT meant to serve the whims of some developers who want a static file and don't want to put in 5min of effort. If you want that, you can create another CG, though I doubt you will find many interested parties. But please stop setting your own desires above the aims of this group, who have finally come closer to consensus and are glad about it.

woutermont commented 6 months ago

I think you are underestimating the implementation burden here. And how do you keep the different serializations in sync?

To reply briefly to this concern: as I said, I timed it, and it takes 5 minutes. Keeping the serializations in sync can be done with an automated script of about three lines of code: for example, as a GitHub workflow publishing to GitHub Pages, or using a tool like Soupault when running a static generator locally.

webr3 commented 6 months ago

I think you are underestimating the implementation burden here. And how do you keep the different serializations in sync?

To reply briefly to this concern: as I said, I timed it, and it takes 5 minutes. Keeping the serializations in sync can be done with an automated script of about three lines of code: for example, as a GitHub workflow publishing to GitHub Pages, or using a tool like Soupault when running a static generator locally.

Did you time conneging flat files, or did you implement something like a MongoDB-to-Turtle implementation with media-type juggling that produces valid Turtle in 5 minutes?

woutermont commented 6 months ago

@webr3, what you are saying about conneg makes no sense. Did you even take the time to read the statistics I put effort into gathering? All current providers support conneg with both formats. Why? Because it is a no-brainer. So stop pulling the "conneg is a burden" card, please.

As I have clarified elsewhere, I have no ties to Turtle other than backward compatibility (which, by the way, is exactly what @melvincarvalho is making such a ruckus about). I fully agree with your other point that, in some future, it is possible and probable that the spec can drop Turtle support. But it will do so with a decent deprecation strategy, and only after the ecosystem has gradually moved to JSON-LD. Moreover, since the data format landscape has historically proven to be a volatile one, having conneg in place will, in the even farther future, once more be a practical way to move from JSON-LD to whatever the majority fancies then. This is how we build a spec that can outlive such changes.

melvincarvalho commented 6 months ago

So stop pulling the "conneg is a burden" card, please.

It's not only me:

See danbri, inventor of FOAF, who got the whole WebID movement going:

https://twitter.com/danbri/status/1080490927284736000

"the content negotiation aspect of linked data has been massively oversold"

"The Web is very very big, and most of it shows no sign of moving to use conneg anytime soon"

I'm going to persist here: you will have to admit it's a breaking change in the case of alice.ttl, or give me a reason why it is not. After that, we can proceed.

webr3 commented 6 months ago

So stop pulling the "conneg is a burden" card, please

It is. You're offering existing RDF-conneg-compatible systems doing what they do as proof that it's "easy"; of course it is, that's what they do.

Whilst completely ignoring the requirement to turn non-RDF, non-conneg systems into doing all that in order to adopt WebID. That is big, and it's very preclusive to adoption.

You cannot say that tiktok or Snapchat or FB quest Auth can just rdf-ize and conneg their systems "in 5 minutes" to adopt it.

Let's keep things grounded in the reality of the web, not a niche space on it, please.

woutermont commented 6 months ago

@webr3

Did you time conneging flat files, or did you implement something like a mongodb to turtle implementation with media type juggling that produces valid turtle in 5 mins?

Does it matter? The point is that those handful of people that want it can easily find ways to do it, and at least one of them is set up in a jiffy.

My test in particular was aimed at individuals hosting static files. So we're talking about pushing one format to GitHub Pages, having a workflow transpile to the other format, and setting up a Cloudflare rule that redirects to either of them depending on the Accept header.

For people setting up a database to host their (and presumably others') WebID data, adding conneg should be even less of a burden, relative to the time it already takes to set up the database in the first place.

webr3 commented 6 months ago

The point is being missed.

Conneg = super easy. Conneg to valid RDF = burdensome.

woutermont commented 6 months ago

So stop pulling the "conneg is a burden" card, please

It is. You're offering existing RDF-conneg-compatible systems doing what they do as proof that it's "easy"; of course it is, that's what they do.

That is not my argument. What I show is that, without the spec asking or even suggesting to do so, ALL WebID providers decided to add conneg. So either all of them happen to have massive resources on their hands, or adding conneg really isn't that hard... Which one would it be?

Whilst completely ignoring the requirement to turn non-RDF, non-conneg systems into doing all that in order to adopt WebID. That is big, and it's very preclusive to adoption.

You cannot say that tiktok or Snapchat or FB quest Auth can just rdf-ize and conneg their systems "in 5 minutes" to adopt it.

I fully miss what your argument is here. The services you mention do not implement WebID at all. If they would want to do so, they would have to make an investment into RDF anyway. Adding conneg to that investment will be like adding one pea to a bag full of peas.

woutermont commented 6 months ago

The point is being missed.

Conneg = super easy. Conneg to valid RDF = burdensome.

Oh, please explain...

Edit: mind you, we are talking about the ADDED burden of conneg; anyone adopting WebID will have to produce valid RDF anyway.

webr3 commented 6 months ago

The services you mention do not implement WebID at all. If they would want to do so, they would have to make an investment into RDF anyway

The majority of the web already does JSON-LD with no investment in the RDF stack; that's the point. To make it accessible to the web, not the 0.1% of it that's RDF-ish.

Hence the name WebID.

If the goal here is for SolidID or RDFID, do that elsewhere.

Oh, please explain... Edit: mind you, we are talking about the ADDED burden of conneg; anyone adopting WebID will have to produce valid RDF anyway.

People with copy-pasted JSON-LD examples, or a mild wrapping template over JSON objects, do not have RDF stacks. That's why it's prevalent. This has been clear for years. There's a huge difference between doing this and doing an RDF stack for implementations.

melvincarvalho commented 6 months ago

at least one of them is set up in a jiffy

What is the URL?

woutermont commented 6 months ago

Using JSON-LD without an RDF stack is a thing, but unless you have numbers I don't believe for a second it is widespread. It's not like non-RDF services just happen to have JSON-LD WebID data lying around, and we're forcing them to actually treat it as RDF.

Moreover, even for players that want to enter the WebID ecosystem and up till then work with "non-RDF JSON-LD", adding the few lines of code to transpile valid JSON-LD to valid Turtle can hardly be called a burden.

You seem to have a grudge against Solid for some reason, but for all the years that WebID existed, they are the ones that actually put it on the map. As far as I know all WebID providers except OpenLink are primarily Solid businesses or organisations.

webr3 commented 6 months ago

Using JSON-LD without an RDF stack is a thing, but unless you have numbers I don't believe for a second it is widespread

I can account for 1.6bn JSON-LD documents over ~750 domains alone. I guess every CMS with schema.org-producing plugins would account for exponentially more than this.

You seem to have a grudge against Solid for some reason

Not at all, I'd just like WebID to be usable by the rest of the web, not a niche segment of it.

This can easily be framed as RDF stacks being rest-of-web compatible, and then forcing the rest of the web to be RDF-compatible in order to use a web-scale identity solution. That doesn't sit well with me. Work with what exists.

woutermont commented 6 months ago

Wait, what kind of documents/domains are we talking about? Numbers mean nothing by just throwing them out without reference. These surely cannot be WebIDs or WebID-aspiring services?

I also don't see where you want this to go? WebID has been an RDF standard from the start. It currently is produced as Turtle, and I agree that consuming them as JSON-LD would be a benefit for lots of applications. That is precisely why we are proposing conneg. You surely are not saying that applications preferring JSON-LD cannot add an accept header to their requests?

woutermont commented 6 months ago

To go into the only concrete example you gave: CMSes that can produce schema.org data (whatever that is).

Schema.org is (or at least pretends to be) an RDF vocabulary, so services producing such output produce RDF, willing or not (just like all JSON-LD is RDF). As I explained above, there is nothing preventing these systems from using one of the widely available RDF transpilers to output any other RDF format on request.

(As an aside, I would actively dissuade anyone from using a CMS to host their WebID Document. They are absolutely not secure enough for such a central target in online authentication.)

woutermont commented 6 months ago

Anyway, I'm going to wait for others to chip in. I find it deplorable, however, that you would put a hard-reached consensus, of which you yourself said it was a way forward, on the line again.

webr3 commented 6 months ago

Wait, what kind of documents/domains are we talking about? Numbers mean nothing by just throwing them out without reference. These surely cannot be WebIDs or WebID-aspiring services?

The primary of those domains has 4-5k active users at this moment, higher during weekdays. I want to add WebID auth to it, but adding an RDF stack is not compatible with scalable deployments (since 3-4x more bot requests come in than end users: too much weight). We currently run a stateless setup, but have three outstanding items which require auth at scale. We'd both publish and consume WebIDs. Changing backend setups to handle abstract RDF while still serving 1k+ requests a second with low latency just isn't possible or viable, or worth the cost and the loss in reduced metrics.

You surely are not saying that applications preferring JSON-LD cannot add an accept header to their requests?

Quite the opposite: server-side respecting a JSON-LD accept is easy, but respecting a Turtle accept is hard.

Conneg is easy; understanding abstract RDF, storing it in a different format in a DB or quad or graph store, then outputting it in a valid concrete format that's equivalent to the others, that's the hard bit. Not to mention the maintenance burden of keeping it all up to date.

I, and the company I work for, cannot implement a turtle requirement.

I personally agree to MUST AND, over both types, with full openness that we'd never implement the Turtle bit.

webr3 commented 6 months ago

I find it deplorable, however, that you would put a hard-reached consensus, of which you yourself said it was a way forward, on the line again

I've stated repeatedly that I'm still open to MUST AND and find it acceptable; I'm just giving full transparency as to the reasons why, and the real-world issues faced by having the Turtle MUST. I argue very strongly that WebID's limited adoption to date is precisely because it's got a complex RDF requirement, as opposed to a simple view-as-RDF JSON-LD requirement.

There's no argument here from me, any non preclusive way forward that allows others to implement is fine with me.

woutermont commented 6 months ago

@webr3, can I ask what the service is your client is providing? More particularly, I am curious as to why they would want to host WebIDs themselves.

As to the scalability of content negotiated RDF, multiple Solid stakeholders seem to disagree with you (given that they are doing it, I assume they believe it is possible). Without clear insight into your setup, or a benchmark to prove otherwise, I therefore still find it hard to believe.

webr3 commented 6 months ago

@webr3, can I ask what the service is your client is providing? More particularly, I am curious as to why they would want to host WebIDs themselves.

Of course, happy to see the conversation turning more amicable. To clarify, it's not my client; I am the technical lead and control all aspects of the properties. We work closely with US government agencies both publicly and privately; one of the primary areas we are looking to scale out is basically public and private contact management for various departments and agencies, at a very large scale. As such we would be both the authentication host and provider. Other use cases are more public-oriented, and SaaS, where we'd be hosting what are perhaps best termed WebID personas: public profiles with auth for users, both for public publishing and for auth to utilize services. Finally, there's an ongoing web-wide problem of delegated authentication (requests made by bots on behalf of other agents) which we're looking to manage. We work with multiple large advertising agencies and would facilitate them to implement a WebID-like agentid that allows the verification of bot requests with keys, ident, and also CIDR ranges. This latter part is probably the most likely to scale en masse (as they'd document and roll it out to other consumers; it could indeed fast become a de facto approach net-wide).

As to the scalability of content negotiated RDF, multiple Solid stakeholders seem to disagree with you (given that they are doing it, I assume they believe it is possible). Without clear insight into your setup, or a benchmark to prove otherwise, I therefore still find it hard to believe.

Sure! We have multiple AWS Global Accelerators, each backing onto 4 or more load balancers, which then hit 2x as many reverse HAProxy setups; h2 from them over to clusters of servers in nearby data centers. Each of those machines runs HAProxy again in front of the local web server and required software; behind that we run MongoDB clusters, with local replicas on the same physical bare metal as the web servers. Connections are kept open between everything, leading to very low latency; our average request/response time to end users is currently 23ms, with no caching.

If any of the stakeholders have setups which can handle ~25k DB-level queries per second at these kinds of speeds, ~80% read, 20% write (anything sub-10ms for average DB/graph-level ops would be doable), I'd love to hear about their setups. We have multiple spare servers and can act fast to try things out. Obviously there's no learning curve for the RDF side.

jacoscaz commented 6 months ago

/chair hat on

If you feel the changes are minor, then the new spec should be named with version 1.1 OR 2.0 but NOT 1.0.

@melvincarvalho to clarify, would changing the title of the working draft https://w3c.github.io/WebID/spec/identity/index.html to Web Identity and Discovery 1.1 cause you to withdraw your strong objection to conneg?

kidehen commented 6 months ago

WebID is meant to provide an entry hook into the digital lives of people. To succeed in that, it will need the support of applications to become an ecosystem. That can only be achieved by being flexible on the server side.

Yes, and this can be achieved while also relegating content-negotiation to the implementation details level i.e., putting it in the implementation example (or techniques) section as one approach.

I agree with where you are headed, but content negotiation doesn't need to be a part of the spec, since it's just a technique afforded by HTTP re server implementation.

melvincarvalho commented 6 months ago

/chair hat on

If you feel the changes are minor, then the new spec should be named with version 1.1 OR 2.0 but NOT 1.0.

@melvincarvalho to clarify, would changing the title of the working draft https://w3c.github.io/WebID/spec/identity/index.html to Web Identity and Discovery 1.1 cause you to withdraw your strong objection to conneg?

I appreciate the ongoing discussions and the commitment of everyone involved. I'd like to clarify my perspective regarding the recent propositions:

  1. Proposal Context: Initially, the suggestion of simultaneously pursuing v2.0 or v1.1 alongside extension profiles seemed viable. However, my understanding evolved following the insights about the potential deprecation of Turtle.

  2. Acknowledging Breaking Changes: It's crucial for the group, including Wouter, to recognize that mandating content negotiation (conneg) constitutes a significant modification. This point, I believe, should have been the preliminary observation. I've illustrated this in our discussions, yet I feel the responses have somewhat downplayed its impact.

  3. Implications on Versioning: Acknowledging the substantial nature of these changes naturally leads to the question of versioning. Given the scale, it seems appropriate to consider this a move towards v2.0. The discourse isn't merely about introducing JSON-LD but about a gradual phase-out of Turtle, marking a substantial shift.

  4. Consensus and Versioning Integrity: I'm concerned that the principle of lazy consensus might be misapplied to introduce significant alterations without the due process of version incrementation. For a spec that has reached a level of stability and fostered a wide ecosystem, any such major change warrants a more deliberate and inclusive consensus-building process.

  5. Process and Controversy: The collective effort over the past six months was grounded in achieving a shared understanding of WebID, independent of serialization, which is foundational. The notion of extension profiles was an extension of this consensus, aiming to cater to diverse requirements. The shift in focus towards altering a stable spec, however, diverges from this path and delves into areas of long-standing debate. It's crucial to tread these waters with consensus and respect for the broader implications.

  6. Path Forward: If we are to consider any significant changes or deprecation of Turtle, it necessitates broad consensus, including dialogues within the Solid WG. Such changes should be transparently proposed, with explicit versioning considerations. A living standard cannot be retrofitted with fundamental changes without a clear, collective agreement.

In summary, I urge us to:

@jacoscaz, your leadership is invaluable, and this isn't a reflection on your chairmanship. Steering the direction of our work requires sensitivity to the foundational elements of the spec, and significant changes demand a proportionate consensus-building effort. Changing things that have been in use for 10 years is going to be 10x more difficult than making progress on a fresh, extensible version of WebID for 2024, which gives everyone what they want.

namedgraph commented 6 months ago

I think you are underestimating the implementation burden here. And how do you keep the different serializations in sync?

@melvincarvalho no serious Linked Data application would store RDF as documents, with a copy per media type, which would lead to issues like them going out of sync.

RDF is stored as triples in a triplestore: an abstract graph model that has no inherent syntax. Only upon HTTP response is the right media type chosen, using conneg, and used to serialize the response body. Conneg is automatically handled by an HTTP framework such as Jersey. Switching serializations is as simple as passing different media types to Jena's I/O. From the client's perspective there are still documents, but they are generated on the fly (and can be cached, of course).

namedgraph commented 6 months ago

Now let's try to illustrate this issue with some examples:

WebID 1.0 request/response

GET alice.ttl
Accept: application/ld+json, text/turtle, */*;q=0.8

200 OK
Content-Type: text/turtle

WebID 2.0 request/response

GET alice.ttl
Accept: application/ld+json, text/turtle, */*;q=0.8

200 OK
Content-Type: text/turtle

It looks to me that the same headers would work with both WebID 1.0 and 2.0?

melvincarvalho commented 6 months ago

respectfully, this quote alone proves that you don't understand how RDF-aware applications are built.

@namedgraph this is disrespectful. I have coded many RDF-aware applications. It's an ad hominem attack and unprofessional. Please retract.

namedgraph commented 6 months ago

OK, comment updated

melvincarvalho commented 6 months ago

no serious Linked Data application would store RDF as documents

Sure they would. Linked Data is just another name for the Semantic Web. It's about having machine- and human-readable documents on the web, including with hyperlinks.

The style you describe may be your personal preference, but you need not project that preference onto others.

Let me ask you: how many daily active users do your Linked Data applications have (or monthly, if daily=0)? What do they use them for? Why do you consider your view to be a universal one? Who are you representing?

I'd suggest making your tone more subjective, since there are obviously a range of publishing techniques in RDF.

melvincarvalho commented 6 months ago

Now let's try to illustrate this issue with some examples:

WebID 1.0 request/response

GET alice.ttl
Accept: application/ld+json, text/turtle, */*;q=0.8

200 OK
Content-Type: text/turtle

WebID 2.0 request/response

GET alice.ttl
Accept: application/ld+json, text/turtle, */*;q=0.8

200 OK
Content-Type: text/turtle

It looks to me that the same headers would work with both WebID 1.0 and 2.0?

If the application is expecting JSON-LD then the Turtle-only response will fail. That's why JSON-LD=MUST is a breaking change.
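The failure mode described here can be made concrete with a small sketch (Python standard library only; the Turtle body is illustrative): a client that only speaks JSON-LD simply cannot parse a Turtle response.

```python
# A JSON-LD-only client receiving a Turtle body from a legacy
# Turtle-only server.
import json

turtle_body = '@prefix foaf: <http://xmlns.com/foaf/0.1/> . <#me> foaf:name "Alice" .'

def jsonld_only_client(body):
    """Return the parsed profile, or None when the body is not JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None
```

Here jsonld_only_client(turtle_body) returns None: the profile is unreadable to this client, which is the claimed breakage.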

woutermont commented 6 months ago

If the application is expecting JSON-LD then the Turtle-only response will fail. That's why JSON-LD=MUST is a breaking change.

Exactly. This is why we need conneg.

melvincarvalho commented 6 months ago

If the application is expecting JSON-LD then the Turtle-only response will fail. That's why JSON-LD=MUST is a breaking change.

Exactly. This is why we need conneg.

I believe conneg is implied here. But this issue is simply proving, in a pseudo-mathematical sense, that mandating conneg (implicitly or explicitly) via dual MUSTs for Turtle AND JSON-LD is a breaking change. I think it's something we could agree on.

namedgraph commented 6 months ago

It looks to me that the same headers would work with both WebID 1.0 and 2.0?

If the application is expecting JSON-LD then the Turtle-only response will fail. That's why JSON-LD=MUST is a breaking change.

Isn't WebID 2.0 mandating both JSON-LD and Turtle? In which case the client should be able to handle both.

namedgraph commented 6 months ago

no serious Linked Data application would store RDF as documents

Sure they would. Linked Data is just another name for the Semantic Web. It's about having machine- and human-readable documents on the web, including with hyperlinks.

The style you describe may be your personal preference, but you need not project that preference onto others.

Melvin, this is not even RDF- or Semantic Web-specific. There was a time, 20-30 years ago, when websites consisted of static HTML documents. Nowadays the absolute majority of webpages and API responses are generated from databases on the fly, as I'm sure you would agree. The same applies to Linked Data, where the database is the triplestore. RDF has the additional advantage of offering a number of syntaxes for the same graph content, and conneg allows clients to choose between them.

Let me ask you, how many daily active users do your Linked Data applications have (or monthly if daily=0). What do they use them for? Why do you consider your view to be a universal one, who are you representing?

It's hard to tell what users are using our software for, but it does have 450+ GitHub stars at the moment: https://github.com/AtomGraph/LinkedDataHub. It has supported conneg for 10+ years, since the project that was an early prototype of LinkedDataHub.

I'd suggest making your tone more subjective, since there is obviously a range of publishing techniques in RDF.

melvincarvalho commented 6 months ago

@namedgraph thanks for the additional context. That is a reasonable deployment. Yet I don't think you speak for everyone. It's absolutely legitimate to have documents on the web; that's how the web started, and how millions of people use the web.

Do you agree that the addition of conneg=MUST is a breaking change on the server side?

Clients may only support 1 of 2 serializations. And, as wouter says, turtle itself may be deprecated.

melvincarvalho commented 6 months ago

It looks to me that the same headers would work with both WebID 1.0 and 2.0?

If the application is expecting JSON-LD then the turtle only response will fail. That's why JSON-LD=MUST is a breaking change.

Isn't WebID 2.0 mandating both JSON-LD and Turtle? In which case the client should be able to handle both.

Client only needs to be able to handle 1 if the server handles both. There's the point of breakage.

namedgraph commented 6 months ago

Melvin I think I know what you're hinting at but I still don't agree.

You are suggesting a situation where a client only supports JSON-LD, correct? Well, first of all, it shouldn't request Turtle if it cannot handle it. Let's model that:

GET alice.ttl
Accept: application/ld+json

406 Not Acceptable

Like this? Yes, a JSON-LD-only client would fail to load the Turtle data in this case. But that would also be the case with WebID 1.0, so I don't understand what you think changes here?
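
That exchange can be simulated in a few lines (a toy sketch, not a full Accept parser: no q-values or `*/*` wildcards; `SERVER_FORMATS` stands in for a static Turtle-only host like Alice's Apache server):

```python
# What a Turtle-only server can produce (e.g. a static alice.ttl file).
SERVER_FORMATS = {"text/turtle"}

def handle_request(accept_header: str):
    """Return (status, media_type) for a request with the given Accept header."""
    # Strip parameters (";q=...") and collect the requested media types.
    requested = {part.split(";")[0].strip() for part in accept_header.split(",")}
    acceptable = requested & SERVER_FORMATS
    if not acceptable:
        return 406, None  # Not Acceptable: no common media type
    return 200, acceptable.pop()

# A JSON-LD-only client cannot load alice.ttl, under WebID 1.0 and 2.0 alike:
status, media_type = handle_request("application/ld+json")
```

Here `status` comes back as 406, matching the exchange modeled above.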

melvincarvalho commented 6 months ago

Melvin I think I know what you're hinting at but I still don't agree.

You are suggesting a situation where a client only supports JSON-LD, correct? Well, first of all, it shouldn't request Turtle if it cannot handle it. Let's model that:

GET alice.ttl
Accept: application/ld+json

406 Not Acceptable

Like this? Yes, a JSON-LD-only client would fail to load the Turtle data in this case. But that would also be the case with WebID 1.0, so I don't understand what you think changes here?

Yes, exactly.

In WebID 1.0 everything works because you have a common ground, namely, turtle. This was by design.

If there is a change to BOTH Turtle and JSON-LD being MUST (e.g. via conneg), then the client can be either Turtle-only or JSON-LD-only.

That means that WebID 1.0 will not work with the new WebID 2.0. Hence it's a breaking change.

namedgraph commented 6 months ago

I would argue (as I've done before) that the issue here is that 1.0 specified any serialization at all, in this case Turtle. Because we're talking about an orthogonal HTTP/conneg issue that is not specific to WebID.

TL;DR: clients and servers might negotiate and might not find an acceptable serialization, and that is simply a common HTTP error and not a WebID compliance issue

In other words, conneg is the solution here, not the problem. The server has a set of media types it can read/write, as does the client. Conneg allows them to find an intersection between those sets, or if there is none, leads to an error. And that works perfectly transparently -- until someone puts some text in a specification like WebID that some specific serialization MUST be supported, which breaks the orthogonality.
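
The intersection idea can be sketched as a simplified negotiator (q-values only; no wildcards or media-type parameters beyond `q`, so a hedged approximation of real HTTP conneg):

```python
def negotiate(accept_header: str, offers: list) -> str:
    """Pick the client's most-preferred media type that the server offers,
    or return None (-> 406 Not Acceptable) if the sets do not intersect."""
    prefs = []
    for part in accept_header.split(","):
        pieces = part.strip().split(";")
        mtype = pieces[0].strip()
        q = 1.0
        for param in pieces[1:]:
            if param.strip().startswith("q="):
                q = float(param.strip()[2:])
        prefs.append((q, mtype))
    # Try media types in descending order of client preference.
    for q, mtype in sorted(prefs, reverse=True):
        if mtype in offers:
            return mtype
    return None  # empty intersection: the common HTTP error, not a WebID issue

# A client preferring Turtle, against a server offering both, gets Turtle;
# a JSON-LD-only client against a Turtle-only server gets nothing (406).
```

So the failure mode is the ordinary "no acceptable representation" case that conneg is designed to surface, regardless of what any WebID version mandates.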

melvincarvalho commented 6 months ago

/chair hat on

If you feel the changes are minor, then the new spec should be named with version 1.1 OR 2.0 but NOT 1.0.

@melvincarvalho to clarify, would changing the title of the working draft https://w3c.github.io/WebID/spec/identity/index.html to Web Identity and Discovery 1.1 cause you to withdraw your strong objection to conneg?

Concretely, there are multiple aspects

  1. Advocacy for Status Quo: as expressed in the call for a complete freeze. This stance acknowledges the longstanding impasse within the group. However, I also appreciate that the formation of a Solid WG could introduce pivotal change, potentially dissolving the freeze.

  2. Versioning for New Initiatives: I suggest that any new endeavors stemming from the WebID ED 2014 adopt a fresh versioning approach. This clear demarcation would properly represent the involvement of new authors, editors, and methodologies. For substantial revisions, version 2.0 seems fitting, whereas 1.x might suit more incremental updates. It's worth noting that introducing mandatory content negotiation (conneg) is a significant alteration and should be treated as such in versioning. This issue is about proving that the change clearly is breaking.

  3. Personal Stance on Conneg: On a personal note, the push towards mandatory conneg is a significant shift that I find challenging to align with, and could not live with if it were the only choice. That said, exploring the development of a JSON-LD extension profile could offer a constructive pathway, with detailed guidance on its implementation, naming conventions, and structure. This approach would also allow those inclined towards a conneg path to proceed independently, ensuring diverse yet harmonious development within the community.

I hope that is clear enough. This can be revisited when the Solid WG starts up. Mandating conneg is by far the most controversial change that has been suggested to the group in many years. Replacing RDFa with JSON-LD and structured data islands would have been far easier. If you want to make a new draft, with new authors and a new set of defaults, bump the version to 2.0 and see if it stands on its own merits.

melvincarvalho commented 6 months ago

I would argue (as I've done before) that the issue here is that 1.0 specified any serialization at all, in this case Turtle. Because we're talking about an orthogonal HTTP/conneg issue that is not specific to WebID.

So, Turtle was specifically chosen as a MUST in WebID 1.0 because it guarantees interop.

The vast majority of the web does not use conneg. And WebID is designed as a "Web" technology.

You need a common ground so that everything works together.

Turtle was chosen, maybe that was a mistake in retrospect, but it was the decision taken. A change to that would be breaking.

namedgraph commented 6 months ago

On the flip side, WebID document resolution depends on HTTP (as does the rest of the web) and HTTP supports conneg by default. So yes, it was a mistake to attempt to override the default HTTP behaviour because it was clearly out of scope for the WebID spec. That has now put us in a bind trying to satisfy conflicting requirements: Turtle is MUST due to backwards compatibility with 1.0, and JSON-LD is MUST due to new requirements. I don't see how these can be reconciled in the light of your example.

namedgraph commented 6 months ago

Turtle was chosen, maybe that was a mistake in retrospect, but it was the decision taken. A change to that would be breaking.

If we relaxed the serialization requirements by removing Turtle, i.e. removed them altogether, how would that break backwards compatibility?

melvincarvalho commented 6 months ago

Turtle was chosen, maybe that was a mistake in retrospect, but it was the decision taken. A change to that would be breaking.

If we relaxed the serialization requirements by removing Turtle, i.e. removed them altogether, how would that break backwards compatibility?

Because there's no common ground for libraries to ensure everything works for all users. That's why ONE serialization was selected. Turtle just seemed a good choice at the time.

namedgraph commented 6 months ago

That was then, but I'm talking about making a change now.

No other spec in the RDF stack such as SPARQL or SHACL specifies any mandatory RDF serializations, because it is understood it's a completely orthogonal concern.

melvincarvalho commented 6 months ago

That was then, but I'm talking about making a change now.

No other spec in the RDF stack such as SPARQL or SHACL specifies any mandatory RDF serializations, because it is understood it's a completely orthogonal concern.

I believe Solid does.

I've sent a message to the Solid CG to see if anyone has strong views on Turtle vs JSON-LD:

https://lists.w3.org/Archives/Public/public-solid/2024Feb/0001.html

Let's see what they say.