Specify/advise semantics of default Container resources (index.html/.ttl)

Specify/advise semantics of default Container resources (index.html/.ttl) #69

Open dmitrizagidulin opened 4 years ago

dmitrizagidulin commented 4 years ago

A topic that causes a lot of implementor and app-developer confusion is the handling of default resources for Containers - index.html and/or index.ttl.

Questions:

Is the index.html behavior a MUST or MAY?
Is the index.ttl behavior a MUST or MAY?
How to prevent the fairly common occurrence of "I accidentally created an index.html but forgot to add an .acl, plus that kicks me out of the Data browser, so anyways what do I do now."

In either case, we should either add it to spec or provide non-normative text advising devs what to do.

Related issues:

csarven commented 4 years ago

In that context I meant "replace the representation" as to the intention over the wire. I've already said "affect a resource" prior to that.

re affects/effects, I was trying to make the point that "pure RDF" is irrelevant because the "target" is intended to be the resource, not the representation. Manipulating the graph.

I'm not aware of a definition where HTML+RDFa is an LDP-NR. A representation can be "fully" stated in "pure RDF" in HTML+RDFa, therefore it can be an LDP-RS.

kjetilk commented 4 years ago

A representation can be "fully" stated in "pure RDF" in HTML+RDFa, therefore it can be an LDP-RS.

But @TallTed is referring to the situations where it is not fully represented in RDF. If it is fully represented in RDF, then it is easy, but that's not the interesting situation.

csarven commented 4 years ago

But if you look at how "pure RDF" is marketed at the moment, a generalisation is being made in that any representation with RDFa is an LDP-NR (or LDPR).

If you want to hold off, wait until the representation is inspected. Just as we should inspect claimed Turtle to make sure that it is actually Turtle and not gobbledygook PDF soup.

TallTed commented 4 years ago

re affects/effects, I was trying to make the point that "pure RDF" is irrelevant because the "target" is intended to be the resource, not the representation. Manipulating the graph.

I have explicitly raised scenarios where the TARGET RESOURCE is NOT PURE RDF, is NOT just "a graph". You are declaring that my scenarios -- which are based on real-world experience -- do not exist. Please stop that.

I'm not aware of a definition where HTML+RDFa is an LDP-NR. A representation can be "fully" stated in "pure RDF" in HTML+RDFa, therefore it can be an LDP-RS.

Definitions were unfortunately written unclearly in the LDP spec. I am sorry about that. The error comes down to a missing "fully" in the LDP-NR definition. I believe the intent is as I have stated multiple times, and my belief is based on having been a participant in the rooms (and concalls) which resulted in the LDP spec.

I don't think I have ever said that it is impossible to have an HTML+RDFa document which fully represents its state within the RDF. I have said (or strongly implied) that this is unlikely, in practice is uncommon in the extreme, and is not so for the hypothetical RDFa documents to which I've been referring in the conversation above.

It would be fine to treat an RDFa document which is fully represented internally as RDF as an LDP-RS -- but because this is not the case for all (nor, I say again, for most) RDFa documents, this should not be taken as the general nor the determinant case.

The rule here must be either "treat RDFa as LDP-NR because they cannot be guaranteed to be LDP-RS" or "test all RDFa files for RDF representation; where 100%, treat as LDP-RS; where <100%, treat as LDP-NR". As the latter rule provides no assurance of anything, I submit that it is problematic, and the former rule should be ratified.

Please note that nothing says that there cannot be RDF content within an LDP-NR! In other words, LDP-NR (including RDFa, PDFs, TXT, XLST, etc.) can be RDF bearing -- and RDFa (a/k/a HTML+RDFa) documents definitely do bear RDF, as that's the whole difference between them and HTML docs.

csarven commented 4 years ago

Ted, we're trying to acknowledge some of the common real-world publishing scenarios as well as implementations in the wild in order to see how far the existing specs get us and where to go next.

I'll repeat the running scenario so you can tell us the right way of doing it with respect to LDP:

People will want to publish a homepage in HTML. That's typically at site root: /, and that's generally accepted (or at least intended) to be a container from both LDP and Solid's perspective. We can look at many variations on this eg. whether / can be any kind of resource (re interaction model), or must only be one, or differ depending on how it is observed, etc. but for now let's stick to the mainline.

An actual example (but feel free to throw this out and use a different one): my website is at http://csarven.ca/ and WebID is http://csarven.ca/#i . How can I update that (WebID Profile) document so that it is available in HTML+RDFa, Turtle, and JSON-LD representations?

From your responses so far, I haven't understood your solution or how exactly LDP can help to accomplish what we are after without conflicting with LDP, introducing boatloads of conditions, or creating awkward cases to deal with.

It is not a problem if there is no simple solution here while conforming to LDP, but we'd just like to understand that. I don't want us to propose a convoluted solution just so that the Solid spec somehow is compatible with LDP. As previously agreed, the Solid spec can take on whatever is of use but not more.

acoburn commented 4 years ago

An actual example (but feel free to throw this out and use a different one): my website is at http://csarven.ca/ and WebID is http://csarven.ca/#i . How can I update that (WebID Profile) document so that it is available in HTML+RDFa, Turtle, and JSON-LD representations?

The way this has been handled by Trellis is that, for a given LDP-RS (whether container, root resource or whatever), that resource can be content-negotiated in the following ways:

Read requests: by default (with no Accept header), Turtle is produced. Other types of RDF-based conneg are supported (JSON-LD, as required by LDP, and also N-Triples), but an HTML representation is also made available when a client includes, for example, Accept: text/html. That HTML version is, in fact, RDFa with all of the same triples one would find in the other RDF-based serializations. The application generates the HTML based on a configurable template, so a server operator can modify the look and feel of the RDFa.

Write requests: these come in two flavors: sparql-update for PATCH interactions, and (relevant for this discussion) RDF serializations for PUT and POST. However, (notably) PUT and POST only accept text/turtle, application/n-triples and application/ld+json. Write requests where the incoming entity body is text/html are not accepted as an LDP-RS (they would be for an LDP-NR).
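
As a rough illustration of the read and write rules described in this comment, here is a minimal sketch in Python. It is not Trellis code: the function names, the `READABLE` ordering, and the simplified Accept parsing (only `q` values, no other parameters) are assumptions made for the example.

```python
# Hypothetical sketch of the conneg rules described above; not taken from Trellis.
READABLE = ["text/turtle", "application/ld+json", "application/n-triples", "text/html"]
WRITABLE_RDF = {"text/turtle", "application/n-triples", "application/ld+json"}


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q; other parameters are ignored."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_type, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_read_type(accept=None):
    """Pick a serialization for a GET; Turtle is the default when no Accept is sent."""
    if not accept:
        return "text/turtle"
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type == "*/*":
            return "text/turtle"
        if media_type in READABLE:
            return media_type
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in READABLE:
                if candidate.startswith(prefix):
                    return candidate
    return None  # a server would answer 406 Not Acceptable here


def interaction_model_for_write(content_type):
    """Classify a PUT/POST body: RDF serializations as LDP-RS, anything else as LDP-NR."""
    media_type = content_type.split(";")[0].strip().lower()
    return "LDP-RS" if media_type in WRITABLE_RDF else "LDP-NR"
```

Under these assumptions, a `text/html` body is never treated as an update to the RDF source, while `Accept: text/html` on a read still selects the generated HTML view.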

csarven commented 4 years ago

Thanks for sharing @acoburn .

If I understand correctly, Trellis will only let an application create or update a homepage such that when requested as text/html, it will only provide a HTML+RDFa serialization as "pure RDF" (or "fully" as per LDP-RS). More concretely, it is not possible to create, update or generate a document like my homepage (which contains some content that is not intended to be part of the RDF graph).

acoburn commented 4 years ago

@csarven given the current implementation of Trellis' RDFa writer, one could create and update a homepage such as yours, but the HTML template in use would be applied to all LDP-RS resources. So, for example, you could update the data on that homepage via SPARQL-Update (or PUT with a Turtle payload), but in order to change the color scheme or the HTML layout, you'd have to modify the template which is not something you can do through the HTTP interface.

kjetilk commented 4 years ago

I'm very relieved to hear that, @acoburn, since it sounds like Trellis is already quite close to the emergent (I wouldn't say rough yet :-) ) consensus, because at a fundamental level, the changes should be relatively minor to support this, right?

I mean, in the sense that you could have HTML documents that incorporate the template and therefore the RDFa with relative ease, right?

csarven commented 4 years ago

What Trellis is doing is perfectly fine. Its RDFa writer serves as a baseline for representations equivalent with respect to the RDF graph. In the Solid ecosystem, we are expecting applications to create/update specialised documents, whereas the server will only do the minimal, so the single template to help generate the HTML+RDFa serialization is fine and in what shape or form is an implementation detail (as long as the triples are there).

Having said that, what LDP affords and Trellis implements is I think limited in that it doesn't clearly handle a category of use cases related to creating and updating human- and machine- readable resources eg. a homepage, index, or any article for that matter, driven by client applications.

We need to bridge applications like dokieli and Trellis so that they can work with desired states of resources.

@acoburn :

if POST/PUT text/html is accepted as LDP-NR, what then happens on GET text/turtle?
If / is LDP-BC (LDP-RS), is its representation in text/html limited to the RDFa writer (excluding any client involvement)?

Do we expect multiple representations of a resource to be equivalent (in some reasonably defined or predictable way eg. isomorphic RDF graphs imply information equivalence)? If not, can representations be tracked and updated independently - is this too complex?

elf-pavlik commented 4 years ago

I don't think I have ever said that it is impossible to have an HTML+RDFa document which fully represents its state within the RDF. I have said (or strongly implied) that this is unlikely, in practice is uncommon in the extreme, and is not so for the hypothetical RDFa documents to which I've been referring in the conversation above.

It would be fine to treat an RDFa document which is fully represented internally as RDF as an LDP-RS -- but because this is not the case for all (nor, I say again, for most) RDFa documents, this should not be taken as the general nor the determinant case.

I fully agree with the above; I can't really imagine a scenario where anyone would go with RDFa intending only to serialize an RDF graph. I would strongly prefer to treat HTML as a Non-RDF Source and, wherever we consider using RDFa, just enable the use of plain HTML.

People will want to publish a homepage in HTML. That's typically at site root: /, and that's generally accepted (or at least intended) to be a container from both LDP and Solid's perspective. We can look at many variations on this eg. whether / can be any kind of resource (re interaction model), or must only be one, or differ depending on how it is observed, etc. but for now let's stick to the mainline.

While I agree with recognizing it as a common use case, I think we should take this use case and consider various approaches. For example, / when requested with text/html could redirect to something the user (or the application they use) can configure, eg. /home. Most frontend frameworks, if used, can easily change the browser location bar to / using the History API.

An actual example (but feel free to throw this out and use a different one): my website is at http://csarven.ca/ and WebID is http://csarven.ca/#i . How can I update that (WebID Profile) document so that it is available in HTML+RDFa, Turtle, and JSON-LD representations?

I think the approach above would work in this case; if you don't want to fiddle with the History API you would just end up with an extra slug of your choice, eg. /home or /hello. Myself, I will most likely use the History API to set it to bare /. The WebID draft makes text/turtle a MUST and clients have no reason to expect any other representation. I don't see reason for making publishing WebID Profile as HTML+RDFa a requirement for Solid.

Frankly, if we discuss website use cases, we should gather a few common use cases. One common pattern used by many SPA/PWA would have a common shell / layout returned from any /**/* and all the navigation would happen client side. Not sure if one would even go with Solid Storage for deploying it or use something better tailored for it (including pre-rendering etc.).

My main point goes back to the top of this comment:

HTML should be treated as not intended to represent RDF and wherever we want to enable use of HTML we must not assume use of HTML+RDFa and support plain HTML.

csarven commented 4 years ago

I can't really imagine a scenario where anyone would go with RDFa intending only to serialize an RDF graph.

You can imagine a "scenario" for format x but not y?

Trellis has a RDFa writer outputting something useful for human consumers, like a directory index that's also machine-readable.

The utility of RDFa in the Solid ecosystem doesn't depend on LDP classification.

We expect the applications to understand the purpose of the RDF graphs and provide a useful environment.

For example, / when requested with text/html could redirect to something the user (or the application they use) can configure, eg. /home. Most frontend frameworks, if used, can easily change the browser location bar to / using the History API.

Is that a common publishing behaviour?

Why shouldn't people have their homepage at /?

I don't see reason for making publishing WebID Profile as HTML+RDFa a requirement for Solid.

No one proposed that should be a requirement.

We want to specify right enough things for the ecosystem so that people can do different stuff with it. We shouldn't prohibit such possibilities and choices.

HTML should be treated as not intended to represent RDF and wherever we want to enable use of HTML we must not assume use of HTML+RDFa and support plain HTML.

Why?

We want to enable the creation of applications to advance human/machine-readable resources.

elf-pavlik commented 4 years ago

For example, / when requested with text/html could redirect to something the user (or the application they use) can configure, eg. /home. Most frontend frameworks, if used, can easily change the browser location bar to / using the History API.

Is that a common publishing behaviour? Why shouldn't people have their homepage at /?

HTML should be treated as not intended to represent RDF and wherever we want to enable use of HTML we must not assume use of HTML+RDFa and support plain HTML.

Why?

IMO if we want to enable people to have their homepage at / (without using the History API), I think we should also allow them to use plain HTML and not impose an HTML+RDFa requirement. Can we do that?

In https://github.com/solid/specification/issues/107#issuecomment-567981204 I notice that one may want to publish plain HTML at any URI ending with trailing /.

csarven commented 4 years ago

You might want to respond to this:

Do we expect multiple representations of a resource to be equivalent (in some reasonably defined or predictable way eg. isomorphic RDF graphs imply information equivalence)? If not, can representations be tracked and updated independently - is this too complex?

The History API has no bearing on this issue; HTTP resource access. It is about server and client negotiation as to what happens at / on the HTTP layer. How to keep the interactions uniform while handling multiple representations. This is not about permitting format x and prohibiting y based on opinions or arbitrarily pigeonholing. Assumptions need to be in check. I would argue that publishing and implementation experience ought to trump personal preferences. Can we do that?

elf-pavlik commented 4 years ago

Do we expect multiple representations of a resource to be equivalent (in some reasonably defined or predictable way eg. isomorphic RDF graphs imply information equivalence)? If not, can representations be tracked and updated independently - is this too complex?

I agree with what @TallTed said:

HTTP spec says PUT replaces a resource in its entirety. In other words, PUT text/turtle may be understood as DELETE the old {RDFa, HTML, XLS, TXT, etc.} resource, and INSERT this Turtle resource in its place.

I don't think we should handle different representations denoted by the same IRI independently and only use different representations for RDF Sources, in that case server would not be expected to preserve any serialization specific artifacts.

Trellis has a RDFa writer outputting something useful for human consumers, like a directory index that's also machine-readable.

Can implementation X conform to the spec if, when text/html is requested, it responds with plain HTML (no RDFa)? If yes, I think it should use the Content-Location header, and if we want to give clients control over it we will most likely need to define a dedicated client-server interface to manage it.

elf-pavlik commented 4 years ago

https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#rfc.section.6.2.5.p.9

For example, if a client makes a PUT request on a negotiated resource and the origin server accepts that PUT (without redirection), then the new state of that resource is expected to be consistent with the one representation supplied in that PUT; the Content-Location cannot be used as a form of reverse content selection identifier to update only one of the negotiated representations. If the user agent had wanted the latter semantics, it would have applied the PUT directly to the Content-Location URI.
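
One way to read the quoted passage is that a negotiated resource has a single state, and a PUT to its URI replaces that state rather than one of its representations. The toy Python model below is only a sketch of that reading: the class, the converter table, and the escaping "Turtle to HTML" view are all hypothetical, and `None` stands in for a 406 response.

```python
import html


class NegotiatedResource:
    """Toy model: one stored state; other media types are derived from it on read."""

    def __init__(self, converters):
        # converters maps (stored_type, requested_type) to a function over the body.
        self.converters = converters
        self.content_type = None
        self.body = None

    def put(self, content_type, body):
        # PUT replaces the whole state; no per-representation copy is kept.
        self.content_type = content_type
        self.body = body

    def get(self, accept):
        if self.content_type is None:
            return None
        if accept == self.content_type:
            return self.body
        convert = self.converters.get((self.content_type, accept))
        return convert(self.body) if convert else None  # None stands in for 406


def turtle_to_html(body):
    # Hypothetical derived view: the Turtle source escaped inside a <pre> element.
    return "<pre>" + html.escape(body) + "</pre>"


resource = NegotiatedResource({("text/turtle", "text/html"): turtle_to_html})
resource.put("text/turtle", "<> a <#Home> .")
```

In this model, a later `resource.put("text/html", ...)` leaves no Turtle to serve from the same URI, which is the situation the discussion is trying either to avoid or to define.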

kjetilk commented 4 years ago

Good point, @elf-pavlik . I'm generally in line with that draft, and I think the easy way out of that conundrum is to prohibit PUT on containers (which could be extended to any resource that has a server-managed partial representation). However, that discussion is better suited to #40 , I think. And as you can see, I'm open to several takes on this.

csarven commented 4 years ago

re PUT quote - which is by the way carried from 7231 - the Content-Location discussion (109) wasn't about client's request for reverse content selection. It was about whether to have any expectation in a response.

I don't think we should handle different representations denoted by the same IRI independently and only use different representations for RDF Sources,

I don't understand why the state of a resource is constrained to having only representations as RDF Sources.

in that case server would not be expected to preserve any serialization specific artifacts.

That's just encoded RDF graph in RDF Sources and that it can be serialized in different ways without having any expectation to persist information beyond the graph. Which is why it is okay to focus on the RDF graph in HTML+RDFa.

Can implementation X conform to the spec if, when text/html is requested, it responds with plain HTML (no RDFa)? If yes, I think it should use the Content-Location header, and if we want to give clients control over it we will most likely need to define a dedicated client-server interface to manage it.

By "it should use", I assume you mean the server. I don't understand why HTML without RDFa needs to have its own URL. Updates can go through the resource URI just as it did when it was created - same effective request URIs. Server may use Content-Location for any Content-Type.

kjetilk commented 4 years ago

It seems like we haven't completely narrowed down the problem space yet; there are too many options open. I'd like to narrow in on a few.

1. Seemingly current state: index.html is a resource in its own right, returned as a representation of / when the Accept header says so, otherwise manipulated like any other resource. It is not clearly defined which ACL applies to it, but it seems to mostly have its own, creating an unclear situation when it is returned as the representation of /. A possible improvement is to clearly define which ACL applies. This is what we originally encountered in https://github.com/solid/web-access-control-spec/issues/36
2. Work to resolve #109 in a way that allows index.html to exist and be defined as the representation of /, when the Accept header says so.
3. index.html is not a resource in its own right, but an internal implementation detail that is never exposed to the client. Within this, there are several possibilities:
  i. The HTML representation is manipulated completely separately from the RDF representation based on the Content-Type and Accept headers. No requirements to represent the container RDF.
  ii. The HTML representation MUST bring along the representation of the container RDF. There are several possibilities within this:
    a. PUT isn't allowed to update the container, POST is used to replace the HTML representation and to append the RDF representation.
    b. PUT is allowed to replace the HTML representation, but not the RDF representation.
    c. PUT is allowed for both representations, but server-managed triples are ignored.
    d. Triples in the RDFa are ignored when updating a container.
    e. Triples in the RDFa are merged into the RDF representation of a container.
    f. Triples in the RDFa stay a part of the HTML representation and will not influence the RDF representation of the container.

The main reason I'm reluctant to address #109 now (it needs to be addressed, but not necessarily now), is the rather messy situation I've seen with the application of ACLs to index.html. I'm afraid it is difficult to get right, thus resulting in bad security usability. That one resource is a representation of another is a common pattern on the Web, but not in a UNIX filesystem, which also creates some tension. That's why I am most inclined to not have index.html as a resource in its own right, making #109 an orthogonal issue that does not need to be addressed until later.

I am also most inclined to make sure the container representation contains the RDF, some of which is required to be represented. In #108 and #40 , I have advocated that PUT should not be allowed to update a container, but I'm open to other interpretations.

elf-pavlik commented 4 years ago

I don't think we should handle different representations denoted by the same IRI independently and only use different representations for RDF Sources,

I don't understand why the state of a resource is constrained to having only representations as RDF Sources.

Thinking in terms of the read-write web, using different RDF serializations for RDF Sources seems pretty straightforward for read-write operations using the IRI denoting a resource. Here one should not have any expectation that the server preserves serialization-specific artifacts (comments, prefixes, JSON-LD context etc.). As soon as one wants consistent read-write operations for different representations, where serialization-specific artifacts get preserved, I think each representation should have an IRI denoting that specific representation. This approach also seems applicable whenever one would want to have different representations for Non-RDF Sources.

https://developer.mozilla.org/en-US/docs/Web/HTML/Link_types

alternate Otherwise, the link defines an alternative page, of one of these types: [...] in another format, such as a PDF (if the type attribute is set)

For the case where someone needs consistent read-write operations using the IRI denoting an LDP Container, I think giving the HTML representation a more specific IRI would enable a straightforward way of doing it. An HTTP GET to the container IRI with Accept: text/html could return a Content-Location header, and a Link header with rel="alternate"; type="text/html" could clearly advertise the IRI of that specific representation.

I see it similar to example in https://www.w3.org/TR/cooluris/#r303gendocument

[Figure: r303gendocument]

Except that any RDF based representation of the container would not need more specific IRI. Of course as long as use case doesn't require to persist rdf serialization specific artifacts.

TallTed commented 4 years ago

The error in the Cool URIs example is in labeling the circle for application/rdf+xml as RDF instead of as RDF/XML.

It's a common blur from the early days of RDF, where RDF/XML was treated as the only serialization -- and RDF and RDF/XML were therefore treated as synonyms.

Today, text/turtle or application/ld+json would be a more likely serialization media type for such examples -- and the circle would be more likely to be labeled Turtle or JSON-LD, respectively, than simply RDF.

elf-pavlik commented 4 years ago

I agree with all the comments regarding application/rdf+xml, which also unfortunately registers the .rdf file extension. For Solid I think of something along the lines of the diagram below:

[Figure: solid-conneg]

Where https://solid.example/team/ gets handled as an RDF Source but also has a more specific IRI, https://solid.example/team/page.html, for a Non-RDF Source representation, in this case text/html. Any read request would simply get a response based on content type directly from https://solid.example/team/, while a write would need to use the more specific IRI of the representation. Relying on Content-Location instead of redirecting would also help with using vanity URLs; a person navigating a web browser to https://solid.example/team/ would not even see the more specific https://solid.example/team/page.html.

To address the full resource lifecycle, it would require specifying an interface to add that alternate relation:

Link: <https://solid.example/team/page.html>; rel="alternate"; type="text/html"

I think @csarven referred in some other issue to HTTP LINK method draft https://tools.ietf.org/html/draft-snell-link-method
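
To make the proposal more concrete, here is a small Python sketch of how a client might discover the more specific IRI from such a Link header before writing to it. The function name and the regular expressions are assumptions for illustration; they handle only simple `<target>; name="value"` links and are not a complete Link header parser.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
# One parameter, with either a quoted or an unquoted value.
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def find_alternate(link_header, media_type):
    """Return the target of the first rel="alternate" link with the given type, else None."""
    for match in LINK_VALUE.finditer(link_header):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for name, quoted, token in PARAM.findall(raw_params):
            params[name.lower()] = quoted if quoted else token
        rels = params.get("rel", "").lower().split()
        if "alternate" in rels and params.get("type", "").lower() == media_type:
            return target
    return None


header = '<.acl>; rel="acl", <https://solid.example/team/page.html>; rel="alternate"; type="text/html"'
write_target = find_alternate(header, "text/html")
```

With the header above, a client that wants to replace only the HTML representation would send its PUT to `write_target` rather than to the container IRI.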

csarven commented 4 years ago

[Draft: I may edit to clarify.]

We have some common ground derived from several issues that can help to address this one. Here is one way:

Read, Append, Write operations on a container should go through the canonical / (slash semantics).

The handling of container representations is optional. A server may use the /index.* convention as representations of a /. They may expose the representation URL through Content-Location. A container may have RDF bearing and non-RDF bearing representations. The representations of a container may be listed in its containment triples.

ACL is set on the container and applicable to all of container's representations.

Deleting a resource removes its representations.

RDF bearing representations of a container should not update the server-managed triples.
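
The last rule could be sketched as follows in Python, treating an RDF graph as a set of (subject, predicate, object) tuples. This is an assumption-laden illustration: the function name is made up, and only `ldp:contains` triples about the container are treated as server-managed here, whereas a real server may manage more.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"
# Assumption: only containment triples are treated as server-managed in this sketch.
SERVER_MANAGED_PREDICATES = {LDP_CONTAINS}


def apply_container_write(current, incoming, container_iri):
    """Return the container's new triples: client triples replaced, server-managed ones kept."""

    def is_managed(triple):
        subject, predicate, _ = triple
        return subject == container_iri and predicate in SERVER_MANAGED_PREDICATES

    kept = {t for t in current if is_managed(t)}           # server keeps its own view
    accepted = {t for t in incoming if not is_managed(t)}  # client cannot add or change it
    return kept | accepted


root = "https://solid.example/"
current = {
    (root, LDP_CONTAINS, root + "notes/"),
    (root, "http://purl.org/dc/terms/title", "Old title"),
}
incoming = {
    (root, LDP_CONTAINS, root + "forged/"),
    (root, "http://purl.org/dc/terms/title", "New title"),
}
updated = apply_container_write(current, incoming, root)
```

Here the client's title replaces the old one, the existing containment triple survives, and the containment triple supplied by the client is silently dropped; a server could equally reject such a request instead.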

elf-pavlik commented 4 years ago

They may expose the representation URL through Content-Location.

If server sends in response representation, which has URL denoting that specific representation, I think at least it SHOULD expose it.

Deleting a resource removes its representations.

At least for portability we very likely need to define how to express 'different representation of', I think rel="alternate"; type="text/html" might just work.

RDF bearing representations of a container should not update the server-managed triples.

LDP spec doesn't define what RDF bearing means. Could we base it on RDF Source instead? People working on implementations need to properly handle it and in solid ecosystem can rely on consistent support for both text/turtle and application/ld+json representations, plus PATCH with application/sparql-update when interacting with RDF Sources.

Read, Append, Write operations on a container should go through the canonical / (slash semantics)

I think we may need to clarify that ldp:Container as subclass of ldp:RDFSource expects supported RDFSource representations when interacting with it.

To support use case of Non-RDF Source representations like any text/html, we still need to specify how to create initial version of such representation.

We may still need to clarify what happens if more specific representations exist as RDF Sources. In that case a Write operation on /alt.ttl doesn't affect the representation returned by GET on / with Accept: text/turtle, even if /alt.ttl stays designated as rel="alternate"; type="text/turtle" of /. Only Write/Append on / affects the RDF Source representation returned on Read.

A server may use the /index.* convention as representations of a /.

If we provide mechanism to denote and manage more specific representations I don't see when server would also use internal convention as above. For example if client creates /page.html as rel="alternate"; type="text/html" of /, server can't use /index.html any more.

csarven commented 4 years ago

If server sends in response representation, which has URL denoting that specific representation, I think at least it SHOULD expose it.

Please take that up in https://github.com/solid/specification/issues/109 and explain why.

At least for portability we very likely need to define how to express 'different representation of', I think rel="alternate"; type="text/html" might just work.

That has no bearing on the delete behaviour.

LDP spec doesn't define what RDF bearing means. Could we base it on RDF Source instead?

I wasn't referring to LDP. RDF 1.1's "RDF-bearing" and "RDF source" are sufficient for Solid.

I think we may need to clarify that ldp:Container as subclass of ldp:RDFSource expects supported RDFSource representations when interacting with it.

No need. We've already determined that Solid / is not "pure RDF" like in LDP.

If we provide mechanism to denote and manage more specific representations I don't see when server would also use internal convention as above.

We are not specifying the identifier of the representation.

kjetilk commented 4 years ago

We're having a gathering in Inrupt this week, so I had a chat with @timbl on this. We found that we need to take it up F2F the next time more of us meet, on the basis of what we've discussed here.

I briefly outlined my preference, which is option 3.ii.e. with some variations of the PUT behaviour above, and he did appreciate that solution, but it is easier to convey more options F2F. I will note, however, that the absence of the 303 behaviour in Solid is a deliberate design.

csarven commented 4 years ago

/ is an information resource, 303 is not applicable. 200 and Content-Location is suitable for possible use of index.*.

I'll respond to 3.ii.e. for the record. 3.ii.e. is relatively aligned with https://github.com/solid/specification/issues/69#issuecomment-576060661 but it leaves out key behaviours part of the wider mechanism:

3. "never exposed to the client" entails Content-Location or redirects are forbidden, which is a stronger constraint than RFC 7231's. A bit more relaxed version would be that Content-Location is not required but may be used (basically 7231 but that should be addressed in https://github.com/solid/specification/issues/109 in any case). In any case, we've already highlighted use cases where some implementations may want to expose representation URLs.

ii. Existing server-managed triples are assumed to be in HTML+RDFa (as per general discussion for the most part) but doesn't particularly exclude fuzzy (clumsy handling of?) RDF formats in script. It raises potential complexity in any case.

e. seems fine if intention is to append to the RDF graph, provided that existing server-managed triples are not affected or new ones added. This is rather taken as a global rule, along the lines of LDP 5.2.4.1.

I've tried to reconcile as best as I could, based on things we agree on as well as on some implementation behaviour, in https://github.com/solid/specification/issues/69#issuecomment-576060661 . I think it is important to digest it as a whole. I'm not sure if we'll have consensus that's more fitting but happy to examine it along with the other options further in a F2F and/or a call.

elf-pavlik commented 4 years ago

ii) The HTML representation MUST bring along the representation of the container RDF. There are several possibilities within this.

:-1: I think this would prevent people from publishing just a plain HTML representation. Even if we were to take this path, I think we should not require RDFa but also allow including text/turtle and application/ld+json in script tags.

I also notice that NSS by default uses data browser (unless run with --suppress-data-browser). In that case it returns plain text/html representation if requested. This representation has no intention of accepting plain text/html writes. I think we may need to properly document different use cases and requirements coming from them. Then we can clearly evaluate how different approaches can address those requirements in clean way.

kjetilk commented 4 years ago

I'll try to comment just on behalf of myself, and since I woke up early and couldn't sleep. :-)

First, @elf-pavlik ,

I think this would prevent people from publishing just a plain HTML representation.

Yes, but that is, I think, an important feature, as a container has, per definition, an RDF representation, which will always be significant. Moreover, it is important for Databrowser, because it can easily embed itself in the HTML document at an appropriate place if the RDF is there, which I also think is desirable. However, I think we can allow more embeddable RDF formats; it doesn't need to be constrained to RDFa.

I'm all for being use case driven, but I am concerned that we are spending too much time on this pretty edge-case feature (this will be the 79th comment), so I would much prefer to find an urgent resolution to it. I think the use case is pretty clear, people have maintained HTML representations of containers since the dawn of ages, and we don't want to break that even when there is a client-side generated view.

@csarven ,

I'm afraid you didn't capture the discussion very well, because I fundamentally disagree with the design ;-)

What I do agree upon is:

Read, Append, Write operations on a container should go through the canonical / (slash semantics).

However,

The handling of container representations is optional. A server may use the /index.* convention as representations of a /.

index.* was put outside of the design space already in December, we should not go in more circles on this, this is specifically about the behaviour of an HTML representation.

They may expose the representation URL through Content-Location. A container may have RDF bearing and non-RDF bearing representations. The representations of a container may be listed in its containment triples.

I also think this is inconsistent with the design that manipulations go through /. With this design, index.html is a resource in its own right, and then we must allow it to be manipulated as any other resource, with all the problematic side effects it has with the applicability of ACLs and possibly other metadata resources, as well as problems it might cause for queries and stuff down the road.

index.html is either a resource in its own right, or it isn't. It can't be something in between. If it is not a resource in its own right, your concerns as to additional constraints to the RFC are not applicable.

ACL is set on the container and applicable to all of container's representations.

Yes, but if you insist that index.html is something that can be referenced as a member of the container and Content-Location, it places that in a special situation. It can be solved, but it is a more complicated solution, and therefore more error prone.

RDF bearing representations of a container should not update the server-managed triples.

Yeah, but it needs to be stronger, since it just cannot update the server-managed triples, so it doesn't capture the nuances.

So, this proposal is not consistent with Trellis (which is in my 3.ii space), NSS (which doesn't manipulate through /), the design constraints we had earlier (it is only about HTML), nor internally.

csarven commented 4 years ago

index.* was put outside of the design space already in December, we should not go in more circles on this, this is specifically about the behaviour of an HTML representation.

No, once again, the original issue, including the title and the comment that Dmitri created, is about, and I quote: "index.html and/or index.ttl". If you're only interested in focusing on the index.html bit, that's fine, but the issue still needs to address index.ttl or at least see index.* as a specialisation of https://github.com/solid/specification/issues/109 for starters - there is a reason why I've created that first so we can revisit this (also mentioned that before). "index.*" was used as an alias for both of those indexes (and others obviously). There are also numerous references to both (if not more) index formats elsewhere.

I also think this is inconsistent with the design that manipulations go through /. With this design, index.html is a resource in its own right, and then we must allow it to be manipulated as any other resource, with all the problematic side effects it has with the applicability of ACLs and possibly other metadata resources, as well as problems it might cause for queries and stuff down the road.

index.html is either a resource in its own right, or it isn't. It can't be something in between. If it is not a resource in its own right, your concerns as to additional constraints to the RFC are not applicable.

Yes, but if you insist that index.html is something that can be referenced as a member of the container and Content-Location, it places that in a special situation. It can be solved, but it is a more complicated solution, and therefore more error prone.

You've misunderstood. The base requirement is that interactions go through /. If however a server exposes the representation URLs (see issue 109), interactions can still go through /. That doesn't exclude index.* being their own resources... and whether a read or write can happen. Authz policy on the representations is still.. literally what's set for /. Issue 109 and the issues involving ACL and representations are clear about it being set on the primary resource (as opposed to the representation).

Yeah, but it needs to be stronger, since it just cannot update the server-managed triples, so it doesn't capture the nuances.

Yeah, I've proposed stuff.. please see how these relate https://github.com/solid/specification/issues/40#issuecomment-573358652 , https://github.com/solid/specification/issues/45#issuecomment-541614170 , https://github.com/solid/specification/issues/40#issuecomment-567417005 .. The repo is littered with possible ways forward.

The whole point of "RDF bearing" was that if the client/server deems a resource to be so, the rules on containment apply. Heck, we can even go all the way back to this: https://github.com/solid/solid-spec/issues/202#issuecomment-512902223 . However the server handles / with text/html, the rest follows. It would literally allow /'s text/html representation to be treated as RDF bearing or non-RDF bearing.

the design constraints we had earlier (it is only about HTML)

Clearly you are mistaken:

https://github.com/solid/specification/issues/69#issue-497863441 questions .html , .ttl
https://github.com/solid/solid-spec/issues/134#issue-402668817 ibid
https://github.com/solid/web-access-control-spec/issues/36#issue-402670605 ibid
https://github.com/solid/solid/issues/213 ibid

Obviously not all implementations are doing the same. Some of the informal criteria that I've mentioned are what NSS does and what Trellis either does or can do. Your 3. is a non-starter based on... guess what "we've already highlighted use cases where some implementations may want to expose representation URLs." So, you can't just ignore that and still try to force your preference.

I've suggested that I can clarify and expand. I've also suggested that you should take the comment as a whole and see how it connects with the agreements made elsewhere. That wasn't an arbitrary decision and I didn't mention all that for fun.

I'm reverting the issue title until there is consensus without obvious objections or at least minimal approval from the creator of this comment.

Clearly we are talking past each other. I am as frustrated (if not more) as anyone else. We can pick this issue up in a call or a F2F :)

kjetilk commented 4 years ago

@timbl and I resolved to go for option 1. i.e. something close to the current NSS behaviour. I'll do a writeup.

csarven commented 4 years ago

Please specify diff with https://github.com/solid/specification/issues/69#issuecomment-576060661 .

kjetilk commented 4 years ago

Let me first apologize for the fast turn of events here. It was truly not intentional: This issue is interesting as it has brought out pretty much all the tensions between the different components and philosophies of Solid, the LDP, the UNIX file system, the roles of representations on the Web, etc. However, it is also an issue quite far on the edges, and we cannot keep it open just for the undeniable intellectual exercise it provides. With more than 80 comments, I think it has had an extensive hearing, and suddenly today, I had the rare opportunity to take it up with Tim in real life, and so I hope people aren't too annoyed if we can take that conversation as the guide. Moreover, you will quickly notice that my own favorite, the "3.ii." direction, was quickly dismissed. So, here we go:

index.html is a resource in its own right, and will be manipulated as any other resource. That is, it can be read, and all update operations to the HTML representation will be done on index.html itself. index.html will be contained in the container as any other resource. It may have its own ACL that will apply when the index.html resource itself is dereferenced.

Only when a read operation is executed on / with an Accept header that indicates that an HTML representation is wanted will the contents of index.html be returned (there's some wiggle room around q factors here). It MUST then include the RDF representation of the container embedded in the HTML (the details of this were not discussed, but the idea is that the databrowser can then be embedded in the HTML). Content-Location MUST be set.

The ACLs will be applied as follows: First, the container's ACL will be applied. If the client has read access to the container, an internal redirect is made. If the index.html has its own ACL, then that too will need to indicate that read is authorized for the content to be returned.

Time ran out as we started to discuss what should be done if the client is authorized to read the container, but not the index.html. I think the natural thing to do would be to return to the Accept header to check if there are other representations (i.e. RDF) that are acceptable; if not, I would suggest a 406, but a case could also be made for always having an RDF fallback in that case, since an RDF representation of a container always exists.
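To make the read path described above easier to compare against other proposals, here is a minimal sketch in Python. It is not taken from NSS or any other server: the `Container` class, the agent names, the use of 403 for a failed container check, and Turtle as the only RDF representation are all assumptions made for illustration. Embedding the container RDF into the HTML is left out, since those details were not settled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Container:
    """Hypothetical in-memory stand-in for a container and its index.html."""
    turtle: str                                    # server-generated RDF representation
    index_html: Optional[str] = None               # contents of index.html, if it exists
    container_readers: set = field(default_factory=set)
    index_readers: Optional[set] = None            # None: index.html has no ACL of its own


def parse_accept(header):
    """Return the media types of an Accept header, highest q value first."""
    ranked = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type, q = pieces[0], 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type and q > 0:
            ranked.append((q, media_type))
    ranked.sort(key=lambda item: -item[0])         # stable sort: ties keep header order
    return [media_type for _, media_type in ranked]


def read_container(container, agent, accept):
    """Return (status, headers, body) for a GET on / (assumed behaviour)."""
    # Step 1: the container's ACL is applied first (403 is an assumption).
    if agent not in container.container_readers:
        return 403, {}, ""
    for media_type in parse_accept(accept):
        if media_type == "text/html" and container.index_html is not None:
            # Step 2: if index.html has its own ACL, it must also allow the read.
            if container.index_readers is None or agent in container.index_readers:
                headers = {"Content-Type": "text/html", "Content-Location": "index.html"}
                return 200, headers, container.index_html
            # Not allowed to read index.html: try the next acceptable type.
        elif media_type in ("text/turtle", "*/*"):
            return 200, {"Content-Type": "text/turtle"}, container.turtle
    # Step 3: nothing acceptable is left.
    return 406, {}, ""
```

In this sketch a client that may read the container but not index.html gets a 406 for `Accept: text/html` and the Turtle representation for `Accept: text/html, text/turtle;q=0.5`. Always falling back to RDF instead would mean replacing the final 406 with the Turtle response.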

The diff to @csarven's comment is that index.* is not considered. index.ttl has had a different mission in Solid and has had so for some time (it has certainly nothing to do with my preference, I have not even been aware of this before Tim told me about it, we've just basically had a collective misunderstanding around it). Most operations on / will not be affected by the presence of index.html, only Read will. The Content-Location and containment is required, the ACL algo is different.

So, this wasn't my favorite resolution, but with the clarification on how the ACLs are applied, it resolves the initial issues that prompted many of the reports on this. It also eases some tensions on the method definitions, as index.html is relevant only for read operations. I think it should be helpful to settle it.

csarven commented 4 years ago

Housekeeping: The reference to "index.ttl" in this issue should be left as a representation. The data augmentation case raised by data browser (which happens to use index.ttl) will be addressed in https://github.com/solid/specification/issues/144 .

it is also an issue quite far on the edges

The use cases that are brought up are among the most common practices on the Web. Solid must be able to address them.

With the proposal that's brought up:

It MUST then include the RDF representation of the container embedded in the HTML (the details of this were not discussed, but the idea is that the databrowser can then be embedded in the HTML). Content-Location MUST be set.

I'll respond to the questions I've raised in https://github.com/solid/specification/issues/69#issuecomment-566974645 :

Do we expect multiple representations of a resource to be equivalent (in some reasonably defined or predictable way eg. isomorphic RDF graphs imply information equivalence)? If not, can representations be tracked and updated independently - is this too complex?

The proposal suggests that the representations are at the very least expected to be equivalent based on RDF graph. The proposal also suggests that they will be tracked and updated independently.

The details indeed need to be worked out:

The discussion in https://github.com/solid/specification/issues/108 shows that server interference ie. injection of containment information into HTML is not particularly practical or mature. There is also no implementation experience. Moreover, when updating non-RDFa RDF bearing resources, server interference is not expected, that is the server will either allow the request or reject. Explained further below based on existing consensus.

The intended state of / in text/html should be controllable by a client without server interference provided that the representation conforms to server-imposed constraints. Rough consensus in https://github.com/solid/specification/issues/40#issuecomment-573358652 :

The criteria from 5.2.4.1 can [in addition to PUT] be applied to POST (#108) , PATCH (#85), index.html (#69).

helps to clarify client and server expectations. That is, client needs to ensure the integrity of the containment information in an HTML with RDF bearing representation when making changes and the server verifies the request. This is the same criteria for all RDF bearing representations.

As no details are provided on equivalence or particular information persistence beyond encoded RDF graph, it can be deemed to be compatible with https://github.com/solid/specification/issues/69#issuecomment-563224708 :

If two RDF graphs are isomorphic, that's all that counts towards representation equivalence.

In https://github.com/solid/specification/issues/69#issuecomment-576060661 , I proposed a relaxed version of representation equivalence which can be determined by the agreement between a server and a client:

A container may have RDF bearing and non-RDF bearing representations. The representations of a container may be listed in its containment triples.

It meant that a representation in HTML may or may not be RDF bearing. If deemed to be RDF bearing, there are specific expectations. This is a far simpler design for both servers and clients. For instance, updating an RDF bearing representation does not entail that the non-RDF bearing representations need to be updated for some "equivalence", and vice-versa.

To summarise suggestions (preferably selecting one from this order):

A representation being RDF bearing or not is an agreement between a server (verification) and a client (intention).
Client ensures that the representation is RDF bearing (in addition to adhering to server constraints) and server verifies.

Aside: Either of those options can allow any application (eg. data browser, dokieli) to be embedded in HTML. One of the differences between those applications may be that data browser's primary reason to have a container's representation in HTML (and the whole design on having index.html) is so that its JavaScript can be embedded and it doesn't care about the underlying information (at this time). In dokieli, the resource's content matters and its JavaScript is only intended to act as a way to introduce interactions on the resource - if JavaScript is unavailable, the resource can still be expected to be human and machine-readable (useful).

Content-Location MUST be set.

No objection, however I think that should be inherited from the general case in https://github.com/solid/specification/issues/109 .

The ACLs will be applied as follows: First, the container's ACL will be applied. If the client has read access to the container, an internal redirect is made. If the index.html has its own ACL, then that too will need to indicate that read is authorized for the content to be returned.

Do I understand you correctly that the order of this check is different from that for resources in general? That is, if a resource in a container has its own ACL, it will be applied; otherwise, the inheritance algorithm is applied. What I'm not clear about in the proposal is whether index.html's ACL, if it exists, will be applied instead of the container's ACL. Can you clarify that bit?
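For reference, the general lookup that this question contrasts with can be sketched roughly as follows. This is only an illustration in Python: `effective_acl` and the representation of ACLs as a set of paths are hypothetical, and a real server works with ACL resources rather than path strings.

```python
def effective_acl(path, acls):
    """Return the path whose ACL governs `path`, or None if none applies.

    `acls` is a set of paths that have an ACL of their own (hypothetical model).
    """
    # A resource's own ACL wins if it exists.
    if path in acls:
        return path
    # Otherwise inherit: walk up one container at a time until an ACL is found.
    current = path
    while current != "/":
        current = current.rstrip("/").rsplit("/", 1)[0] + "/"
        if current in acls:
            return current
    return None
```

Under this general rule, an ACL on /index.html would replace the container's ACL for that resource, whereas the proposal quoted above applies the container's ACL first and then index.html's own ACL as well.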

what should be done if the client is authorized to read the container, but not the index.html. I think the natural thing to do would be to return to the Accept header to check if there are other representations (i.e. RDF) that are acceptable; if not, I would suggest a 406, but a case could also be made for always having an RDF fallback in that case, since an RDF representation of a container always exists.

I don't particularly see why index.html's ACL needs to exist (as opposed to just inheriting container's). It opens up more complications than it actually helps. While a container's representations are resources in their own right, it doesn't mean that they must have their own ACL. Simply use container's (fixed reference). What's the actual use case to be different?

kjetilk commented 4 years ago

I suggest we remove this from the FPWD as we will not be able to resolve this in the foreseeable future.

kjetilk commented 4 years ago

I just want to comment very briefly on this:

injection of containment information into HTML is not particularly practical or mature. There is also no implementation experience.

That's wrong. Trellis already supports it as indicated above. We also support it trivially with Perl:

use RDF::RDFa::Generator;

my $gen = RDF::RDFa::Generator->new;
$gen->inject_document($dom, $model);  # inject the RDF in $model into $dom as RDFa

where $model contains the RDF you wish to inject, and $dom contains the DOM object of the XHTML document you inject into. This is code that has been in production for a decade. I quickhacked a little script that takes an XHTML document on the STDIN and outputs the injected document to STDOUT last night: inject_rdfa.pl.txt

There is plenty of implementation experience, and it is very mature. I would be very surprised if it is not equally simple in JS, it is just about generating the RDF and adding it to the DOM tree, that's all there is to it.

csarven commented 4 years ago

Edit: Didn't notice your comment before sending mine, so here is a quick reply:

That's wrong. Trellis already supports it as indicated above. We also support it trivially with Perl:

It is not an arbitrary injection. Obviously triples can always be thrown in somehow. Even possible with sed but that's probably not a good idea, right? The whole document needs to be properly serialised and be coherent, and anything added or removed from the containment triples should not interfere with everything else, including ideally structure and rendering. Having code to inject is one thing but I'd like to see actual HTML documents in practice that are subject to updates.

In any case, I've noted below that this part of the update is an implementation detail.

Revisiting this.. there is more commonality in the approaches than they seem.

Appending a resource to a container:

PUT /index.html

POST /
Slug: index.html

It should include a triple like:

[about=""] rel="ldp:contains" href="index.html"

Following are equivalent:

GET /index.html

GET /

Content-Location: index.html

[about=""] rel="ldp:contains" href="index.html"

[ RDF in HTML best practice: Don't set base URI in the HTML representation (index.html), but if set, it should end with /. This is so that the RDF graph in index.html is same as /. ]

Is this correct:

[/ or /index.html] MUST then include the RDF representation of the container embedded in the HTML.

Effectively establishes container's HTML to be RDF bearing.

Influences required RDF serializations: https://github.com/solid/specification/issues/45 ie. adding RDFa (or script with RDF) to Turtle and JSON-LD.

When a new resource is appended or removed from a container, all of its representations (including HTML) need to incorporate the changes to the containment.

Appending another resource:

POST /

Location: foo

GET /index.html

GET /

Content-Location: index.html

[about=""] rel="ldp:contains" href="index.html" rel="ldp:contains" href="foo"

How exactly a server includes the containment triples is an implementation detail. Same level of requirement for Turtle and JSON-LD.
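As one possible way a server could keep such a listing in step with the container, here is a rough sketch using only Python's standard library. It is not how Trellis or RDF::RDFa::Generator do it; the function name, the `ldp-contains` id used to find and replace the previous listing, and the choice of a `ul` element are assumptions for illustration, and the input is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML)  # serialise XHTML elements without a prefix


def inject_containment(xhtml, members):
    """Return `xhtml` with an RDFa listing of ldp:contains links in its body."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML}}}body")
    # Drop a listing left by an earlier run so the result stays in sync.
    for old in [child for child in body if child.get("id") == "ldp-contains"]:
        body.remove(old)
    listing = ET.SubElement(body, f"{{{XHTML}}}ul", {
        "id": "ldp-contains",                       # hypothetical marker id
        "about": "",                                # subject: the document's own URL
        "prefix": "ldp: http://www.w3.org/ns/ldp#",
    })
    for member in sorted(members):
        item = ET.SubElement(listing, f"{{{XHTML}}}li")
        link = ET.SubElement(item, f"{{{XHTML}}}a", {"rel": "ldp:contains", "href": member})
        link.text = member
    return ET.tostring(root, encoding="unicode")
```

The rest of the document is passed through untouched apart from reserialisation, which is exactly the part that would need scrutiny against real, hand-maintained HTML.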

This may be less of an issue when /index.html is updated directly (eg. PUT /index.html) because the container's integrity falls on the server-imposed constraint ie. listing containment triples.

Server should reject if update to /index.html changes containment triples (aligned with global rule on updating containers). That entails that an RDFa parser is required. Having an RDFa parser makes it possible to serialize to other RDF.
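A minimal sketch of that check, assuming the triples have already been extracted from the current and the proposed representation by an RDFa parser (not shown). The function names and the tuple representation of triples are hypothetical; 409 is the status that LDP 5.2.4.1 suggests for attempts to change containment triples.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def containment(triples, container):
    """Select the containment triples of `container` from a set of (s, p, o) tuples."""
    return {t for t in triples if t[0] == container and t[1] == LDP_CONTAINS}


def check_update(current, proposed, container):
    """Return 409 if the update would change containment, else 204."""
    if containment(current, container) != containment(proposed, container):
        return 409  # reject: containment triples are server-managed
    return 204      # other triples may change freely in this sketch
```

For example, changing only a title triple passes, while adding or dropping an ldp:contains triple is rejected.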

If we treat all representations of a container as equivalent (based on underlying RDF graph), then adding new resources to a container ultimately requires an update to container's description. So the representations need to include the containment triples.

The following criteria can work for updating resources but are not sufficient for appending or deleting a resource from a container:

Client ensures that the representation is RDF bearing (in addition to adhering to server constraints) and server verifies.

So, we do need the following any way:

[/ or /index.html] MUST then include the RDF representation of the container embedded in the HTML.

Having said that, there is the question of whether the HTML representation of / may be exempt from this and so a SHOULD instead of MUST. Relaxing the requirement definitely simplifies server and client implementations and allows more flexibility on / (eg. as arbitrary homepage or directory listing). Keeping it strict means more consistency, but there is also a chance that the resulting HTML is not necessarily what the client would like to see/interpret.

elf-pavlik commented 4 years ago

TL;DR: Please explain the benefit motivating the requirement of embedding RDF in the HTML representation of a container.

I honestly don't understand the benefit of requiring the HTML representation of a container to include RDF (embedded via <script> tag or RDFa). In the 'home page' use case, I think users would often want to use some site generator and let it PUT / POST that HTML representing the container. If any client needs RDF it can always request text/turtle or application/ld+json. No one stops users from embedding RDF in HTML if they have a reason for that; still, requiring it would force them to regenerate the home page every time they add or remove something from that container. I think that requirement adds a burden without any clear benefit; again, if someone wants to embed RDF in HTML they can always choose to do it.

csarven commented 4 years ago

There is much repetition at this point. I'm responding below but it'd be great if we can continue the recurring themes and ideas in public chat, calls, F2F etc.

TL;DR: Please explain the benefit motivating the requirement of embedding RDF in the HTML representation of a container.

The exact same question can be asked for any RDF format. There is no difference between them if the exchange language is RDF and the sources are RDF bearing. Any one can do the job. We can also discuss why HTML+RDFa alone may suffice, and can in fact handle more use cases than the alternatives. It can be very accessible and neither would it require JavaScript to read (or use) a homepage or a directory index. So, why even bother introducing the others? As appealing or appalling as that may be to some, framing as such may not be fruitful.

Once again, if the representations are expected to encode equivalent RDF graphs, then being consistent is a reasonable design decision. The contrary is easy to raise: why should formats x and y be expected to be equivalent but not z, especially when the resource is expected to be RDF bearing with containment information to begin with. This case is not to be conflated with a resource that's deemed to be non-RDF bearing (like an image) and then also providing an RDF-bearing representation. That's not what we want, nor what should be practiced. The homepage case not only predates but is more widely deployed than root storage or container. Hence, if the distinction between a homepage and root container is so important, then it only makes sense to leave the homepage or a directory index alone at / and simply use another URI for the root container. Certainly that's not a nice option to some people and it only complicates the situation. So, for the time being, we have to look into how to accommodate both cases - which are actually quite similar - using the same URI.

In the 'home page' use case, I think users would often want to use some site generator and let it PUT / POST that HTML representing the container.

I am a user. That's not what I want. I do not want to switch between applications just to update different aspects of a resource, or worse, have it switch perspectives based on a particular representation - the classic "pure RDF" and "not so pure RDF". Nothing like that is in practice or can be considered a good design. I want to be able to use an application like dokieli to update / where it works as storage root (including containment information) as well as a human and machine-readable homepage including my WebID Profile. I consider my WebID Profile in HTML+RDFa to be canonical because that gives the most utility. I want to be able to authenticate using that WebID. If a server can't provide an RDF bearing representation of my WebID Profile or a client is unable to parse an RDF bearing representation (as in RDFa), I can't authenticate. Currently, only Turtle (and JSON-LD) are acknowledged by some servers and authentication clients. We need to clarify this gap so that people are not prevented from publishing as they wish while adhering to minimal global requirements.

RubenVerborgh commented 3 years ago

Proposal for a new solution in https://github.com/solid/specification/issues/198, which considers “index” representations part of a compound state of a container (and considers the index.html and index.ttl behavior implementation-specific details).

kjetilk commented 2 years ago

clears throat Soooo, since we're not aligning here...

May I just throw out a totally breaking greenfield idea...?

/me types quickly in case anybody was about to scream "NOOOOOO!"

Let's make containers server managed, but have them link to any metadata and any other data they might point to, so that a client can make a reasonable representation of them.

Having containers that have protected data, but also data that can be changed has caused all kinds of problems. Compound state LGTM, but probably not something we can find consensus around.

Then, a more elaborate aux resource system is under consideration, and so, it seems straightforward that a client will pull in data from various sources after GETting a container anyway. Pages built for humans by typical browser UAs tend to consist of a large number of resources anyway.

Interacting directly with it, which was @timbl 's preference, can be done trivially if index.html is separate from the container. The difference between @timbl 's proposal and mine is only that previously, the server served index.html, whereas my idea is that index.html is YA auxiliary resource type (and thus isn't necessarily named index.html), is linked from the container, and clients will need to GET it separately. I acknowledge that this breaks existing clients.

But so much would be simpler if we just made the container server managed and told clients the resources they might want to get for a given application.

RubenVerborgh commented 2 years ago

Interacting directly with it, which was @timbl 's preference, can be done trivially if index.html is separate from the container. The difference between @timbl 's proposal and mine is only that previously, the server served index.html, whereas my idea is that index.html is YA auxiliary resource type (and thus isn't necessarily named index.html), is linked from the container, and clients will need to GET it separately. I acknowledge that this breaks existing clients.

Could we have an exception for GET, where the representation served is the auxiliary resource? But all other interactions need to go through that separate resource?

kjetilk commented 2 years ago

Could we have an exception for GET, where the representation served is the auxiliary resource? But all other interactions need to go through that separate resource?

We could. We could also say that the container RDF needs to be brought along by injecting RDFa into the resulting representation. But is that exception really worth the bother given that UAs tend to slurp in a large number of resources anyway?

justinwb commented 2 years ago

Let's make containers server managed, but have them link to any metadata and any other data they might point to, so that a client can make a reasonable representation of them.

Nooooooooooooo (sorry I couldn't help it @kjetilk)

I do have to say that I'm strongly -1 on this approach at the moment, because it would have an immediate breaking impact on a lot of code (including mine), and specifications like shape trees and application interoperability. All of that said, if you provided some concrete examples of what you're proposing, specifically in cases where there is a dependency on and usage of data in the graph of the container resource, I'd be happy to look at ways to reconcile.