solid / specification

Solid Technical Reports
https://solidproject.org/TR/
MIT License

Specify/advise semantics of default Container resources (index.html/.ttl) #69

Open dmitrizagidulin opened 4 years ago

dmitrizagidulin commented 4 years ago

A topic that causes a lot of implementor and app-developer confusion is the handling of default resources for Containers - index.html and/or index.ttl.

Questions:

  1. Is the index.html behavior a MUST or MAY?
  2. Is the index.ttl behavior a MUST or MAY?
  3. How to prevent the fairly common occurrence of "I accidentally created an index.html but forgot to add an .acl, plus that kicks me out of the Data browser, so anyways what do I do now."

In either case, we should either add it to spec or provide non-normative text advising devs what to do.

Related issues:

csarven commented 4 years ago

There is rough consensus in https://github.com/solid/solid-spec/issues/134 that handling of LDPC resources is an implementation detail.

I've created https://github.com/solid/specification/issues/109 to provide some answers to the general issue around representation handling.

The regular Accept rules should apply (with optional q parameters) for reading. For writing, it'll fall back on the interaction rules around LDPC specified by the applicable methods.
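
A minimal sketch of reading with Accept and q parameters, under stated assumptions: the function names and the list of available media types are hypothetical, and the precedence rules are simplified (a real server would also rank more specific media ranges above wildcards, which this sketch does not do).

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # q defaults to 1 when absent
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_range.lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_representation(accept: str, available: list[str]) -> Optional[str]:
    """Pick the first available media type matching the best-ranked range."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means the client refuses this range
        for candidate in available:
            if media_range in ("*/*", candidate):
                return candidate
            if media_range.endswith("/*") and candidate.startswith(media_range[:-1]):
                return candidate
    return None  # caller would answer 406 Not Acceptable
```

With `Accept: text/html;q=0.8, text/turtle` and both types available, this sketch selects `text/turtle`, because the missing q on `text/turtle` defaults to 1.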

What kind of non-normative text do you think may help? If we can, I prefer to not include such text.

mikeadams1 commented 4 years ago

3. "I accidentally created an index.html but forgot to add an .acl, plus that kicks me out of the Data browser, so anyways what do I do now."

I had made a comment about creating a /public/index.html file in a pod on gitter and that it can cause problems for some users so, I wanted to give an update on how to restore your public folder if someone had done this.

Log into your pod, select "your stuff" on the drop down menu, select the "your storage" tab, locate and open the index.html file, click the gear icon and select delete, and your public page will be restored to the default view. Hope this is helpful.

csarven commented 4 years ago

Documenting a clarification on this issue (based on meeting with @dmitrizagidulin ):

If /index.* exists, what happens when / is requested? Common practice: /index.* is placed in the system and / eventually resolves with a message body including the contents of index.*. There is some overlap with https://github.com/solid/specification/issues/109 . For example:

I think we should first have agreement on the interaction, ie. should request to /:

/index.* will inherit the authorization policies from /'s ACL (or ancestor's).
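
A minimal sketch of that inheritance, assuming ACL resources are named by appending `.acl` to the resource path (an implementation convention, not something this thread specifies) and ignoring finer points such as which rules in an ancestor's ACL are marked as inheritable. The function names are hypothetical.

```python
from typing import Optional


def parent_container(path: str) -> Optional[str]:
    """Return the parent container of a path, or None for the root."""
    if path == "/":
        return None
    trimmed = path.rstrip("/")
    return trimmed[: trimmed.rfind("/") + 1]


def effective_acl(path: str, acls: set[str]) -> Optional[str]:
    """Find the ACL governing `path`: its own, else the nearest ancestor's."""
    current: Optional[str] = path
    while current is not None:
        candidate = current + ".acl"  # assumed naming convention
        if candidate in acls:
            return candidate
        current = parent_container(current)
    return None  # no ACL anywhere up to the root
```

Under these assumptions, `/index.html` with no ACL of its own resolves to `/.acl`, which is the behaviour described above.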

kjetilk commented 4 years ago

Seen in isolation, I feel that this issue should have been resolved without exposing the existence of index.* to the client at all. It feels much like a filesystemy anachronism; the use of /index.* is just a way to store a representation of / internally.

The risk that people will use both / and /index.html to refer to the same resource, and therefore complicate URI normalization, is great. That is, it creates a mess for us when we try to determine if two resources are the same, and complicates query and caching.

If we do come up with a consistent way to deal with representations more generally, (#109), then it does make sense to apply that to this problem too. I think it is crucial that the minimal container triples and containment triples are in all representations, and that we minimize the risk that people will use different URIs for what is actually the same resource.

JordanShurmer commented 4 years ago

Forgive me if I'm off topic, but I think this is the same issue:

It seems to me like there are 2 types of resources in a Solid pod.

  1. Server/Pod-Provider managed things (e.g. the pod management interface)
  2. User managed things (i.e. LDP resources)

To me, it seems the problems described in this issue are caused by the lack of this distinction in NSS and in Solid in general right now. The Server managed resources should not be able to be managed by the user.

I think this paradigm of separating the User managed from the Server managed is helpful. It could even be advertised by the server. For example, servers could list their server-managed resource(s) (i.e. their "entry point" for users) when receiving an OPTIONS * request.

kjetilk commented 4 years ago

I discussed this briefly with @timbl , and he said that in the databrowser's use of index.ttl, there was no expectation that it is a default container resource, to the contrary, it is a normal resource, contained as usual.

kjetilk commented 4 years ago

I also discussed index.html briefly with @timbl . First, there is no expectation that all index.* will behave the same. index.ttl is merely a databrowser convention, it is just another resource.

index.html OTOH is a HTML representation of the container, as per old Apache convention. @timbl intended this to be governed by the Accept header.

There are some problems that I see with this approach (that we didn't have time to discuss further), since there are some things that are important in a container's representation, notably the containment triples, the minimal server-managed LDP triples, and the "POSIX metadata", i.e. metadata to indicate ctime, mtime, atime, etc, which would come in handy if Solid was used as an actual file system, e.g. through FUSE.

So, I am uncomfortable about not having them in the representation, even though people could get at them by adjusting the Accept header.

From an LDP purist's point of view (which I am not), the HTML representation is also a non-RDF source, so it breaks LDP's model in which LDPC isa LDP-RS.

One possible resolution to this problem is to require the inclusion of the RDF as RDFa. Even though people might PUT index.html without the RDFa, the RDFa could be added to the DOM by the server.

I see three ways to resolve this:

  1. Accept that the HTML representation is not a representation of the container with its triples, but return HTML only if it is requested through the Accept header.
  2. Try to work out something that behaves in the browser like things used to do in the past, but does so in a way that makes it clear that ./index.html is not a representation of ./
  3. Require (or at least say SHOULD) returning the RDF representation of a container as RDFa in the HTML.

My preference would be the last option.
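
Option 3, combined with the idea that the server adds the RDFa itself, could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, the markup is one of many ways to express containment in RDFa, and a real server would edit a parsed DOM rather than splice strings.

```python
from html import escape

LDP = "http://www.w3.org/ns/ldp#"


def containment_rdfa(container_url: str, children: list[str]) -> str:
    """Render the container's type and containment triples as an RDFa list."""
    items = "\n".join(
        f'    <li><a property="ldp:contains" href="{escape(child, quote=True)}">'
        f"{escape(child)}</a></li>"
        for child in children
    )
    return (
        f'<ul prefix="ldp: {LDP}" about="{escape(container_url, quote=True)}" '
        f'typeof="ldp:BasicContainer">\n{items}\n</ul>'
    )


def augment_html(user_html: str, container_url: str, children: list[str]) -> str:
    """Insert the RDFa block before </body>, or append it if there is none."""
    block = containment_rdfa(container_url, children)
    index = user_html.lower().rfind("</body>")
    if index == -1:
        return user_html + "\n" + block
    return user_html[:index] + block + "\n" + user_html[index:]
```

The user's own markup is left untouched; only the server-generated list is added, so an index.html that was PUT without any RDFa would still be served with the containment information.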

csarven commented 4 years ago

index.html OTOH is a HTML representation of the container

Possibly. To actually resolve that, I was hoping to get an answer to https://github.com/solid/specification/issues/69#issuecomment-554760518 , https://github.com/solid/specification/issues/109#issuecomment-549805258 eg. Content-Location?

That aside..

Any representation of a container is required to be RDF bearing; RDF Source; LDP-RS... having triples that can be parsed by an RDF parser... Full stop. Whether it has a certain number of triples or even the "right" triples is orthogonal. That is a different requirement about representations and equivalences we can work out.

From an LDP purist's point of view (which I am not), the HTML representation is also a non-RDF source

Not necessarily. If an HTML has no RDFa in it, an RDFa parser will obviously find 0 triples. That's exactly the same as a Turtle, if there are no triples in the document. Attempting to create or update / with text/turtle or text/html with no RDF can be expected to have the same treatment - basically 0 triples.

https://github.com/solid/specification/issues/45#issuecomment-541614170 proposes how to handle equivalent representations:

Servers supporting Turtle, JSON-LD and optionally other RDF serializations for LDP-RS SHOULD provide different RDF serializations based on client's proactive negotiation. For example, if a server allows the creation of an LDP-RS in text/html (including RDFa), it SHOULD respond to GET with Accept: text/turtle requests with 200.

kjetilk commented 4 years ago

index.html OTOH is a HTML representation of the container

Possibly. To actually resolve that, I was hoping to get an answer to #69 (comment) , #109 (comment) eg. Content-Location?

Right, so, @timbl saw this as behaving like Apache does, i.e., there's nothing on the surface, the server just silently takes the index.html file on the server and returns its content as a representation of ./. No Content-Location or anything.

That aside..

Any representation of a container is required to be RDF bearing; RDF Source; LDP-RS... having triples that can be parsed by an RDF parser... Full stop.

That'd be the LDP purist's view, ;-) this stems from the LDP's hierarchy of interaction models, it is not a concern of Solid as currently designed.

Whether it has a certain number of triples or even the "right" triples is orthogonal. That is a different requirement about representations and equivalences we can work out.

Right, but I take the pragmatic view: What are the useful pieces of LDP that we actually need? The mentioned RDF is something we need, and from that, it follows that they must be included with the container representation.

Now, I'm sufficiently purist myself to be wary of a design that doesn't include the container representation with all possible representations, regardless of Accept header. From that, it follows that indeed, any representation is required to be RDF bearing, so we arrive at the same conclusion, but through purism on different parts of the architecture. :-)

However, neither of these views is consistent with the view that @timbl gave me of his motivation behind the index.html, so we need to work out something that is.

From an LDP purist's point of view (which I am not), the HTML representation is also a non-RDF source

Not necessarily. If an HTML has no RDFa in it, an RDFa parser will obviously find 0 triples. That's exactly the same as a Turtle, if there are no triples in the document. Attempting to create or update / with text/turtle or text/html with no RDF can be expected to have the same treatment - basically 0 triples.

Right, but my (unstated) assumption is that the HTML would also contain content that is not represented as RDF, that's the whole point of putting HTML there, which makes it non-RDF in LDP's (flawed) model where LDP-RS and LDP-NR are mutually exclusive, regardless of whether there is RDF content hosted in it.

#45 (comment) proposes how to handle equivalent representations:

Servers supporting Turtle, JSON-LD and optionally other RDF serializations for LDP-RS SHOULD provide different RDF serializations based on client's proactive negotiation. For example, if a server allows the creation of an LDP-RS in text/html (including RDFa), it SHOULD respond to GET with Accept: text/turtle requests with 200.

No, that does not appear to address the problem, the point is that the index.html would contain content that is not represented in RDF, that's the whole point, if it was fully represented with RDF, it would not be put as HTML in the first place.

Moreover, the UNIX filesystem analogy extends to containers too, in that the container just contains resources, it is not intended to have an extensive representation on its own.

csarven commented 4 years ago

Edit/Note: I wrote the following prior to seeing Kjetil's comment above / in https://github.com/solid/specification/issues/69#issuecomment-563198708

This issue was intended to work out whether index.* can be observed or interacted with.

I still think the handling of index.* is ultimately an implementation detail. If it is a container's representation with its own URL, so be it.

https://github.com/solid/specification/issues/109#issuecomment-554760602 actually tries to resolve this:

where index.* exists by other means, we need to resolve #119

The point of that was to have a clear split between resources that come to existence via Solid's prescribed interaction vs. by other means, and so re "exists by other means" ie. some implementation happens to have index.* with its own accessible URL where it didn't go through an interaction with the container, then it is of course a separate resource (and certainly not a representation of /). Out of Solid spec by default. But the point with 119 is that if there are some unique cases that should have special handling, and so that would help to address:

if /index.* exists, what happens when / is requested, we need to resolve #69

What databrowser, Apache are doing with /index.* are implementation details. So, requesting / should have nothing to do with /index.* because they are different resources. Just because Apache can be configured to handle / and processing/serving /index.* is not something that Solid needs to adopt. Again, if for example a Solid server implementation wants to accept text/html or text/turtle.. at / and store the representations at /index.*, then that's its own decision. It naturally needs to address "if /index.* exists, what happens when / is requested".

csarven commented 4 years ago

RDF 1.1 says that RDFa in markup languages qualifies it as RDF. If there is any information in there that's not part of an RDF graph, then it doesn't suddenly become non-RDF. If two RDF graphs are isomorphic, that's all that counts towards representation equivalence.
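
A minimal sketch of that equivalence test, assuming graphs without blank nodes so that isomorphism reduces to plain set equality, and using a tiny N-Triples subset (one triple per line) instead of a real parser. The function names are hypothetical.

```python
Triple = tuple[str, str, str]


def parse_ntriples(text: str) -> frozenset[Triple]:
    """Parse a small N-Triples subset: one 'subject predicate object .' per line."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no triples
        if not line.endswith("."):
            raise ValueError(f"missing terminating '.': {line!r}")
        # Split on the first two runs of whitespace so literals keep their spaces
        subject, predicate, obj = line[:-1].strip().split(None, 2)
        triples.add((subject, predicate, obj.strip()))
    return frozenset(triples)


def same_graph(doc_a: str, doc_b: str) -> bool:
    """Two documents are equivalent here if they encode the same set of triples."""
    return parse_ntriples(doc_a) == parse_ntriples(doc_b)
```

Statement order, comments and duplicate statements do not affect the result; anything in a document that is not a triple is simply not part of the comparison, which is exactly the point the rest of this thread goes on to argue about.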

If / is intended to be in RDF, then by minimum server can take Turtle and JSON-LD (as previously roughly agreed for minimum serialisations). If a server wants to accept any other media type, it needs to be an RDF bearing document. Similarly, if it wants to serve a container in text/html, it needs to encapsulate the information in RDFa.

kjetilk commented 4 years ago

Firstly, this resource is now only about index.html as indeed all other index.* are just normal resources, and thus implementation details.

kjetilk commented 4 years ago

RDF 1.1 says that RDFa in markup languages qualifies it as RDF. If there is any information in there that's not part of an RDF graph, then it doesn't suddenly become non-RDF.

Eh, well, it makes it a not RDF source, at least, from LDP:

Linked Data Platform RDF Source (LDP-RS) An LDPR whose state is fully represented in RDF, corresponding to an RDF graph. See also the term RDF Source from [rdf11-concepts].

Linked Data Platform Non-RDF Source (LDP-NR) An LDPR whose state is not represented in RDF. For example, these can be binary or text documents that do not have useful RDF representations.

So, HTML+RDFa is an RDF Source iff it is fully represented by the hosted RDF. If it contains information that is not represented by the hosted RDF, it is neither an LDP-RS nor an LDP-NR. Then, my proposition is that the whole point with having an index.html file there is to have human-readable information that is not fully contained in RDF. It is still an LDPR though, but it is not consistent with the LDPC interaction model, which doesn't worry me too much, as long as we can represent the stuff that should be in there as RDFa.

If two RDF graphs are isomorphic, that's all that counts towards representation equivalence.

I don't think so, because what happens then if you round-trip between the HTML+RDFa and Turtle? index.html can't be an RDF source. It is a special case that doesn't fit with LDP's flawed model. Which, BTW, is flawed for all HTML+RDFa that contains information not fully represented with RDF.

If / is intended to be in RDF, then by minimum server can take Turtle and JSON-LD (as previously roughly agreed for minimum serialisations). If a server wants to accept any other media type, it needs to be an RDF bearing document. Similarly, if it wants to serve a container in text/html, it needs to encapsulate the information in RDFa.

Right. So we agree that it is a good thing if the index.html returns RDFa, but we still have to define this as a special case, and also whether it is a MUST in Solid to return RDFa, but we cannot do that with reference to LDP as LDP doesn't deal with this case.

csarven commented 4 years ago

The intended state of an HTML+RDFa representation is what corresponds to an RDF graph. Round-tripping is a non-issue because information that's not marked in RDFa is neither intended nor expected to be preserved. HTML+RDFa is an LDP-RS.

There is no need to specify index.html as a special case anymore than specifying index.ttl. I suggest to close this issue in favour of resolving https://github.com/solid/specification/issues/109

I think the original issue title reflects what's discussed and linked. I find the one you've changed to omits key information. Can we revert?

TallTed commented 4 years ago

[@csarven] information that's not marked in RDFa is neither intended nor expected to be preserved

Really? I don't think many if any HTML+RDFa creators would agree with you. Certainly, I would find it a very large problem if my HTML+RDFa documents were to suddenly lose all HTML content and be reduced to RDF in any serialization.

kjetilk commented 4 years ago

The intended state of an HTML+RDFa representation is what corresponds to an RDF graph. Round-tripping is a non-issue because information that's not marked in RDFa is neither intended nor expected to be preserved. HTML+RDFa is an LDP-RS.

No, it is not! Please comment on each of my points if you disagree!

There is no need to specify index.html as a special case anymore than specifying index.ttl. I suggest to close this issue in favour of resolving #109

I strongly disagree! This is very much a special issue, as it describes a feature that is in Solid and has been in Solid since the dawn of ages. You will then have to argue for the removal of a feature that people rely on and that the Director has voiced a clear opinion on.

I think the original issue title reflects what's discussed and linked. I find the one you've changed to omits key information. Can we revert?

Sure, but then, please be a little sensitive to the fact that some of this happened in a F2F discussion that you didn't attend. I have tried to explain it in clear terms, but you seem to simply dismiss the discussion without considering the merits of the arguments.

csarven commented 4 years ago

@TallTed ,

Really? I don't think many if any HTML+RDFa creators would agree with you. Certainly, I would find it a very large problem if my HTML+RDFa documents were to suddenly lose all HTML content and be reduced to RDF in any serialization.

If the intention is to preserve content in the RDF graph universe of things, it needs to emit itself to get picked up. What wrote HTML+RDFa and what decisions did it make?

@kjetilk ,

Dmitri's initial comment is the original issue covering a bunch of related stuff, hence the title!

Right, so, @timbl saw this as behaving like Apache does, i.e., there's nothing on the surface, the server just silently takes the index.html file on the server and returns its content as a representation of ./. No Content-Location or anything.

When you say "No Content-Location or anything", that can be taken as a response to https://github.com/solid/specification/issues/69#issuecomment-554760518 :

I think we should first have agreement on the interaction, ie. should request to /: include contents of index.* in the message body

being the preferred interaction from that list. [Perhaps that's another way of looking at "Please comment on each of my points if you disagree!" ;)]

Note how not requiring Content-Location in this case is contrary to our preference of requiring it for the general case https://github.com/solid/specification/issues/109 . I'd like to resolve that.

The risk that people will use both / and /index.html to refer to the same resource

The point of resolving issues like #119 is so that we understand the scope and methods in which resources make their way into a system. How did index.html materialise? Was it created as a representation of / or as a resource different than /? Contained or not? Anything referring to it?

Require (or at least say SHOULD) returning the RDF representation of a container as RDFa in the HTML.

If an implementation accepts text/html on a resource eg. /, with Accept and Content-Type, I prefer this (even as a MUST) - needs to have containment information and following the common criteria on affecting server-managed triples.

csarven commented 4 years ago

So, HTML+RDFa is an RDF Source iff it is fully represented by the hosted RDF. If it contains information that is not represented by the hosted RDF, it is neither an LDP-RS nor an LDP-NR.

That's a misinterpretation of LDP-RS, LDP-NR and the RDF Source that it links to (RDF 1.1). RDF 1.1 is clear about RDFa:

An RDF document is a document that encodes an RDF graph or RDF dataset in a concrete RDF syntax, such as Turtle [TURTLE], RDFa [RDFA-PRIMER], JSON-LD [JSON-LD], or TriG [TRIG]. RDF documents enable the exchange of RDF graphs and RDF datasets between systems.

A concrete RDF syntax may offer many different ways to encode the same RDF graph or RDF dataset, for example through the use of namespace prefixes, relative IRIs, blank node identifiers, and different ordering of statements. While these aspects can have great effect on the convenience of working with the RDF document, they are not significant for its meaning.

We informally use the term RDF source to refer to a persistent yet mutable source or container of RDF graphs. An RDF source is a resource that may be said to have a state that can change over time. A snapshot of the state can be expressed as an RDF graph. For example, any web document that has an RDF-bearing representation may be considered an RDF source.

I think you're mistakenly overloading the term "fully". LDP-RS uses that term to differentiate from LDP-NR. Just as LDP-NR uses "do not have useful RDF". The use of the term "fully" alone is inadequate to cover all the intricacies or even to create a new constraint with a whole set of ramifications without definitions. It is not LDP's place to do that because the simplest explanation is that it respects spec orthogonality. LDP is merely classifying the kind of documents for its interaction model - the intended semantics being "RDF-bearing" or not.

Proposing that (HTML+)RDFa is somehow incompatible with, falls between, or depends on conditions (?) for RS and NR is nonsensical and renders things useless for no practical benefit.

If it helps to be sure, use the LDP-RS interaction model when communicating representations with RDFa - something I've already suggested elsewhere. If we don't need LDP's interaction models in the end, nothing fundamentally changes, so there is nothing else to do here. Great. If / is intended to have RDF-bearing representations, then servers permitting media types that could potentially contain RDFa need to decide what is acceptable given the underlying semantics eg. involving server-managed triples.

kjetilk commented 4 years ago

This is going to go down in history as an example of why some stuff should be agreed on F2F, as we now ended up generating more heat than illumination ;-) So, we're in "violent agreement" for the most part, I guess. There's just one thing I still feel like responding to here:

Proposing that (HTML+)RDFa is somehow incompatible with, falls between, or depends on conditions (?) for RS and NR is nonsensical and renders things useless for no practical benefit.

since it does have a practical consequence, that "fully" means it is round-tripable, i.e. you can choose any RDF serialization, and the semantics will be the same.

That aside, I think we have an agreement on the following:

  1. That all operations on the container will use the container request-URI.
  2. That the actual use of index.html is an implementation detail local to the server, which will not be exposed to the client.
  3. That the use of HTML+RDFa will need to follow the same rules as any other RDF serialization, i.e. it cannot modify server-managed triples such as containment triples.
  4. That said, HTML+RDFa may contain more information than is captured in RDF, and therefore isn't round-tripable, the HTML therefore needs separate storage.
  5. The server will need to be able to modify the triples included as RDFa, since at the very least, it needs to manage server-managed triples.
  6. It is thus not an actual "Representation URL" in the sense of #109, since it doesn't have a different URL than the container itself, it is merely an augmented representation of the same resource.
  7. Therefore, it also does not have its own ACL, etc.

Any misrepresentation in the above? If not, I think the main question is whether the HTML representation MUST have the container's triples as RDFa or if it should be a SHOULD.
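
Points 3 and 5 can be sketched as below. This shows one possible policy only, since the thread does not settle whether a server should reject or silently ignore attempts to change server-managed triples; here, claiming a containment triple the server does not hold is rejected, and the server's own containment triples are always kept. The names and the choice of `ldp:contains` as the only server-managed predicate are assumptions for illustration.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"
SERVER_MANAGED_PREDICATES = {LDP_CONTAINS}  # assumed minimal set


def merge_container_update(current: set, submitted: set) -> set:
    """Apply a client update to a container without touching managed triples.

    `current` and `submitted` are sets of (subject, predicate, object) tuples,
    already extracted from whatever serialization was sent (Turtle, RDFa, ...).
    """
    managed_now = {t for t in current if t[1] in SERVER_MANAGED_PREDICATES}
    managed_claimed = {t for t in submitted if t[1] in SERVER_MANAGED_PREDICATES}
    if not managed_claimed <= managed_now:
        # The client tried to assert containment the server does not know about
        raise PermissionError("client may not assert new server-managed triples")
    client_triples = set(submitted) - managed_claimed
    # The server's managed triples survive whether or not the client sent them
    return client_triples | managed_now
```

Because the function works on triples rather than on a serialization, the same rule applies to HTML+RDFa as to any other RDF syntax, which is what point 3 asks for.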

csarven commented 4 years ago

  1. Agree.
  2. Agree, server is not prohibited (whether as a representation of a container or as an independent resource with that name).
  3. Agree.
  4. Disagree, only the RDF graph is intended to be round-tripable. That is the agreement that's being committed to with RDFa use.
  5. Agree, server will have the same behaviour for all supported RDF serializations.
  6. Agree.
  7. Agree.

Any misrepresentation in the above? If not, I think the main question is whether the HTML representation MUST have the container's triples as RDFa or if it should be a SHOULD.

If an implementation accepts text/html on a resource eg. /, with Accept and Content-Type, I prefer this (even as a MUST) - needs to have containment information and following the common criteria on affecting server-managed triples.

TallTed commented 4 years ago

[@csarven] only the RDF graph is intended to be round-tripable. That is the agreement that's being committed to with RDFa use.

I could not disagree more strongly with the above, which I do not believe can be found in any RDFa specification nor guidance.

You are approaching RDFa from the wrong side. HTML+RDFa is embellished HTML (hence, HTML plus RDFa), it is not embellished RDF (which would be RDFa plus HTML).

TallTed commented 4 years ago

Regarding LDP-NR vs LDP-RS classification —

As a member of the LDP WG, I understood us to be saying that LDP-NR might include RDF content, but always include non-RDF content which is meant to be preserved, so the document must be preserved as PUT or POSTed.

LDP-RS are 100% RDF, and might be stored by the back-end in their original form, or transformed into another RDF serialization, or loaded into a graph store and not preserved as a document per se — though always retrievable in either Turtle or JSON-LD serialization.

(I was a minority in thinking that Turtle — which may include out-of-band, non-RDF comments and statement order — should be considered LDP-NR, and thus should be preserved entirely.)

csarven commented 4 years ago

HTML+RDFa is embellished HTML (hence, HTML plus RDFa), it is not embellished RDF (which would be RDFa plus HTML).

Pardon me but as accurate as that may be it is being pedantic. RDFa extends the host language, and as a result it is considered to be "RDF-bearing". I'll defer to RDF 1.1 once again.

As a member of the LDP WG, I understood us to be saying that LDP-NR might include RDF content, but always include non-RDF content which is meant to be preserved, so the document must be preserved as PUT or POSTed.

No objection.

LDP-RS are 100% RDF, and might be stored by the back-end in their original form, or transformed into another RDF serialization, or loaded into a graph store and not preserved as a document per se — though always retrievable in either Turtle or JSON-LD serialization.

No objection.

Neither of those statements conflicts with or excludes RDFa in markup languages. There is no test that I'm aware of that can differentiate between the strict or perceived definitions of "fully" and "do not have useful RDF" above and beyond what's generally intended through HTTP+RDF 1.1.

The arguments portraying RDFa as though it yields a non-RDF Source are in direct conflict with RDF 1.1. FWIW, LDP intentionally defines the concepts imprecisely, yet what's argued here should somehow be accepted as a precise interpretation.

TallTed commented 4 years ago

"RDF-bearing" is not the same as "RDF only" which is what you're asserting with "only the RDF graph is intended to be round-tripable. That is the agreement that's being committed to with RDFa use".

You're effectively asserting that the RDF content is the only thing of value in HTML+RDFa documents, and that the creators of those documents have acceded to that assertion by using that document format.

I do not believe that many, if any, and certainly not all, creators of such documents would agree.

TallTed commented 4 years ago

Also -- RDF 1.1 does not define LDP-RS, though it may well define RDF Source. These are distinct, and pretending that they are not will cause problems in many interactions.

csarven commented 4 years ago

You're effectively asserting that the RDF content is the only thing of value in HTML+RDFa documents, and that the creators of those documents have acceded to that assertion by using that document format.

That goes without saying because it is precisely what happens with any RDF reserialisation - similar to handling of whitespaces, comments, unused prefixes, order etc. that gets dropped or is rendered insignificant for the resulting RDF graph.

If a resource like / is intended to be an RDF Source, allowed media type on that resource MUST uniformly adhere to the same criteria. Put another way, if an application wants to update an RDF Source like /, it needs to meet that requirement in its request in addition to other optional content negotiation - whether that's Turtle, JSON-LD at minimum or optionally with HTML+RDFa etc.

There is no need to assume the contrary because it moves away from the simplest explanation based on the specs.

RDF 1.1 does not define LDP-RS, though it may well define RDF Source. These are distinct, and pretending that they are not will cause problems in many interactions.

Yes, evidently, but I don't understand the point you're trying to make. What's important is that LDP-RS inherits RDF 1.1 notion of RDF Source.

Aside: considering that LDP's interaction model may already be on thin ice with Solid anyway, we could drop the LDP speak. It doesn't change the bottom line anyway.

TallTed commented 4 years ago

Please reread the LDP spec (as I have just done). From 2. Terminology

Linked Data Platform Resource (LDPR) A HTTP resource whose state is represented in any way that conforms to the simple lifecycle patterns and conventions in section 4. Linked Data Platform Resources.

Linked Data Platform RDF Source (LDP-RS) An LDPR whose state is fully represented in RDF, corresponding to an RDF graph. See also the term RDF Source from [rdf11-concepts].

Linked Data Platform Non-RDF Source (LDP-NR) An LDPR whose state is not represented in RDF. For example, these can be binary or text documents that do not have useful RDF representations.

In other words -- LDP Resources have two subsets --

  1. LDP-RS
  2. LDP-NR

HTML+RDFa docs are NOT fully represented in RDF, thus they are NOT LDP-RSs. HTML+RDFa docs do have partial representation in RDF, thus when they are LDPRs, they are LDP-NRs.

Yes, LDP references the RDF 1.1 notion of RDF Source -- but as a see also, not as a same as. Further, the linked notion is from a non-normative section of RDF 1.1 (1.5 RDF and Change over Time, a subsection of 1. Introduction), and reads --

We informally use the term RDF source to refer to a persistent yet mutable source or container of RDF graphs. An RDF source is a resource that may be said to have a state that can change over time. A snapshot of the state can be expressed as an RDF graph. For example, any web document that has an RDF-bearing representation may be considered an RDF source. Like all resources, RDF sources may be named with IRIs and therefore described in other RDF graphs.

Note the VERY IMPORTANT word, "informally".

You are taking this description from RDF 1.1 as both normative and formal, when it is neither.

You are also saying that these specs say something they don't — i.e., that any RDF source is only and purely RDF, and thus can be losslessly — or at least, without loss of anything of any importance — transformed from any serialization into any other. Neither the RDF nor LDP spec says this.


ETA -- I've reread down a little further. The terminology section should have had a slightly different definition for Non-RDF Source, which expansion is found in the normative section, 4.4 Non-RDF Source

LDP Non-RDF Sources may not be able to fully express their state using RDF

In other words — the state of LDP-NRs may be partially expressed in RDF, as with RDFa (and, I contend, as with Turtle.)

kjetilk commented 4 years ago

@csarven ,

Disagree, only the RDF graph is intended to be round-tripable. That is the agreement that's being committed to with RDFa use.

OK, so we're drilled down now, good. Aside from the remarks of @TallTed , which I think are clarifying (I suppose I tend to read the LDP spec as the devil reads the bible, but I cannot guarantee I will stop doing it :-) ), how can we practically resolve the present issue with your interpretation?

The point is that when you add a HTML representation of the container, you do so to add information that carries HTML semantics, and it carries human-readable content, like the root of the Solid server does nowadays. If you allow roundtrip, then AFAICS, you allow information which is the very reason for this feature to exist to be lost, and that defeats the purpose of the feature. With conneg, you can always get a pure RDF representation of the same RDF that is hosted as RDFa in HTML in other serializations, but you'd still have to preserve the HTML, so it isn't roundtripable.

I don't think we should progress further down the LDP rabbit hole, we need to focus on how we can define the feature so that it does what is expected from it.

The root, e.g. https://kjetiltest2.dev.inrupt.net/ can be taken as an example. This feature is really about tightening up the spec around that, and then we need to decide if RDFa is a MUST or a SHOULD.

csarven commented 4 years ago

Ted, masterful hairsplitting =)

You are taking this description from RDF 1.1 as both normative and formal, when it is neither.

I'm completely taking it in good faith because of the fact that LDP refers to it. But if you insist, I have no objection to going with LDP's RDF Source not being the "same as" RDF 1.1's RDF Source. So, what is LDP's "RDF Source"... just RDF 1.1's RDF but not "RDF Source"? A "pure" definition of RDF? A constraint on RDF 1.1? Something else?

Neither the RDF nor LDP spec says this.

Again, taken in good faith because the definitions are confined to what makes it into an RDF graph. Naturally what doesn't make it in is "lost" but I'm not framing it as such. You are. I see it in terms of intentions. An HTML+RDFa document can live just fine on its own as a plain ol' HTML if you will. Given what RDFa affords in context of RDF 1.1, what is emitted is the RDF graph. If a consumer is handling it with the RDF goggles on, it is not at all about loss. Why would an RDFa publisher ever assume that anything but the RDFy bits should remain? That is completely out of scope.

What I gather from your explanation of LDP and its interaction models is that its classification of the sources can be fuzzy. There is an element of heuristics involved by a server. Again, are there algorithms or tests that can determine whether a payload is "fully", "partially", "do not have useful", and so forth? If not, then how is interop guaranteed if servers/clients could potentially do and expect different things?

Moreover, what you describe about how some (RDF 1.1) RDF Sources could fall into different LDP interaction models only supports the idea/possibility to not bother with LDP's interaction models for Solid. IMHO.

csarven commented 4 years ago

how can we practically resolve the present issue with your interpretation?

If an implementation accepts text/html on a resource eg. /, with Accept and Content-Type, I prefer this (even as a MUST): the representation needs to have containment information and follow the common criteria on affecting server-managed triples.

?

The point is that when you add a HTML representation of the container, you do so to add information that carries HTML semantics, and it carries human-readable content, like the root of the Solid server does nowadays.

There is a very specific expectation that's overlooked. The fact that we are interacting with / and that it is an RDF Source is precisely what sets the constraints. The basic expectation is that it needs to have a representation in which the semantics reflect/resemble a container, containment triples etc. That is a contract. If a server does not want to accept text/html at /, or the payload doesn't meet the same criteria about the RDF bits (as with any RDF serialization), it should reject it.

I hate to say it but we can't have our cake and eat it too on this one. We can't expect and design / (eg. a homepage) to act like a regular resource (plain ol' HTML when we want it to) and at the same time expect that it should be a "pure" RDF container. If HTML is allowed and the "interaction model" (but not necessarily LDP's "interaction model") on the resource is RDF, then the intention is clear. It will be handled as RDF. If the document contains RDFa and is able to cover all the container material (just like with other RDF serializations), super great!

Moreover, we can't on one hand expect that HTML can make its way into / and at the same time complain that some "content" is lost when re-serialized. As already noted, whitespaces, comments, unused prefixes, order etc. are part of the state.. but no one seems to be complaining about the "pure" RDF losing track of that.

Aside: FWIW, an experimental server like https://github.com/csarven/mayktso/ will take HTML+RDFa and store it as is. It allows conneg to other RDF. If text/html is requested, it will serve the document (which is saved as is). If text/turtle is requested, it will transform and serve that. If a Turtle document is sent to the server, it will save it as is. If text/html is requested, it will reject (because it doesn't want to serialize HTML+RDFa - although nothing in particular is stopping it from listing a bunch of triples). If text/turtle is requested, it will serve it right back out.
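The behaviour described in the aside above can be summarized as a small decision table. This is only a sketch of the prose description, not code taken from mayktso; the function name and the action labels are hypothetical, and only the two media types named in the comment are modelled.

```python
# Media types named in the description above; others are out of scope here.
HTML = "text/html"
TURTLE = "text/turtle"


def plan_response(stored_type: str, requested_type: str) -> str:
    """Return what the server does on GET, given how the document was saved."""
    if requested_type == stored_type:
        # The saved bytes are served back unchanged.
        return "serve-as-is"
    if stored_type == HTML and requested_type == TURTLE:
        # The RDF graph is extracted from the RDFa and re-serialized.
        return "transform"
    # Everything else, including Turtle stored and HTML requested, is rejected.
    return "reject"
```

A rejected request would presumably map to a 406 Not Acceptable response, but the comment does not name a status code, so that mapping is an assumption.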

I don't think we should progress further down the LDP rabbit hole, we need to focus on how we can define the feature so that it does what is expected from it.

Agree. Especially involving interaction models, since they seem to introduce more complexity to RDF 1.1 than they actually help.

The root, e.g. https://kjetiltest2.dev.inrupt.net/ can be taken as an example. This feature is really about tightening up the spec around that

That's simply broken. Accept text/html and text/turtle return completely different "semantics". Having said that, I will entertain the idea that those are equivalent - which will open up another can of worms :)

kjetilk commented 4 years ago

You know, this is starting to sound like the Council of Chalcedon or something! ;-) And it is not even F2F.

This is not a difficult problem to solve, it is only LDP orthodoxy that prevents us from solving this issue, and that orthodoxy is entirely misplaced, since LDP is neither valid dogma nor considered axiomatic for Solid. The situation is simply a spec that didn't take into account the possible situation we're in, which is pardonable, since writing specs for every strange thing that people might do is hard.

So, if you purge the LDP-RS definition from your mind, how would you then do it?

csarven commented 4 years ago

So, if you purge the LDP-RS definition from your mind, how would you then do it?

My understanding of LDP didn't venture from RDF so what I've initially proposed is workable just the same without LDP(-RS). Pasting again for now (but I do think that we can improve the language):

If an implementation accepts text/html on a resource eg. /, with Accept and Content-Type, I prefer this (even as a MUST): the representation needs to have containment information and follow the common criteria on affecting server-managed triples.

If anyone would like to make a simpler or a more appropriate proposal, just make it so.

kjetilk commented 4 years ago

Right, but this doesn't resolve the round-trip problem, I may have misunderstood you. The point is that non-RDF content in the HTML MUST be preserved, and the understanding that I have gotten from you is that you insist it cannot.

acoburn commented 4 years ago

Is it not the case that this basically depends on whether a client identifies the resource as an LDP-RS or as an LDP-NR?

As an (RDFa-based) LDP-RS, the RDF-ness of the resource would seem to be of paramount importance. The same document (i.e. the bytes) could, alternatively, be stored as an LDP-NR, in which case the non-RDF characteristics of the resource would be paramount.

If you buy that argument, then the semantics of these resources would depend on the interaction model provided by the client when it first creates the resource. In the absence of a client-provided interaction model, the issue can be reframed as https://github.com/solid/specification/issues/128.

TallTed commented 4 years ago

@csarven -

The definition of an LDP-RS in the LDP spec is crystal clear: "An LDPR whose state is fully represented in RDF, corresponding to an RDF graph." (emphasis added)

Any other LDPR is an LDP-NR, and should be preserved in its original form -- whether or not RDF can be extracted, distilled, generated, or otherwise produced from that LDPR. Such generated RDF may be freely written to an RDF store, and transformed from one serialization to another.

LDP is not, was not, will never be, focused solely on RDF resources.

csarven commented 4 years ago

Kjetil,

Right, but this doesn't resolve the round-trip problem, I may have misunderstood you. The point is that non-RDF content in the HTML MUST be preserved, and the understanding that I have gotten from you is that you insist it cannot.

Right. The "agreement" on / is the RDF-bearing bits. If a client wants to write to /, the server only needs to check against that criteria, and only needs to transmit the RDF-bearing bits when it is requested again. If there is any expectation beyond that, then we are talking about the possibility of / being any kind of resource (not strictly RDF-bearing).

I can't stress enough the point that if a server doesn't want to handle HTML(+RDFa) because it is yucky, or concerned about not being "fully" or "round-tripable" or "pure RDF" or not pigeonholing well into LDP-RS..., there is a solution to that: 415. No hard feelings.

Aaron,

Is it not the case that this basically depends on whether a client identifies the resource as an LDP-RS or as an LDP-NR?

Right. Moreover, going with the LDP interaction model, the target resource eg. / is an LDP-RS, so the client is only allowed to provide an LDP-RS representation. If a server accepts text/html, it is definitely about the RDF graph encoded in HTML+RDFa. If a server doesn't want to accept text/html (or application/xhtml+xml, image/svg+xml..), they should return 415.
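A minimal sketch of the 415 option discussed above, assuming the server keeps a fixed set of media types it is willing to accept on write. The set contents and the function name are hypothetical; nothing in this thread prescribes them.

```python
from typing import Optional

# Hypothetical server configuration: media types accepted when writing to /.
ACCEPTED_WRITE_TYPES = frozenset({"text/turtle", "application/ld+json"})


def reject_status(content_type: str,
                  accepted: frozenset = ACCEPTED_WRITE_TYPES) -> Optional[int]:
    """Return 415 if the request's Content-Type is not accepted, else None."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return None if media_type in accepted else 415
```

A server that does want to handle HTML+RDFa would simply add text/html to the accepted set and then apply the same checks on the RDF bits as for any other serialization.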

If you buy that argument, then the semantics of these resources would depend on the interaction model provided by the client when it first creates the resource. In the absence of a client-provided interaction model, the issue can be reframed as #128.

Yes, I came to the same conclusion which prompted me to open https://github.com/solid/specification/issues/105 (what 128 is based on).

Ted,

LDP-NR, and should be preserved in its original form

I don't think that's explicitly stated but I would agree with you that it is a reasonably clear intention. Just as there are other intentions, on some of which we seem to disagree.

LDP is not, was not, will never be, focused solely on RDF resources.

I don't think anyone argued for or against that.

kjetilk commented 4 years ago

@acoburn :

Is it not the case that this basically depends on whether a client identifies the resource as an LDP-RS or as an LDP-NR?

Yes, you could make a case for that.

As an (RDFa-based) LDP-RS, the RDF-ness of the resource would seem to be of paramount importance. The same document (i.e. the bytes) could, alternatively, be stored as an LDP-NR, in which case the non-RDF characteristics of the resource would be paramount.

Yes, if the client explicitly declares it as an LDP-RS, then it isn't an unreasonable expectation that it is fully round-tripable. However, per LDP, LDP-C isa LDP-RS, so the requirements of this issue have broken that already. :-) Moreover, @timbl said that LDP's interaction models weren't something he'd consider influencing Solid.

Back in the olden days, we used to serialize RDF graphs to HTML+RDFa on the server without adding anything, but in our day, where there is client-side code ready to do stuff with the RDF regardless of serialization, I don't see much value anymore in using RDFa as YA round-tripable RDF serialization. It is only when you add something that is not represented by RDF that HTML gets interesting.

I suppose we could find ways to support round-tripable HTML+RDFa alongside "enriched HTML+RDFa", but I consider that to be a different issue, and one that I would give very low urgency. The actual problem here is that LDP-C has a representation that is not only RDF, which is a departure from the LDP interaction model hierarchy.

kjetilk commented 4 years ago

Right. The "agreement" on / is the RDF-bearing bits. If a client wants to write to /, the server only needs to check against that criteria, and only needs to transmit the RDF-bearing bits when it is requested again. If there is any expectation beyond that, then we are talking about the possibility of / being any kind of resource (not strictly RDF-bearing).

Yes, but there is a very explicit expectation beyond that, but not as broad as "any kind of resource", it is HTML, and it must be able to carry content beyond RDF, but it may or may not contain a representation of the container RDF data.

csarven commented 4 years ago

Alternatively, requiring only RDFa would enable transmitting richer information to different consumers than any other RDF. But, we are of course not discussing things at that level. Instead, we are approaching the design problems from "RDF purity".

Curious to see what comes out.

kjetilk commented 4 years ago

Curious to see what comes out.

I'm curious to hear what you propose, @csarven , because my understanding now is that you reject the premise of the issue, that the HTML has to be able to carry content beyond the RDF representation. I may misunderstand, and it may be something fundamental that has escaped me, but we really need to get something on this...

kjetilk commented 4 years ago

Let me see if I can write something down myself. Beyond what we do agree on (which may still be contentious, since the current implementation seems to rely on an actual index.html), the current behaviour is that it simply returns HTML if told so by conneg. This boils down to one requirement, where we can discuss the requirement level. So, something like:

A Solid Server MUST/SHOULD/MAY/MUST NOT/SHOULD NOT accept and persist a HTML representation for any container that extends beyond the RDF representation of the container.

My understanding is that this is currently a MUST for the sake of interoperability, but for something that just has to conform with the current behaviour, SHOULD or MAY would also do (in which case, the server could say 415, like you proposed, @csarven ). That is, AFAICS, the only thing required to work with the current implementation.

However, we might also want to have a representation of the RDF of the container, which could result in another requirement, something like

If a HTML representation of a container exists for a given container resource, the full representation of a container MUST/SHOULD/MAY/MUST NOT/SHOULD NOT be the container's RDF embedded in the HTML representation.

A MUST here places an additional burden on the server implementations, albeit a small one, as you can easily inject the RDFa into the HTML representation's DOM in the head element before it is sent to the client. This would take care of my concern that a representation of the container isn't really a representation without the RDF...
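As a rough illustration of the injection step, here is a minimal sketch that adds one RDFa link element per contained resource just before the closing head tag. It assumes that link elements with a full-IRI rel are an acceptable RDFa encoding of the containment triples, and it uses plain string handling instead of a real DOM; the function name is hypothetical.

```python
import re
from html import escape

LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def inject_containment(html_doc: str, children: list) -> str:
    """Insert one RDFa <link> per contained resource just before </head>."""
    links = "".join(
        f'<link rel="{LDP_CONTAINS}" href="{escape(child, quote=True)}" />\n'
        for child in children
    )
    head_end = re.search(r"</head\s*>", html_doc, re.IGNORECASE)
    if head_end is None:
        raise ValueError("document has no </head> to inject into")
    # Everything outside the injected links is passed through untouched.
    return html_doc[:head_end.start()] + links + html_doc[head_end.start():]
```

Because the stored HTML is never rewritten, the human-readable content survives unchanged, and the server-managed triples are generated fresh on each response.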

Perhaps we also need to say something about what happens when you do an update with PUT, PATCH or POST, or is that clear enough?

csarven commented 4 years ago

I gave my response in https://github.com/solid/specification/issues/69#issuecomment-564640467 .

A container MUST be RDF-bearing. A container MAY carry non-RDF emitting content.

Prohibiting that would entail something like, you can't have an HTML document, eg. a typical homepage, beyond listing (but not even necessarily human-visible) of the containment triples in RDFa.

csarven commented 4 years ago

Based on the 2019-12-13 meeting, the rough proposal is:

kjetilk commented 4 years ago

Now there is convergence! :-) So, this ensures that the container representation always has the RDF it needs. It also ensures that if somebody makes a large HTML page, it won't be overwritten when simply adding a triple using RDF. The latter has the somewhat weird effect though, that a PUT with RDF does not replace the entire representation of the resource, but I think it is a reasonable compromise to make.

Now, we can always discuss the requirement levels (e.g. if it must be MUST, or it should be SHOULD) :-) And then, objections are also welcome!

csarven commented 4 years ago

Noting here that we need to clarify intended persistence or use specific terms to cover different scenarios. One general simplification may be by relaxing the requirement (use SHOULD or MAY) but that may or may not be sufficient, so we should take a closer look at eliminating ambiguities. If we can prescribe, great. Otherwise, the spec should at least supplement with non-normative information.

Some thoughts:

Ideally all information of significance should be encoded in RDF but the intention with that "persist" was to simply take non-RDF content into account on the basis that it is useful to retain in some cases. It is however equally legitimate and preferable to replace an entire representation when for example an author wants to redo their homepage or change the description for the kind of content they have under a particular container. Provided that it doesn't conflict with other criteria eg. URI re-use recommendations.

General considerations: Clients need to better indicate their intentions to servers, and servers should have a default behaviour. Servers that do not want to, or are not capable of, persisting the non-RDF content may want to error and indicate why.

Restrictions on altering server-managed triples remain. Creates are straight-forward as there are no existing alternative representations to modify. For the purpose of updating:

kjetilk commented 4 years ago

Yeah, so short version is really now that we need to sort out all the other resource access and lifecycle stuff, and then this issue can be formulated as a special case of those.

TallTed commented 4 years ago
  • PATCH application/sparql-update isolates updates to the RDF graph, so the server may want to update representations without changing existing non-RDF content.

I should think that PATCH application/sparql-update should only be accepted when the target is pure RDF -- whether in a graph store, or in an RDF serialization with no (or, at least, minimal) out-of-band data.

Applying such a PATCH to an RDFa document, or to another HTML document with embedded "data islands" (typically in <script> elements) is fraught with peril, as the patching tool needs to understand every serialization in the existing document -- including where the RDF content may potentially repeat in embedded Turtle and JSON-LD and RDFa (and maybe more).

  • PUT text/turtle could indeed have a weird effect, so it may be permissible to not force the server to persist the non-RDF content in the other representations. After all, it is PUT and the behaviour should be uniform, but side-effects are not entirely ruled out.

HTTP spec says PUT replaces a resource in its entirety. In other words, PUT text/turtle may be understood as DELETE the old {RDFa, HTML, XLS, TXT, etc.} resource, and INSERT this Turtle resource in its place.

It may be reasonable for Solid to require that PUT text/turtle (or any PUT) be preceded by a DELETE or similar where an existing target is not pure RDF (as I expanded above), or even not already text/turtle. In other words, forbid PUT to replace targets which don't closely match what's being PUT.

Indeed, it may be reasonable to disallow PUT except where the target does not exist!

Possibly, warnings or other explicit user interaction may be sufficient here.
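The PUT restrictions floated above could be sketched as a small policy function. This is an illustration only: the function name and the create_only switch are hypothetical, and using 409 for a refused PUT is an assumption, since no status code is proposed for that case here.

```python
from typing import Optional


def put_status(existing_type: Optional[str], incoming_type: str,
               create_only: bool = False) -> int:
    """Pick an HTTP status for a PUT under the restrictive policy."""
    if existing_type is None:
        return 201  # nothing to replace, so the PUT creates the resource
    if create_only:
        return 409  # stricter variant: PUT never replaces an existing target
    if existing_type == incoming_type:
        return 204  # like-for-like replacement is allowed
    return 409  # mismatch: the client is expected to DELETE first (assumed status)
```

With create_only left off, a text/turtle PUT over an existing text/html target is refused, while a text/turtle PUT over text/turtle goes through.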

  • POST is a bit fuzzy - may need to resolve other issues first eg. RDF Merge? Possible to change non-RDF content?

POST is inherently fuzzy -- because an HTTP server is normally free to do all sorts of things with the content POSTed by the client, including creating a new resource (with a new name) alongside the existing (which remains unchanged); or moving the existing resource (which remains unchanged) to a new name alongside a new resource which is created with the new content; or replacing the old entirely with the new (no matter if or how they are mismatched); or rejecting the new content entirely; etc.

csarven commented 4 years ago

I should think that PATCH application/sparql-update should only be accepted when the target is pure RDF

It is intended to affect a resource URI as opposed to a representation URL.

PUT text/turtle may be understood as DELETE and [..] INSERT this Turtle resource in its place

The provided document is intended to atomically replace the representation which effectively updates the resource's RDF graph.

TallTed commented 4 years ago

It is intended to affect a resource URI as opposed to a representation URL. ... The provided document is intended to atomically replace the representation

Wait, are you replacing a representation or a resource? I think that last should be "atomically replace the resource".

And in the first above, I think you meant "affect a resource as opposed to a representation" (and not only because the effects are on the things identified/located by the URI/URL, not on the URI/URL strings themselves).

Also, I did not say anything about applying PATCH application/sparql-update to a representation, just to a target.

If an HTML+RDFa resource (which non-RDF content must be preserved as it is an LDP-NR) is targeted by a PATCH application/sparql-update with a small DELETE/INSERT of triples, what do you expect to happen? Personally, I would generally expect a 409 Conflict response, unless the server was fully capable of mapping the DELETE/UPDATE/INSERT into the RDFa (which is far more advanced than most if not all I'm aware of).
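The expectation in the paragraph on PATCH above can be sketched as follows. The set of "pure RDF" media types and the can_rewrite_rdfa capability flag are assumptions made for illustration; which serializations count as pure RDF is itself debated in this thread.

```python
# Assumption: serializations this hypothetical server treats as pure RDF.
PURE_RDF_TYPES = frozenset({"text/turtle", "application/n-triples"})


def sparql_patch_status(target_type: str, can_rewrite_rdfa: bool = False) -> int:
    """Pick an HTTP status for PATCH application/sparql-update on a target."""
    if target_type in PURE_RDF_TYPES:
        return 204  # the update applies directly to the RDF graph
    if target_type == "text/html" and can_rewrite_rdfa:
        return 204  # server can map the update into the RDFa
    return 409  # otherwise the update conflicts with the stored resource
```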

Finally, please remember that all URLs are URIs (while many URIs are not URLs). A URL might identify an LDP-RS just as well as any other URI would. Drawing false distinctions (as between "representation URL" and "resource URI") needlessly complicates already challenging discussions.

kjetilk commented 4 years ago
  • PATCH application/sparql-update isolates updates to the RDF graph, so the server may want to update representations without changing existing non-RDF content.

I should think that PATCH application/sparql-update should only be accepted when the target is pure RDF -- whether in a graph store, or in an RDF serialization with no (or, at least, minimal) out-of-band data.

OK, I can see that point. If I were to implement this, the way that I would have done it is that I would have extracted all the RDF from the document and put that in a triple store, and then, when the document was again dereferenced, I would have injected it back into the document, possibly in the head, making no attempt to preserve the structure, only the semantics. If done this way, patching with SPARQL wouldn't be a problem, but it would also not be exactly the same document.

Trying to keep the HTML intact while running an update query does sound somewhat difficult in the general case, indeed.

I think it should be a MAY requirement level thing, implementors may choose to reject SPARQL on HTML+RDFa.

HTTP spec says PUT replaces a resource in its entirety. In other words, PUT text/turtle may be understood as DELETE the old {RDFa, HTML, XLS, TXT, etc.} resource, and INSERT this Turtle resource in its place.

Yeah, there is certainly tension on this point, I think we came to a lesser evil on this point.

It may be reasonable for Solid to require that PUT text/turtle (or any PUT) be preceded by a DELETE or similar where an existing target is not pure RDF (as I expanded above), or even not already text/turtle. In other words, forbid PUT to replace targets which don't closely match what's being PUT.

Right, that is a good idea, I think. Even though it adds some complexity, it also makes the idea here even lesser of an evil.

Indeed, it may be reasonable to disallow PUT except where the target does not exist!

Hmmm, interesting... Yes, since it is a container, perhaps that makes sense.