w3c / ldn

🔔 Linked Data Notifications
https://www.w3.org/TR/ldn/

unclarity in ¶3.1 "discovery" on finding a link in GET #54

Closed bblfish closed 7 years ago

bblfish commented 7 years ago

make an HTTP GET request on the target URL and, for RDF sources [RDF 1.1], the Inbox is the object of the predicate http://www.w3.org/ns/ldp#inbox.

This and the previous bullet point seem to be saying

look at the header or the body for a link to the inbox.

The second bullet point is missing a reference to the returned "representation" and its semantics, so that it is not clear what is going on.
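As a rough illustration of the "header" half of discovery, here is a minimal sketch that looks for an inbox in a Link header value. It assumes the inbox is advertised with the ldp#inbox URL as the rel value; the function name `inbox_from_link_header` and the example.org URLs are hypothetical, and the regular expression is a simplification that does not handle commas or semicolons inside quoted parameters, nor relative URLs.

```python
import re

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"


def inbox_from_link_header(header_value):
    """Return the first link target whose rel includes ldp:inbox, or None."""
    # Each link-value looks like: <url>; rel="..."; other="..."
    pattern = r'<([^>]*)>((?:\s*;\s*[^,;]+)*)'
    for match in re.finditer(pattern, header_value):
        url, params = match.group(1), match.group(2)
        # rel may be quoted (possibly several space-separated values) or bare
        rel = re.search(r'rel\s*=\s*"([^"]*)"|rel\s*=\s*([^\s;,]+)', params)
        if rel:
            rels = (rel.group(1) or rel.group(2)).split()
            if LDP_INBOX in rels:
                return url
    return None


# Hypothetical header value, as might be returned for a target resource
example_header = (
    '<http://www.w3.org/ns/ldp#Resource>; rel="type", '
    '<http://example.org/inbox/>; rel="http://www.w3.org/ns/ldp#inbox"'
)
print(inbox_from_link_header(example_header))
```

If no link with that relation is present, the function returns None and a client would fall back to looking in the body.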

bblfish commented 7 years ago

One could be very clear, and general by speaking of well known interpretations of the given representation as an RDF graph, and finding the link in that graph.

rhiaro commented 7 years ago

Is this better for the second bullet:

  • make an HTTP GET request on the target URL to retrieve an RDF representation [RDF 1.1]; the Inbox is the object of the triple where the subject is the target and the predicate is http://www.w3.org/ns/ldp#inbox.

    bblfish commented 7 years ago

    I think you mean the "object" is the target. (This leaves the question as to what the subject should be. Is that covered somewhere?)

    You should still speak of the RDF Graph interpretation of the representation as containing the triple.

    • make an HTTP GET request on the target URL to retrieve a representation, whose RDF [RDF 1.1] interpretation is a graph that contains a triple whose predicate is http://www.w3.org/ns/ldp#inbox, and whose object is the target.

    That allows you to cover any future serialisation fashions including comma separated values, ...

    csarven commented 7 years ago

    No, target is always the subject.

    I see what you are saying with "RDF Graph interpretation", but isn't that already implied? The resolved URL is naturally an RDF graph (or an ldp:RDFSource).

    Is calling out on "representation" important here? How is the original "RDF sources [RDF 1.1]" misleading or unclear as to what that is?

    bblfish commented 7 years ago

    I see what you are saying with "RDF Graph interpretation", but isn't that already implied?

    In @rhiaro's proposed text she talks of retrieving "an RDF representation", but in the end pretty much any data in any representation can be interpreted as RDF, as all information can be expressed as relations between things. So e.g. GRDDL shows how any XML can be interpreted as RDF. Comma separated values can also be interpreted as RDF, and there is a WG on that. Furthermore, in the case of RDFa there is no specific mime type for it (as far as I know), so the interpretation is something the client seems to have to guess.

    It is, therefore, more general to speak of the returned representation and then its RDF interpretation as a graph (including graphs containing graphs - is there a special term for that?). This allows you then to correctly place where the triple should be found: namely in the graph. Otherwise, people are thinking you need to search each syntax for a triple, which indeed seems completely infeasible.

    Is calling out on "representation" important here? How is the original "RDF sources [RDF 1.1]" misleading or unclear as to what that is?

    I think the above reply also covers that. There is a bit of a jump in the proposed text from a representation to finding a triple. But a triple is always found in a graph.

    No, target is always the subject.

    Ah ok. Well kind of. This brings up another issue. This is true, except for cases where the target is an object described in the document it seems, as with "DISCOVERY EXAMPLE 6: TURTLE"

    csarven commented 7 years ago

    If I'm not mistaken, the general consensus as to what is an RDF representation is one of the RDF syntaxes endorsed by W3C, i.e., the official RDF/XML, RDFa, Turtle, N-Triples, JSON-LD. I'm not sure if we have to go beyond that as to what could be interpreted as RDF. The client that's looking for RDFa doesn't have to rely on the mimetype. In fact, it could occur in resources with Content-Type text/html, application/xhtml+xml or even image/svg+xml. There may be others which escape me at the moment. RDFa is not interpreted (unlike CSV). There are clear processing rules for RDFa. For example, if an RDF implementation sends an Accept header with text/html, it is clearly looking for RDFa in the response.
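To make the point about media types concrete, here is a minimal sketch of how a client might decide which RDF syntax to try for a given Content-Type. The table and the function name `candidate_syntax` are assumptions for illustration only; the media types for the non-RDFa syntaxes are the commonly registered ones, and the RDFa hosts are the three named above.

```python
# Media types that identify one RDF syntax unambiguously
RDF_SYNTAX_BY_MEDIA_TYPE = {
    "text/turtle": "Turtle",
    "application/n-triples": "N-Triples",
    "application/rdf+xml": "RDF/XML",
    "application/ld+json": "JSON-LD",
}

# Host languages in which RDFa may (or may not) be embedded
RDFA_HOST_MEDIA_TYPES = {"text/html", "application/xhtml+xml", "image/svg+xml"}


def candidate_syntax(content_type):
    """Map a Content-Type header value to the RDF syntax worth trying, or None."""
    # Drop parameters such as "; charset=utf-8" and normalise case
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in RDF_SYNTAX_BY_MEDIA_TYPE:
        return RDF_SYNTAX_BY_MEDIA_TYPE[media_type]
    if media_type in RDFA_HOST_MEDIA_TYPES:
        # The media type alone does not say whether RDFa is actually present
        return "RDFa"
    return None


print(candidate_syntax("text/html; charset=utf-8"))
print(candidate_syntax("text/turtle"))
```

Note that for the host languages the result only means "run the RDFa processing rules and see what comes out"; an empty graph is a possible outcome.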

    LDN is not setting a constraint on the representation (or even the "interpretation"). By using the terms s,p,o, it is clear about what's what regardless of the serialisation that's returned. If/when a new RDF syntax emerges, it implies that it follows at least the spo model and so LDN will still work since we agree on the same spo.

    I'm okay to update the wording if you feel that there is a jump from the representation to the triple in the graph. I guess the representation is just the raw bits, and the graph is what we have after parsing.. "interpreting". But still, isn't the definition of RDF source covering exactly that:

    We informally use the term RDF source to refer to a persistent yet mutable source or container of RDF graphs. An RDF source is a resource that may be said to have a state that can change over time. A snapshot of the state can be expressed as an RDF graph. For example, any web document that has an RDF-bearing representation may be considered an RDF source. Like all resources, RDF sources may be named with IRIs and therefore described in other RDF graphs.

    https://www.w3.org/TR/rdf11-concepts/#dfn-rdf-source

    What the original text is saying is that, if you do get an RDF source (this is obviously after the raw representation), look for this specific triple.

    As for example 6 and "target", http://csarven.ca/#i is the target. The discoverer identifies http://csarven.ca/#i as the resource of interest (by whatever means) to find the inbox for, proceeds with a GET on http://csarven.ca/ to retrieve a representation, creates an internal graph, and finds the triple matching (http://csarven.ca/#i, http://www.w3.org/ns/ldp#inbox, o). o is the Inbox.
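The lookup described here can be sketched roughly as follows, assuming the representation has already been parsed into a set of (subject, predicate, object) tuples by whatever RDF parser the client uses. The function name `find_inbox`, the example.org inbox URL, and the extra triple are hypothetical.

```python
from urllib.parse import urldefrag

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"


def find_inbox(graph, target):
    """Return every object o of a triple (target, ldp:inbox, o) in the graph."""
    return sorted(o for (s, p, o) in graph if s == target and p == LDP_INBOX)


target = "http://csarven.ca/#i"

# The GET goes to the document URL, i.e. the target without its fragment
document_url = urldefrag(target).url

# Hypothetical graph obtained by parsing the representation of document_url
graph = {
    ("http://csarven.ca/#i", LDP_INBOX, "http://example.org/inbox/"),
    ("http://csarven.ca/", "http://xmlns.com/foaf/0.1/primaryTopic", "http://csarven.ca/#i"),
}

print(document_url)
print(find_inbox(graph, target))
```

Because the match is done on the parsed graph rather than on the bytes, the same function works regardless of which serialisation was returned.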

    Just to be clear and to minimise confusion over the terms, 'target' is just a resource of interest whose inbox we want to discover.

    What do you think?

    bblfish commented 7 years ago

    (Sorry to have taken time to respond. I moved recently to work as an OSX user that does not have access to any social networks or mail so as not to be constantly sidetracked.)

    There are clear processing rules for RDFa. For example, if an RDF implementation sends an Accept header with text/html, it is clearly looking for RDFa in the response.

    How does a client know that it is speaking to an RDF Implementation? The only way to tell is via a mime type. All that a client knows is what goes over the wire, hence the importance of protocol. Now one can argue that RDFa is now part of the w3c standards and so is part of html. But a client that asks for html cannot assume that the server has the capacity to serve RDFa. (Unless perhaps the returned header specifies that the resource is of type ldp:BasicContainer or ldp:Resource....) That is why my SoLiD clients always ask for HTML with a much lower priority than any other RDF representation.
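A minimal sketch of that kind of prioritisation, using HTTP quality values in the Accept header. The helper name `build_accept`, the particular media types, and the q-values are assumptions chosen for illustration, not what any specific client sends.

```python
def build_accept(preferences):
    """Format (media_type, q) pairs as an Accept header value, highest q first."""
    ordered = sorted(preferences, key=lambda item: -item[1])
    parts = []
    for media_type, q in ordered:
        # q=1 is the default, so it can be left implicit
        parts.append(media_type if q == 1.0 else f"{media_type};q={q:g}")
    return ", ".join(parts)


# RDF serialisations preferred; HTML accepted only as a low-priority fallback
ACCEPT = build_accept([
    ("text/html", 0.3),
    ("text/turtle", 1.0),
    ("application/ld+json", 0.9),
])

print(ACCEPT)  # text/turtle, application/ld+json;q=0.9, text/html;q=0.3
```

A server that can produce Turtle or JSON-LD would then be expected to pick one of those over HTML, whether or not its HTML carries RDFa.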

    But I think this is sidetracking us from the main issue here.

    LDN is not setting a constraint on the representation (or even the "interpretation"). By using the terms s,p,o, it is clear about what's what regardless of the serialisation that's returned. If/when a new RDF syntax emerges, it implies that it follows at least the spo model and so LDN will still work since we agree on the same spo.

    Perhaps "encoding" would be a better term to use over interpretation, as "interpretation" is the relation between the word and the world (the semantics). "encoding" seems to be what the RDF 1.1 concepts document uses.

    Here is my suggested text then:

    • make an HTTP GET request on the target URL to retrieve an RDF representation [RDF 1.1], whose encoded RDF graph (data set?) contains a relation of type http://www.w3.org/ns/ldp#inbox. The subject of that relation is the target and the object is the inbox.

    (Regarding your comment on RDF source, I don't think that will do, as that is the type of a resource that can change over time, whereas an RDF graph is immutable. An RDF Source is a set of graphs over time. We can't search that in the same way we can search a graph that is encoded in an RDF representation.)
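The distinction between a mutable RDF source and the immutable graph encoded in one representation can be sketched like this. The class `RdfSource` and its methods are hypothetical names used only to illustrate the argument; triples are plain tuples and the URLs are made up.

```python
class RdfSource:
    """A mutable container whose state at any moment is an immutable graph."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        # Changing the source changes its future state, not past snapshots
        self._triples.add((s, p, o))

    def snapshot(self):
        # A frozenset stands in for the RDF graph encoded in one representation
        return frozenset(self._triples)


source = RdfSource()
before = source.snapshot()
source.add(
    "http://example.org/#me",
    "http://www.w3.org/ns/ldp#inbox",
    "http://example.org/inbox/",
)
after = source.snapshot()

print(len(before), len(after))  # 0 1
```

A search for the inbox triple is always run against one snapshot such as `after`, never against the source as such, which is why the wording speaks of the graph.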

    csarven commented 7 years ago

    Integrated your suggested text in https://github.com/w3c/ldn/commit/5f4f16f53ad3c11d03429c667afe515841d8b809 . Sounds good to me. Thanks! If you are okay with this, please close this issue at your discretion.

    [Stating the following as part of an ongoing conversation, not particularly relevant to this issue as you agree:

    RDFa is a W3C recommendation to express structured data in markup languages (HTML being one of them, SVG is another..). This is not something up for debate or "one to argue" with, but a mere fact. As I've said, if a client RDF implementation is asking for text/html, it is signalling what it is interested in, and in all likelihood (i.e., if working within established parameters) it is capable of processing the RDFa parts. I obviously agree with you on "a client that asks for html cannot assume that the server has the capacity to serve RDFa."

    When you say Solid, I presume you mean the actual server implementations (i.e., gold, node-solid-server at this time). In any case, they pick and choose parts of W3C/IETF recommendations, which is fine in and of itself. However, I don't feel that basing our decision process on what it does and what it doesn't do is a sound way of addressing our concerns generally. With respect to RDFa, Solid implementations don't have a serializer for HTML+RDFa to handle text/html responses. That's pretty much the prime reason (at least until something along the lines of https://github.com/solid/node-solid-server/issues/414 is resolved). The order of the acceptable mimetypes is orthogonal. If an implementation can't express the content using RDFa, that's its calling. The only nitpick there is that the HTML representation is not "equivalent" to the RDF representations like Turtle, RDF/XML etc, at least in terms of structured data, and machine-readability. If an implementation wants to go ahead and have an opinion about how it wants to deal with HTML (with or without RDFa), that's fine. However, that's by no means a justification for how it ought to be everywhere else. If an implementation is unable to provide machine-readable equivalents to each representation in the Content-Type response, it probably should omit those that are not "equivalent".

    bblfish commented 7 years ago

    ok for the improved text.

    bblfish commented 7 years ago

    I have my own solid implementation at https://github.com/read-write-web/rww-play which is an evolution of the initial one I helped Alexandre Bertails write before even the LDP wg got started. I am about to rewrite it again to make it more efficient.

    On the client side, I have written clients like https://github.com/read-write-web/rww-scala-js .

    It is from the client perspective that I am talking when deciding to set priorities on pure RDF mime types. As a client, I have little way of knowing ahead of time that a resource is going to be RDF (except, as mentioned, that I could know I am in an ldp:Container and its ldp:contents are RDF Sources).

    RDFa is great but it is a whole lot more difficult to get people to implement correctly on the server as it mixes formatting and content, which means that it needs data scientists to work very closely together with artist/designers. Most companies would find it very difficult to have those two groups come together. (Just from my experience working at AltaVista). Tying both together automatically would also be very difficult to do.

    From the point of view of the client, there is no way for it to say: give me html but only if it contains RDFa. And if the client is actually building its UI based on the data then that is an important limitation, hence why I have html at a lower priority.

    csarven commented 7 years ago

    "RDFa mixing formatting and content" is a bit of a misnomer. From the point of obtaining an RDF graph, it can express "pure content" i.e., all the triples we get in the end can be essentially exactly as other RDF formats. It all depends on the type of content. What RDFa gives us additionally is that prose content can be expressed in the same document for both humans and machines. What's the sane way of representing prose content in non-RDFa RDF formats? Include arbitrary HTML markup, or Markdown or something else in the object literal, and then run additional processing on top to stitch it all together on the client side, meanwhile ending up with at least two URIs: 1) the "pure content" (in say Turtle) and 2) the application that uses 1), not to mention the boatload of dependencies that need to be carried around, and the dependence on a JavaScript/imperative Web. The original design of the Web was declarative, and that was partly the reason for its success.

    I'm not suggesting that one approach is ultimately better than the other, but that these are just design decisions. If your client side application wants to handle several URLs and then produce an HTML in the end (without RDFa), that's fine, and there are perfectly good UCs for it. When a server publishes HTML+RDFa, what it is saying is that, 1) you can obtain an RDF graph from this (just like its alternative representations), 2) you can use the default/out-of-the-box human representation for it that the server or the creator of the content intended. The creator of the content has a particular way that it wants to express that information, and so it comes with a default HTML that one can use. No one is forced to use 2), your implementation can still take 2) and generate a different HTML to fulfil its own needs.

    I agree that it is challenging for servers to get RDFa serialization right, but it doesn't make RDFa itself faulty or less interesting or useful than the other formats. I think it essentially depends on the type of content we are dealing with. It makes sense for prose or mixed content and "formatting" (virtually the most common case on the Web today - and one of the reasons why I think https://github.com/linkeddata/dokieli is important to explore, if not use) to go out as RDFa. For granular data items which don't need to have a particular/default UI, it might make more sense to use alternative formats.

    We are having this discussion because of the challenge in having proper and flexible templates to produce RDFa on the server-side. For controlled content, this is not an unsolved problem. For instance, RDF stores + HTML templates work well enough. What I see happening with server implementations that don't grok RDFa is that they pick different formats to publish, and then completely punt the problem to the client-side. Not to mention that each and every single one of those clients has to address the same problem, i.e., building a UI out of an RDF graph in the end. In the majority of cases, none of these are even remotely interoperable, interchangeable, or discoverable. Perhaps that's not desired, but we can put that aside for now.

    I think the creator of the content at least has a vague (if not complete) idea about how the content should or can be viewed, used, interacted, i.e., the affordances it should signal. While some of that is expressed through the choice of vocabularies/ontologies, and what goes in the RDF graph, there is information that's not "captured". That's where the human-readable formatting comes in for some types of content since they are shipped together.

    From the point of view of the client, there is no way for it to say: give me html but only if it contains RDFa. And if the client is actually building its UI based on the data then that is an important limitation, hence why I have html at a lower priority.

    Servers probably shouldn't return text/html for a resource while also returning other RDF mimetypes. We don't return image formats along with RDF mimetypes on the same resource, right? But, we try to return equivalent representations. [I don't want to get too deep here but maybe it can be argued that an image is one representation, whereas RDF Turtle is another for a given resource.] Then the client knows that there is no HTML+RDFa to begin with. If the client is designed to only cope with HTML+RDFa, well, so be it. They lose out or maybe they don't care. If on the other hand, the server is capable of returning an HTML+RDFa representation, clients that look for and know what to do with it (building a UI) can handle it, meanwhile clients that don't speak RDF but only HTML can still get something that's human-friendly.