How should we handle metadata of non-RDF sources (was Treatment of .meta files)

jeff-zucker commented 5 years ago

As far as I can tell from testing, .meta files work for containers but not for resources. It seems that a .meta should function as a sidecar file for non-RDF resources, e.g. for tagging photos or categorizing music. For that use case, it should be tightly bound to the file it references: copied, deleted, etc. whenever the resource is copied or deleted. I also believe .meta files should be (or have the option to be) listed with GET and editable. We'll be doing those things as much as we can from the client side in solid-file-client, but it seems like it should be a server-side thing and that the spec should address it.

michielbdejong commented 5 years ago

I must admit I've never really understood .meta files, and I haven't (yet) implemented them in IPS. I think they are used to support setting titles on containers. Maybe we don't need .meta files in the same way we need ACL docs? Do you know if they're documented somewhere?

jeff-zucker commented 5 years ago

Here is the most I've been able to find: https://github.com/solid/solid-spec/blob/master/content-representation.md#metadata

It seems to indicate something similar to what I described as a "sidecar file" for non-RDF resources. If I want to tag a photo or categorize an mp3, I'd like to do that in a way that is strongly associated with the resource itself. Metadata files seem ideal for this. So let's say I have a container /photos that includes foo.png. In foo.png.meta I describe the photo in RDF. When I do a GET on /photos, it would list foo.png along with its metadata as taken from its metadata file. Container .meta files work like this in NSS now. If you have /photos and you put a triple in /photos/.meta stating that the container is a giraffe, then the next time you visit /photos, the triple stating it is a giraffe will be there, just as if you had put it in the Turtle. That doesn't seem all that useful for containers because we can just add triples to their Turtle directly. But we can't do that with non-RDF files, and it would be really useful to be able to.

jeff-zucker commented 5 years ago

And an observation from experimenting with NSS: unlike .acl files, .meta files do not block a container from being deleted. If the container has only a .acl file in it, we get a 409 if we try to delete it. But if it has only a .meta file, the delete succeeds.
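
The observed behavior can be modeled in a few lines. This is only a sketch of the observation, not NSS code: the function name is hypothetical, the rule that any other leftover child also blocks deletion is an assumption, and 200 merely stands in for "the delete succeeds".

```python
def delete_container_status(children):
    """Return the HTTP status for DELETE on a container, given its child names."""
    # Observed above: a lone .meta does not block deletion, a lone .acl does (409).
    # Assumption: any other remaining child also blocks; 200 stands in for success.
    blocking = [name for name in children if name != ".meta"]
    return 409 if blocking else 200
```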

jeff-zucker commented 5 years ago

I imagine metadata files working something like this:

A container containing a single png file.

  .../photos/
      <> a ldp:BasicContainer;
          ldp:contains <foo.png>.

      <foo.png> a png:Resource.

  .../photos/foo.png
      nothing to see here, png files can't talk.

Same container with .meta files

  .../photos/
      <> a ldp:BasicContainer;
          sch:title "Family Photos";
          ldp:contains <foo.png>.

      <foo.png>
          a png:Resource;
          sch:about "my last birthday".

  .../photos/.meta
      <> a meta:Resource; sch:about <./>.
      <./> sch:title "Family Photos".

  .../photos/foo.png
      still can't talk.

  .../photos/foo.png.meta
      <> a meta:Resource; sch:about <foo.png>.
      <foo.png> sch:about "my last birthday".
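
A minimal sketch of how a server might fold sidecar triples into the container listing, as imagined above. It is not how NSS does it: the in-memory `meta_store`, the `container_listing` function, and the representation of triples as plain tuples are all assumptions made for illustration.

```python
# Hypothetical in-memory model: each sidecar path maps to the triples stored in it.
Triple = tuple[str, str, str]

def container_listing(container: str, members: list[str],
                      meta_store: dict[str, list[Triple]]) -> list[Triple]:
    """Build the container's own triples and merge in any .meta sidecar triples."""
    triples: list[Triple] = [(container, "rdf:type", "ldp:BasicContainer")]
    triples += [(container, "ldp:contains", container + m) for m in members]
    # Container metadata lives in <container>.meta, resource metadata in <resource>.meta.
    sidecars = [container + ".meta"] + [container + m + ".meta" for m in members]
    for sidecar in sidecars:
        for s, p, o in meta_store.get(sidecar, []):
            # Skip statements about the sidecar itself; keep those about its subject.
            if s != sidecar:
                triples.append((s, p, o))
    return triples

meta_store = {
    "/photos/.meta": [
        ("/photos/.meta", "sch:about", "/photos/"),
        ("/photos/", "sch:title", "Family Photos"),
    ],
    "/photos/foo.png.meta": [
        ("/photos/foo.png.meta", "sch:about", "/photos/foo.png"),
        ("/photos/foo.png", "sch:about", "my last birthday"),
    ],
}
listing = container_listing("/photos/", ["foo.png"], meta_store)
```

With this data, `listing` holds the container's type and containment triples plus the title and the photo description, while the statements about the .meta files themselves stay hidden.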

jeff-zucker commented 5 years ago

The above would be an app developer's dream come true compared to all the GETs and Link header parsing one would need to do otherwise. As I said, containers already work like this in NSS, so maybe it would not be so much work to do the same for resources.

jeff-zucker commented 5 years ago

Doing it as above, rather than making the app fetch the same data itself, would save one extra GET on the container's .meta, plus a HEAD on each item followed by a GET on that item's .meta. That's a lot of hits.
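
To put rough numbers on that, here is a small sketch of the request counts. The function names are hypothetical, and it assumes every item in the container has a metadata file that the app wants to read.

```python
def requests_without_merge(n_items: int) -> int:
    # 1 GET on the container + 1 GET on the container's .meta,
    # then per item: 1 HEAD (to find the metadata link) + 1 GET on its .meta.
    return 1 + 1 + 2 * n_items

def requests_with_merge(n_items: int) -> int:
    # The container listing already carries the metadata triples.
    return 1

def requests_saved(n_items: int) -> int:
    return requests_without_merge(n_items) - requests_with_merge(n_items)
```

For a container with `n` items the saving is `1 + 2 * n` requests.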

csarven commented 5 years ago

https://github.com/solid/node-solid-server/issues/1040 proposes to simplify the interface for container read/write and meta handling.

jeff-zucker commented 5 years ago

Would this be a good summary of why we need metadata, how it works now, and how it could work? Is the bit about exposing triples in the container an agreed-on goal? If so, app developers can depend on that regardless of how NSS or others implement it. And if so, shouldn't it be part of the spec? If not, what is the goal, and how does it provide the ability to easily find metadata for non-RDF resources?

RubenVerborgh commented 5 years ago

Although I obviously like the view realized with metadata, we shouldn't create a false dichotomy. The goal of "readable titles for non-RDF files" can be realized in several ways, one of which is .meta. The question is rather whether .meta is the desired mechanism to achieve this and the other goals.

csarven commented 5 years ago

I think that, with the exception of the NonRDFSource case, the notion of "meta" as a distinct resource can disappear.

jeff-zucker commented 5 years ago

The goal, as I understand it, is to allow non-RDF resources to be described with RDF in a way that lets them become part of the same universe of discourse as RDF resources, which can talk for themselves. Desktop apps do this by inserting data into the resource itself (e.g. digiKam with EXIF, XMP, etc. data in photos). Using .meta "sidecar" files puts the data squarely outside the resource itself, which brings other problems. Has anyone proposed using headers instead? A GET gives you the resource and the metadata, a HEAD gives you the metadata without the resource, and there are no extraneous sidecar resources that need to behave differently from other resources. The server can implement this however it wants, giving users no access to where the metadata is actually stored and only supporting user interaction via headers.

csarven commented 5 years ago

I'm not sure headers are a good place for this. For navigational purposes, perhaps one can get away with Link, provided that the relation is strictly about the resource and no fragments are involved. But how would a fragment of a NonRDFSource, e.g. part of a video, be described through headers?

Edit: I also don't think it is particularly safe to expose arbitrary properties through headers. Headers are something the server should have good control over, and they should not be easily manipulated by an application.

jeff-zucker commented 5 years ago

I think there are ways around your first point, but your second point is probably a show-stopper. [edit: meaning: oh well, no, headers are not a good idea]

jeff-zucker commented 5 years ago

What if the user can send a header that says only "I am talking to the metadata, not the resource" such that a GET on foo.png with that header sends back metadata and one without sends back the resource. And a PUT/POST/PATCH on foo.png with that header edits the metadata and without the header writes to the resource itself.

jeff-zucker commented 5 years ago

In the case of the container, we could disallow modification of the container directly but allow modification of its metadata by supporting PUT etc. with the metadata flag and not without.

jeff-zucker commented 5 years ago

The user would never need to know anything about where or how the server stores the metadata and would never need to address anything but the resource itself.
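
Purely to illustrate the proposal, here is a sketch of a request handler that switches on such a flag. Everything in it is hypothetical: the header name `x-target-metadata`, the `handle` function, and the two in-memory dictionaries standing in for the server's storage. Header lookup is exact-match for brevity, whereas real HTTP header names are case-insensitive.

```python
# Hypothetical request header; neither HTTP nor Solid defines it.
META_FLAG = "x-target-metadata"

def handle(method, path, headers, body, resources, metadata):
    """Route a request to the resource or to its server-managed metadata."""
    to_meta = headers.get(META_FLAG) == "true"
    store = metadata if to_meta else resources
    if method in ("GET", "HEAD"):
        if path not in store:
            return 404, None
        return 200, (store[path] if method == "GET" else None)
    if method == "PUT":
        # Containers (paths ending in "/") accept writes to their metadata only.
        if path.endswith("/") and not to_meta:
            return 405, None
        created = path not in store
        store[path] = body
        return (201 if created else 204), None
    return 405, None

resources = {"/photos/": "<> a ldp:BasicContainer.", "/photos/foo.png": b"\x89PNG"}
metadata = {}
flag = {META_FLAG: "true"}
```

The client only ever addresses `/photos/foo.png`; where the metadata ends up is entirely up to the server.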

zenomt commented 4 years ago

> What if the user can send a header that says only "I am talking to the metadata, not the resource" such that a GET on foo.png with that header sends back metadata and one without sends back the resource. And a PUT/POST/PATCH on foo.png with that header edits the metadata and without the header writes to the resource itself.

that wouldn't be very RESTy. as described, the "metadata" is more like its own resource, or at least a separate fork (especially when you say "i am talking to [...] not the resource"). the different representations that would be received or sent for the same URI wouldn't be alternate representations of the same resource, but representations of the resource or of its "metadata".

in the REST+HTTP model, true metadata about the resource+representation (like its content-type, last-modified time, creator, other systemy stuff, etc) belongs in the HTTP headers, along with a representation of the resource itself in the payload, so they're transferred atomically.

REST+HTTP doesn't have a good way to talk about resources with multiple forks.

per our conversation in gitter yesterday, when you're talking about "metadata" here, i think you're talking about both the resource (and representation) true metadata and also "more stuff i'd say in/about this resource if it was RDF". i'll talk about that in a separate message.

zenomt commented 4 years ago

@jeff-zucker and i had an extended chat in gitter yesterday on the metadata/.meta/RDF-for-non-RDF subject, from https://gitter.im/solid/chat?at=5d239e3ef5dd1457424db97d to https://gitter.im/solid/chat?at=5d23f858b0027d2b199ab085 (sadly i don't know of a good way to copy that part of the conversation to this issue).

TL;DR: my opinions on this subject:

- .meta should just be an implementation detail and should not be visibly exposed over HTTP
- the true metadata for a resource/representation (like last-modified time, content-type(s), (system-asserted) creator) belong in HTTP headers, because that is the REST way
- things you'd like to say as RDF about a non-RDF resource (like "people tagged in this image" or "title of this image" or "my comments on this song") is not resource metadata, but adjunct/ancillary data that should be linked with the resource
- the rel="describedBy" HTTP link relation probably isn't needed (because the true metadata is in the HTTP headers, and it's irrelevant to the client how or where the true metadata is stored by the server, and the client wouldn't change those metadata by altering the describedBy resource)

in addition to the true resource/representation metadata being in HTTP headers and the storage implementation being hidden, i propose the "adjunct RDF for a non-RDF resource" be handled with a Link with closer semantics:

Link: <server-specified-URI>; rel="http://www.w3.org/2000/01/rdf-schema#seeAlso"

that can be returned with an HTTP 201 Created along with a Location for a POST, and with any other successful HTTP response against the resource's URI (GET, HEAD, etc). this link would be to an RDF resource that could say anything about the original non-RDF resource. the server would maintain this link in its own way (server-specified-URI could be to anywhere, and the link itself is stored as part of the resource's true metadata, however the server implements that), creating/allocating it when the original resource was created, and cleaning it up when the original resource is deleted. it could even have its own independent ACL.

as an optimization, an HTTP/2 server could push a representation of the rdfs:seeAlso along with a successful response to a GET or HEAD for the original resource.
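
a rough client-side sketch of discovering that link from a response's Link header. the `find_links` helper and the example URIs are made up for illustration, and the parsing is deliberately simplified (it handles `<uri>; name="value"` pairs but not every corner of the Link header grammar).

```python
import re

# One link-value: <uri> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')

def find_links(link_header: str, rel: str) -> list[str]:
    """Return the target URIs whose rel parameter includes the given relation."""
    targets = []
    for match in LINK_RE.finditer(link_header):
        uri, params = match.group(1), match.group(2)
        for name, quoted, bare in PARAM_RE.findall(params):
            value = quoted or bare
            # A rel value may hold several space-separated relation types;
            # relation types are compared case-insensitively.
            if name.lower() == "rel" and rel.lower() in value.lower().split():
                targets.append(uri)
    return targets

SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
header = '<foo.png.acl>; rel="acl", </adjunct/abc123>; rel="' + SEE_ALSO + '"'
adjunct = find_links(header, SEE_ALSO)
```

the client would then GET (or PUT/PATCH, if permitted) the returned URI without knowing or caring how the server chose it.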

jeff-zucker commented 4 years ago

I agree that what @zenomt calls "true metadata" should be server-driven and belongs in the header. I also agree with him that "user-generated metadata" about non-RDF resources does not belong in the header, but rather in a separate RDF file. The link from the non-RDF resource to its user-generated metadata should be server-generated and found in the header of the resource, as in the current spec on metadata. Details about .meta or how the server actually stores the metadata do not need to be exposed to the user, other than supporting a way to edit it by following the link from the resource. I am concerned about both describedBy and seeAlso as the predicate for this link: neither carries the weight it should, which in my mind is "what the link says about the resource is as if the resource had said it." It is not the same as a random comment on an image; it has a more specific relationship with the resource, one that the server assigns and that is not user-alterable. Certainly the user can edit the resource the link points to if they have rights, but that does not mean that they can alter the link itself. A predicate like rdfs:isDefinedBy might be better. Whatever the predicate, it should be described in the spec similarly to how the extended Profile talks about seeAlso: as something that client apps should be expected to follow and merge.

zenomt commented 4 years ago

on reflection and a closer reading of https://www.w3.org/TR/ldp/#ldpc-container , especially section 5.2.3.12, i think that rel="describedby" is what was intended in LDP for "where to put linked data that would be in this non-RDF resource if it was RDF".

the semantics of describedby is kinda tainted by POWDER (as far as what a "description" is) as well as LDP section 5.2.3.12's "to contain data about the newly created LDP-NR" (where "about" is problematic to me).

if everyone agrees that "where to put RDF for this LDP-NR" is what describedby is for (or was the intent, or what it should be used for in Solid), then i withdraw my rdfs:seeAlso proposal. if so, i think rel="describedby" should be explicitly documented in Solid specs as being for "where to put linked data that would be in this LDP-NR if it was RDF", and be super careful about using the word "metadata". in particular, the describedby resource isn't where you put what i described above as the "true metadata", which should be in HTTP headers and maintained invisibly in an implementation-specific way.

also, perhaps state that Solid LDP servers "SHOULD NOT" [RFC2119] create a rel="describedby" link except for non-RDF resources, which would make it clearer what that link is for.
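
a small sketch of that rule from the server's side. the `describedby_link` helper, the short list of RDF media types, and the `<path>.meta` target are assumptions for illustration only; a real server would choose the target URI however it likes.

```python
from typing import Optional

# Illustrative, not exhaustive, set of RDF media types.
RDF_TYPES = {
    "text/turtle",
    "application/ld+json",
    "application/rdf+xml",
    "application/n-triples",
}

def describedby_link(path: str, content_type: str) -> Optional[str]:
    """Return a Link header value for non-RDF resources, or None for RDF ones."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in RDF_TYPES:
        # RDF resources can carry their own statements, so no describedby link.
        return None
    # Assumed target: a server-chosen RDF resource, here simply <path>.meta.
    return f'<{path}.meta>; rel="describedby"'
```

for example, `describedby_link("/photos/foo.png", "image/png")` yields a link value, while a `text/turtle` resource gets none.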

dmitrizagidulin commented 4 years ago

This will be handled in https://github.com/solid/specification/issues/63

elf-pavlik commented 4 years ago

You may want to review this PR https://github.com/solid/data-interoperability-panel/pull/32/

solid/solid-spec #197