solid / type-indexes

About Type Indexes and how they can be used by Solid developers.
https://solid.github.io/type-indexes/
MIT License

Missing predicate between `solid:TypeIndex` and `solid:TypeRegistration` #29

Open srosset81 opened 10 months ago

srosset81 commented 10 months ago

Hello,

I would like to implement Type-Indexes for the ActivityPods project. (For information, in other projects with public data, I have used VOiD endpoints and it also worked well, but it cannot be used for Pods because it uses a single /.well-known/void URL. Here's an example)

However, it seems a semantic link is missing in the current spec:

<>
  a solid:TypeIndex ;
  a solid:ListedDocument.

<#ab09fd> a solid:TypeRegistration;
  solid:forClass vcard:AddressBook;
  solid:instance </public/contacts/myPublicAddressBook.ttl>.

<#bq1r5e> a solid:TypeRegistration;
  solid:forClass bk:Bookmark;
  solid:instanceContainer </public/myBookmarks/>.

There is no predicate linking the solid:TypeIndex resource with the various solid:TypeRegistration resources. Since for ActivityPods we use a triple store for storage (and not the filesystem), each resource must have its own URI. And in the example above, we have no way to find the solid:TypeRegistration resources by reading the solid:TypeIndex resource.

A simple predicate like solid:hasTypeRegistration would solve this problem.

Thanks for reading.

jeff-zucker commented 9 months ago

The answer to this question is wider than the type indexes. Suppose I have a document of type schema:OfferCatalog and it has a bunch of schema:Offers ... do we need to explicitly say, these offers are in this catalog? My $0.02 - from a writing point of view, yes, it's good practice to be explicit. From a reading point of view, no, it's good practice to accept things that make sense. OTOH, this might be entirely too loose an interpretation.

lecoqlibre commented 9 months ago

In my understanding, all the triples are linked by the graph (document) they belong to (named graph).

@srosset81 If you are storing the fragment part in the TripleStore, a workaround could be to return all the triples that share the same RDF subject once the fragment is removed?
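
A minimal sketch of this workaround, assuming the store exposes its contents as plain (subject, predicate, object) string tuples. The `pod.example` URLs and the `triples_for_document` helper are hypothetical and only illustrate grouping triples by their subject URI with the fragment removed.

```python
from urllib.parse import urldefrag

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOLID = "http://www.w3.org/ns/solid/terms#"
VCARD = "http://www.w3.org/2006/vcard/ns#"

# Hypothetical document URI, used only for illustration.
INDEX = "https://pod.example/settings/publicTypeIndex.ttl"


def triples_for_document(store, document_uri):
    """Return every triple whose subject, minus its fragment, equals document_uri."""
    return {t for t in store if urldefrag(t[0]).url == document_uri}


# Stand-in for the triple store: one flat set, no named graphs.
store = {
    (INDEX, RDF_TYPE, SOLID + "TypeIndex"),
    (INDEX + "#ab09fd", RDF_TYPE, SOLID + "TypeRegistration"),
    (INDEX + "#bq1r5e", RDF_TYPE, SOLID + "TypeRegistration"),
    # A triple about another resource, which must not be returned.
    ("https://pod.example/public/contacts/myPublicAddressBook.ttl",
     RDF_TYPE, VCARD + "AddressBook"),
}

index_triples = triples_for_document(store, INDEX)
```

This only works when registrations use hash URIs relative to the index document; registrations stored under unrelated URIs would not be found this way.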

jeff-zucker commented 9 months ago

@lecoqlibre, I wouldn't tie it to the document/graph - there could be statements with many types of subjects. But if a document says "I am a catalog of things of type X", then I would think it is safe to assume that statements whose subject is of type X are in the catalog. Things with other types of subjects would not be.

lecoqlibre commented 9 months ago

@jeff-zucker yes absolutely, the workaround won't work for statements with other subjects, but it at least provides a way to support a "basic" TypeIndex with a TripleStore. If we assume that a TypeIndex resource could contain other things, then adding a condition on the type would make it possible to retrieve only the registrations.

I wouldn't tie it to the document/graph. [...] Things with other types of subjects would not be.

Should we use the term "dataset" instead? So the other types of subjects would be part of the dataset but not part of the document?

I avoid using the term "dataset" because it collides with the concept of RDF dataset which is a collection of RDF graphs.

jeff-zucker commented 9 months ago

I am in favor of the proposal to have a specific predicate. I wouldn't consider an app broken that was forgiving and assumed that link, but it is better to have the link explicit.

lecoqlibre commented 9 months ago

OK just to be clear @jeff-zucker you would be in favor to add a predicate saying that the TypeIndex document contains things of type solid:TypeRegistration? So you are not saying that the TypeIndex should link to all of its registrations, right?

Because I think @srosset81 was suggesting to add a solid:hasTypeRegistration linking to all the solid:TypeRegistration it has like:

<>
  a solid:TypeIndex ;
  a solid:ListedDocument;
  ex:hasType solid:TypeRegistration; # jeff-zucker's proposal?
  solid:hasTypeRegistration <#ab09fd>, <#bq1r5e>. # srosset's proposal?

<#ab09fd> a solid:TypeRegistration;
  solid:forClass vcard:AddressBook;
  solid:instance </public/contacts/myPublicAddressBook.ttl>.

<#bq1r5e> a solid:TypeRegistration;
  solid:forClass bk:Bookmark;
  solid:instanceContainer </public/myBookmarks/>.

jeff-zucker commented 9 months ago

I'm in favor of @srosset81's proposal to link all the items as you show, as a recommendation to developers whose apps are creating or adding to a type index. The question is what one does with a type index that doesn't have that predicate (such as every type index in existence at the moment). In terms of reading, apps can, I think, make the assumption that if there is a thing of type solid:TypeRegistration in a document labeled as a solid:TypeIndex, the solid:hasTypeRegistration relationship can be inferred. There is also the question of a "correcting app" - one that can recognize a broken typeIndex and fix it. In order to fix it, the app would have to make that assumption in order to add the hasTypeRegistration triples.

jeff-zucker commented 9 months ago

My comment above speaks in terms of documents but the same applies to graphs. If the graph is defined as a collection/list of things of type X, and there are things of type X in the graph, readers of the graph can assume that those things are in the collection/list. Perhaps I am being too lenient.
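
A rough sketch of such a "correcting app", assuming the document has already been parsed into (subject, predicate, object) string tuples. `solid:hasTypeRegistration` is the predicate proposed in this thread, not an existing term in the spec, and `add_missing_links` and the `pod.example` URI are hypothetical names.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOLID = "http://www.w3.org/ns/solid/terms#"
# Proposed in this thread; not part of the current Type Index spec.
HAS_REGISTRATION = SOLID + "hasTypeRegistration"


def add_missing_links(doc_triples):
    """Return the document's triples plus one explicit link from each
    solid:TypeIndex subject to each solid:TypeRegistration subject found
    in the same document."""
    indexes = {s for s, p, o in doc_triples
               if p == RDF_TYPE and o == SOLID + "TypeIndex"}
    registrations = {s for s, p, o in doc_triples
                     if p == RDF_TYPE and o == SOLID + "TypeRegistration"}
    inferred = {(i, HAS_REGISTRATION, r) for i in indexes for r in registrations}
    return set(doc_triples) | inferred


INDEX = "https://pod.example/settings/publicTypeIndex.ttl"  # hypothetical
legacy_doc = {
    (INDEX, RDF_TYPE, SOLID + "TypeIndex"),
    (INDEX + "#ab09fd", RDF_TYPE, SOLID + "TypeRegistration"),
    (INDEX + "#bq1r5e", RDF_TYPE, SOLID + "TypeRegistration"),
}
repaired_doc = add_missing_links(legacy_doc)
```

Running the function again on an already repaired document adds nothing, since the result is a set.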

lecoqlibre commented 8 months ago

I'm in favor of @srosset81's proposal to link all the items as you show as a recommendation

@jeff-zucker I think a recommendation won't be enough for implementations like the one of @srosset81. Like you said, with a recommendation, some TypeIndexes would not contain the solid:hasTypeRegistration predicate. Since SemApps (on which ActivityPods is based) does not support graphs containing multiple subjects, it simply can't:

make the assumption that if there is a thing of type solid:TypeRegistration in a document labeled as a solid:TypeIndex that the solid:hasTypeRegistration relationship can be inferred.

In other words, SemApps can neither read nor write graphs with multiple subjects inside. It does not support quads. There is no possibility to link triples to other triples except by making the links explicit. I give more details in the issue https://github.com/solid/specification/issues/610.

So currently you can't POST or PUT a TypeIndex graph together with its registrations to a SemApps server in a single request. As a workaround, you could instead POST/PUT the solid:TypeIndex resource and each solid:TypeRegistration resource in separate requests.

But even if you have managed to store the complete TypeIndex in the SemApps server (TripleStore), you won't be able to retrieve it! Because there is no way for the server to know which solid:TypeRegistration resources belong to the solid:TypeIndex.
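
As an illustration of that workaround, the sketch below splits a multi-subject graph into one payload per subject, each of which could then be sent in its own request. It assumes triples are plain string tuples; `split_by_subject` and the `pod.example` URI are hypothetical, and no actual HTTP calls are made.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOLID = "http://www.w3.org/ns/solid/terms#"
VCARD = "http://www.w3.org/2006/vcard/ns#"
INDEX = "https://pod.example/settings/publicTypeIndex.ttl"  # hypothetical


def split_by_subject(triples):
    """Group triples by subject so that each group describes one resource."""
    groups = defaultdict(set)
    for s, p, o in triples:
        groups[s].add((s, p, o))
    return dict(groups)


doc = {
    (INDEX, RDF_TYPE, SOLID + "TypeIndex"),
    (INDEX, RDF_TYPE, SOLID + "ListedDocument"),
    (INDEX + "#ab09fd", RDF_TYPE, SOLID + "TypeRegistration"),
    (INDEX + "#ab09fd", SOLID + "forClass", VCARD + "AddressBook"),
    (INDEX + "#bq1r5e", RDF_TYPE, SOLID + "TypeRegistration"),
}

# One entry per resource; each value would be the body of a separate request.
payloads = split_by_subject(doc)
```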

I can see different options at this point:

  1. As said before, SemApps could exploit the fragment part of the URL to link the triples.
  2. The TypeIndex specification should say that a solid:TypeIndex MUST declare a solid:hasTypeRegistration predicate for each registration of the TypeIndex.
  3. SemApps servers should handle multiple subjects in requests, also recognize a request is about a TypeIndex and add the solid:hasTypeRegistration predicate by itself.
  4. SemApps servers should handle quads.

If we develop option 2, all the other specifications we might encounter should adopt the same behavior and declare explicit links between triples, otherwise SemApps-like servers won't work. Also, declaring explicit links would make the resource heavier and more difficult to create. Even if it could be easier to read, some applications might want to check that the TypeIndex is complete, so they would read the whole document anyway (doing so might be a recommendation).

If we develop option 3, SemApps would have to implement a dedicated handler for any specification that is using implicit links. That makes a huge amount of work! Pretty much impossible to do if SemApps wants to be generic.

I would be curious to hear other opinions, from the TypeIndex team for instance. There could be some other reasons why the TypeIndex is making implicit links.

jeff-zucker commented 8 months ago

I wouldn't object to hasTypeRegistration being a MUST for writing a typeIndex. Since there are thousands of existing typeIndexes without it, I would also like the spec to recommend that clients be lenient when reading and make the assumption that all typeRegistrations in a typeIndex are part of that typeIndex whether or not that is explicitly stated. This would work fine on both a SemApps style that had the hasTypeRegistration statements and on one that didn't.
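
A lenient reader along these lines might look like the sketch below. It assumes parsed (subject, predicate, object) string tuples, treats `solid:hasTypeRegistration` as the predicate proposed in this thread rather than a spec term, and uses the hypothetical helper name `registrations_of`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOLID = "http://www.w3.org/ns/solid/terms#"
# Proposed in this thread; not part of the current Type Index spec.
HAS_REGISTRATION = SOLID + "hasTypeRegistration"
INDEX = "https://pod.example/settings/publicTypeIndex.ttl"  # hypothetical


def registrations_of(doc_triples, index_uri):
    """Collect registrations that are linked explicitly from the index,
    plus any subject typed solid:TypeRegistration in the same document."""
    explicit = {o for s, p, o in doc_triples
                if s == index_uri and p == HAS_REGISTRATION}
    implicit = {s for s, p, o in doc_triples
                if p == RDF_TYPE and o == SOLID + "TypeRegistration"}
    return explicit | implicit


# An existing index without the new predicate.
legacy_doc = {
    (INDEX, RDF_TYPE, SOLID + "TypeIndex"),
    (INDEX + "#ab09fd", RDF_TYPE, SOLID + "TypeRegistration"),
    (INDEX + "#bq1r5e", RDF_TYPE, SOLID + "TypeRegistration"),
}
# The same index written with explicit links.
linked_doc = legacy_doc | {
    (INDEX, HAS_REGISTRATION, INDEX + "#ab09fd"),
    (INDEX, HAS_REGISTRATION, INDEX + "#bq1r5e"),
}

legacy_result = registrations_of(legacy_doc, INDEX)
linked_result = registrations_of(linked_doc, INDEX)
```

Both documents yield the same set of registrations, which is the backward-compatible behavior described above.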

As you've pointed out, there is still the bigger question of whether Solid should require support for quads. It's probably been discussed before (@csarven ?)

pchampin commented 8 months ago

Since for ActivityPods we use a triple store for storage (and not the filesystem), each resource must have its own URI.

All triple stores that I know of are in fact quad stores: they support named graphs, and you could store the triples of each LDP RDF Source in a different graph. This would give you the same flexibility as storing them as separate files in the file system.
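
To make the idea concrete, here is a toy model of a quad store that keeps the triples of each resource in a graph named after that resource. The `QuadStore` class is purely illustrative and does not correspond to any real triple store API; the `pod.example` URIs are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOLID = "http://www.w3.org/ns/solid/terms#"
INDEX = "https://pod.example/settings/publicTypeIndex.ttl"  # hypothetical


class QuadStore:
    """Toy quad store: maps a graph name to the set of triples it holds."""

    def __init__(self):
        self._graphs = {}

    def put(self, graph_uri, triples):
        # Replace the whole named graph, as a PUT on the resource would.
        self._graphs[graph_uri] = set(triples)

    def get(self, graph_uri):
        # Return a copy of the graph, whatever subjects it contains.
        return set(self._graphs.get(graph_uri, ()))


store = QuadStore()
store.put(INDEX, {
    (INDEX, RDF_TYPE, SOLID + "TypeIndex"),
    (INDEX + "#ab09fd", RDF_TYPE, SOLID + "TypeRegistration"),
    # Subjects need not share the document URI; the graph name groups them.
    ("https://pod.example/other#reg", RDF_TYPE, SOLID + "TypeRegistration"),
})
store.put("https://pod.example/unrelated.ttl", {
    ("https://pod.example/unrelated.ttl#x", RDF_TYPE, SOLID + "TypeRegistration"),
})

index_graph = store.get(INDEX)
```

Because membership is decided by the graph name rather than by the subject, no linking predicate or fragment convention is needed to retrieve the document.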

srosset81 commented 6 months ago

Sorry for the late reply, unfortunately I don't have as much time as I would like to follow up on GitHub discussions :-/

All triple stores that I know of are in fact quad stores: they support named graphs, and you could store the triples of each LDP RDF Source in a different graph. This would give you the same flexibility as storing them as separate files in the file system.

This is unfortunately not the case with Jena Fuseki 3.17, with which SemApps is "stuck" because we developed an extension to check WAC permissions. With this version of Fuseki, you can create as many named graphs as you want, but only those which are set in the configuration files are persisted (I don't know if they fixed this in later versions).

Even without full quad store support, SemApps could be improved to support hash URIs. It would be some work (for which we don't have resources at the moment) but it's not impossible.

However, my problem is more philosophical than technical. I understand that, with filesystem storage, it's more convenient to put everything in the same file, and that it's less work not to indicate the hashes of the TypeRegistrations. But my question is: why force all TypeRegistrations to be in the same file? For me, the philosophy of Linked Data is that you can link (and dereference) data no matter where they are. So why should they absolutely be in the same file? Why not allow servers to put them in separate resources, if they wish to? A simple predicate like the one I'm suggesting above would solve this problem and make TypeIndexes more flexible.

Concerning backward compatibility, I think that clients can be intelligent enough to keep on parsing TypeIndexes without a solid:hasTypeRegistration predicate. But IMO servers implementing TypeIndexes should use this new predicate.

jeff-zucker commented 5 months ago

@srosset81 - if we add the ability to say something like <> a solid:TypeIndex; solid:hasTypeRegistration :A, :B, :C., that adds the complication of parsing a list (or a Collection if we went that way). Would it be sufficient for your purposes to use the inverse instead? In other words :A a solid:TypeRegistration; solid:inTypeIndex <>; ...?

srosset81 commented 5 months ago

@srosset81 - if we add the ability to say something like <> a solid:TypeIndex; solid:hasTypeRegistration :A, :B, :C., that adds the complication of parsing a list (or a Collection if we went that way). Would it be sufficient for your purposes to use the inverse instead? In other words :A a solid:TypeRegistration; solid:inTypeIndex <>; ...?

This is the exact discussion we had with elf Pavlik here: https://github.com/solid/data-interoperability-panel/issues/323 As I wrote there, it feels more straightforward to have a "container describing what it contains, instead of the contained children describing what container they are inside". This is how it is done for LDP containers, ActivityStreams collections and WAC groups.

csarven commented 5 months ago

The predicate referring to a TypeRegistration is not "missing" or a bug per se, but rather something that can further support discovery - follow-your-nose type of exploration.

With that in mind, I'd like to point out that discovery of TypeIndexes starts from a selected WebID (found in a WebID Profile Document) as per https://solid.github.io/type-indexes/#type-indexes , https://solid.github.io/webid-profile/#discovery .

So, the consumer discovers TypeRegistrations from a Type Index Document. The document is the context or the graph.

It is undoubtedly possible to find TypeRegistrations as is. The consumer needs to find the triple pattern (typically by processing all triples that match a solid:TypeRegistration).

As I see it, finding such triple patterns does not conflict with alternative methods, such as following the predicate of the subject that is of type TypeIndex with an object that is an instance of a TypeRegistration.

This can help implementations that are using RDF libraries by navigating through the properties from WebIDs to TypeRegistrations.

If a predicate is introduced to the Solid Terms namespace, I'd suggest considering similar scenarios in other Solid specifications and coming up with a generic term - which is to say, besides checking for reuse of an existing term out there along the lines of "hasPart", "contains", "includes", "items", and so forth.

Furthermore, it may be necessary to incorporate it as part of the Type Indexes Data Model (as opposed to being optional) in order for publishing and applications to be interoperable going forward. Not requiring it would potentially mean that consuming implementations that rely on the property will not be able to discover the TypeRegistrations, or will at least still need to fall back on the current expectation. Which is not entirely bad. So, there is a transition period and implementations should be advised to update/upgrade.

On a related note, a similar request was made in WAC: https://github.com/solid/web-access-control-spec/issues/19 (and possibly elsewhere). (I'd not suggest WAC use a term from the Solid Terms namespace if it were to adopt property-based discovery from the ACL Resource to each Authorization instance.)

jeff-zucker commented 5 months ago

So, assuming that we want some kind of predicate, there are several options.

  1. An inverse property - :A a solid:TypeRegistration; ex:inIndex <>.
  2. Multiple objects - <> a solid:TypeIndex; ex:hasTypeRegistration :A, :B, :C.
  3. A Collection - <> a solid:TypeIndex; ex:hasTypeRegistrations ( :A :B :C ).

Or, to take @csarven's suggestion, with more generic terms:

  1. An inverse property - :A a solid:TypeRegistration; ex:includedIn <>.
  2. Multiple objects - <> a solid:TypeIndex; ex:includes :A, :B, :C.
  3. A Collection - <> a solid:TypeIndex; ex:includes ( :A :B :C ) .

My preference order is the order shown.
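
To compare what each option asks of a reader, the sketch below resolves registrations from triples written in each of the three shapes. It assumes parsed (subject, predicate, object) string tuples; the `ex:` namespace URI, the `pod.example` URIs, the blank node labels, and the helper names are all placeholders. The Collection case relies on the standard RDF expansion of `( ... )` into rdf:first / rdf:rest / rdf:nil triples.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "https://example.org/ns#"  # placeholder for the thread's ex: prefix
INDEX = "https://pod.example/index.ttl"  # hypothetical
A, B = INDEX + "#A", INDEX + "#B"


def via_inverse(triples, index):
    """Option 1: each registration points back at the index."""
    return {s for s, p, o in triples if p == EX + "includedIn" and o == index}


def via_multiple_objects(triples, index):
    """Option 2: the index lists its registrations as repeated objects."""
    return {o for s, p, o in triples if s == index and p == EX + "includes"}


def via_collection(triples, index):
    """Option 3: the index points at an RDF Collection, which must be walked
    node by node through rdf:first / rdf:rest until rdf:nil."""
    lookup = {(s, p): o for s, p, o in triples}
    node = lookup.get((index, EX + "includes"))
    members = []
    while node is not None and node != RDF + "nil":
        first = lookup.get((node, RDF + "first"))
        if first is None:  # malformed list: stop rather than guess
            break
        members.append(first)
        node = lookup.get((node, RDF + "rest"))
    return members


inverse_doc = {(A, EX + "includedIn", INDEX), (B, EX + "includedIn", INDEX)}
multi_doc = {(INDEX, EX + "includes", A), (INDEX, EX + "includes", B)}
collection_doc = {
    (INDEX, EX + "includes", "_:b0"),
    ("_:b0", RDF + "first", A), ("_:b0", RDF + "rest", "_:b1"),
    ("_:b1", RDF + "first", B), ("_:b1", RDF + "rest", RDF + "nil"),
}
```

The first two options are single pattern matches, while the Collection needs a traversal and preserves order, which is the extra parsing cost mentioned earlier in the thread.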