solid / specification

Solid Technical Reports
https://solidproject.org/TR/

Proposal: expose alternative query interfaces via `pim:storage` #455

Open rubensworks opened 2 years ago

rubensworks commented 2 years ago

Not sure if this is the correct location for this proposal; feel free to move this issue to https://github.com/solid/specification or elsewhere if needed.

Motivation

There have been some discussions in the past about exposing query interfaces, such as SPARQL endpoints or TPF/QPF interfaces, as alternatives to the LDP-based interface (https://github.com/solid/specification/issues/229, https://github.com/solid/specification/issues/227). However, there is still no agreed-upon way to expose such alternative interfaces, which makes it difficult to use them in the Solid ecosystem. Below, I outline a simple and concrete proposal to fill this gap.

Proposed solution

The WebID spec specifies the use of pim:storage as "location(s) of the WebID owner's storage space(s)". The spec says that this should refer to the root LDP container. However, if we relaxed this requirement so that it may refer to any kind of interface for accessing the user's pod, then the same predicate could also point to an alternative interface, such as a SPARQL endpoint or QPF interface.

Example

For example, a person who decides to expose a SPARQL endpoint and a QPF interface next to their LDP-based storage could do so as follows:

<http://example.org/bob/#me> pim:storage
  <http://example.org/bob/container/>,
  <http://example.org/bob/sparql>,
  <http://example.org/bob/qpf>.

A client app could consider these storages as alternatives to each other, and pick the one that is best suited for the query needs of the client.

Limitations

The only downside of this approach would be backwards compatibility: existing apps would assume that these alternative interfaces are also LDP containers. For this reason, it may be better to introduce a new predicate, such as solid:storage.
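A minimal sketch of what the new-predicate variant could look like. Note that solid:storage is only the hypothetical term floated above, not an existing term in the Solid vocabulary, and the prefix IRIs are the commonly used ones rather than something stated in this issue:

```turtle
@prefix pim:   <http://www.w3.org/ns/pim/space#> .
@prefix solid: <http://www.w3.org/ns/solid/terms#> .

# pim:storage keeps its current meaning: the root LDP container only.
<http://example.org/bob/#me> pim:storage <http://example.org/bob/container/> .

# solid:storage is HYPOTHETICAL (proposed in this issue, not an existing term).
# It lists every interface to the pod, including the non-LDP ones.
<http://example.org/bob/#me> solid:storage
  <http://example.org/bob/container/>,
  <http://example.org/bob/sparql>,
  <http://example.org/bob/qpf> .
```

Existing apps that only follow pim:storage would then never encounter a non-LDP URL, while newer clients could opt in to the additional predicate.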

csarven commented 2 years ago

Aren't the sparql, qpf etc interfaces about the storage and so should instead be part of storage's description?

matthieubosquet commented 2 years ago

SPARQL and QPF APIs could arguably be available over any resource (as opposed to being just a pod-wide pim:storage querying mechanism).

I think discovering such endpoints should happen via a resource-based API discovery mechanism rather than via a WebID (which could contain multiple Solid storages).

I think that maybe issue #355 is most relevant to such requirements.

rubensworks commented 2 years ago

> Aren't the sparql, qpf etc interfaces about the storage and so should instead be part of storage's description?

Indeed, that might make sense.

So the example from above based on your suggested storage description could become something like this:

<http://example.org/bob/container/>
  a pim:Storage ;
  dcterms:title "okieli-dokieli LDP" ;
  solid:owner <http://example.org/bob/#me> .

<http://example.org/bob/sparql/>
  a pim:Storage ;
  dcterms:title "okieli-dokieli SPARQL" ;
  solid:owner <http://example.org/bob/#me> .

<http://example.org/bob/qpf/>
  a pim:Storage ;
  dcterms:title "okieli-dokieli QPF" ;
  solid:owner <http://example.org/bob/#me> .

> SPARQL and QPF APIs could arguably be available over any resource (as opposed to being just a pod-wide pim:storage querying mechanism). I think discovering such endpoints should happen via a resource-based API discovery mechanism rather than via a WebID (which could contain multiple Solid storages).

I agree, resource-level interfaces definitely make sense. But I think the discovery of resource-level interfaces may work quite differently from that of pod-level interfaces, so I'm inclined to focus purely on pod-level interfaces here.

csarven commented 2 years ago

It may not be appropriate to interpret those interfaces as another (type of a) storage. My understanding was that they are alternative (or additional) endpoints, services, communication options offered by the storage. Would a unique property indicating those interfaces suffice? Or employing VOID, SPARQL-SERVICE-DESCRIPTION, etc?

rubensworks commented 2 years ago

> It may not be appropriate to interpret those interfaces as another (type of a) storage. My understanding was that they are alternative (or additional) endpoints, services, communication options offered by the storage.

Yep, I agree with that view. With LDP also being one type of interface among these.

> Would a unique property indicating those interfaces suffice?

Is it something like this you have in mind?

<http://example.org/bob/container/>
  a pim:Storage ;
  dcterms:title "okieli-dokieli" ;
  solid:owner <http://example.org/bob/#me> ;
  solid:alternativeInterface <http://example.org/bob/sparql/>, <http://example.org/bob/qpf/>.

> Or employing VOID, SPARQL-SERVICE-DESCRIPTION, etc?

I think VOID and SPARQL SD definitely make sense to describe the capabilities of such interfaces, but we still need a way to discover these in the first place. In the examples above, http://example.org/bob/sparql/ would expose the SPARQL SD.
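As a rough sketch of what dereferencing http://example.org/bob/sparql/ might return, assuming the endpoint publishes a SPARQL Service Description. The specific capabilities listed (query language and result format) are illustrative assumptions, not something this issue prescribes:

```turtle
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

# Self-description served by the endpoint itself.
# The capability values below are assumed for illustration.
[] a sd:Service ;
  sd:endpoint <http://example.org/bob/sparql/> ;
  sd:supportedLanguage sd:SPARQL11Query ;
  sd:resultFormat <http://www.w3.org/ns/formats/SPARQL_Results_JSON> .
```

With a description like this available after dereferencing, the storage description only needs to carry the URL of the interface; the capability details stay in one place.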

matthieubosquet commented 2 years ago

> I agree, resource-level interfaces definitely make sense. But I think the discovery of resource-level interfaces may work quite differently from that of pod-level interfaces, so I'm inclined to focus purely on pod-level interfaces here.

Maybe the discovery mechanism can be generic enough to accommodate advertising a SPARQL or QPF endpoint on any resource.

It seems to me that a description resource may be quite fitting for that purpose. As it happens, according to the Solid Protocol ED, servers MUST advertise a storage description resource. That storage description resource may or may not be the same as the resource linked from the storage resource itself via the Link rel="describedby" header, but both are description resources.

Now, I think there is a lot to be known about a SPARQL endpoint, which is why the SPARQL Service Description spec exists. We could probably leverage it efficiently, in the simplest manner by including a statement of the form [] sd:endpoint <https://example.com/sparql> in any description resource.
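To make that concrete, here is a minimal sketch of a description resource carrying such a statement. The storage IRI https://example.com/ is an assumption chosen to match the endpoint URL above, and typing the blank node as sd:Service is an addition for readability:

```turtle
@prefix pim: <http://www.w3.org/ns/pim/space#> .
@prefix sd:  <http://www.w3.org/ns/sparql-service-description#> .

# Contents of a description resource (storage IRI is assumed).
<https://example.com/> a pim:Storage .

# The statement suggested above: some service exposes this endpoint.
[] a sd:Service ;
  sd:endpoint <https://example.com/sparql> .
```

In this form, nothing ties the service to the storage except that both appear in the same description document, so a client would have to rely on that co-occurrence.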

I don't mind introducing new vocabulary terms, but maybe it's nice to be more specific and have more expressivity when describing a service (qpf, sparql or solid).

csarven commented 2 years ago

Yes, along those lines. Perhaps the property is specialised for the interface so that clients can easily follow-their-nose.

I think pim:Storage is conceptually not equivalent to void:Dataset, sd:Service, dcat:Catalog, and possibly others that may appear to be.

A Storage may want to share its VoID Datasets and DatasetDescriptions, e.g., discovered through their Description Resource (via describedby link relation or https://www.w3.org/ns/iana/link-relations/relation#describedby).

Would void:sparqlEndpoint as a property of a void:Dataset suffice? Or do you think that these interfaces should really be about pim:Storage?
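For comparison, a minimal sketch of the VoID variant. The dataset IRI is hypothetical, and how (or whether) the dataset should be linked to the pim:Storage is deliberately left out, since that is the open question here:

```turtle
@prefix pim:  <http://www.w3.org/ns/pim/space#> .
@prefix void: <http://rdfs.org/ns/void#> .

<http://example.org/bob/container/> a pim:Storage .

# Hypothetical dataset IRI; the endpoint hangs off the dataset,
# not off the storage itself.
<http://example.org/bob/#dataset> a void:Dataset ;
  void:sparqlEndpoint <http://example.org/bob/sparql/> .
```

This covers SPARQL only; VoID has no dedicated property for a QPF interface, so a second mechanism would still be needed for that case.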

Is there a property to discover a sd:Service?

Is there something in LDF/QPF that indicates an interface, or is that Hydra (search/template)?

rubensworks commented 2 years ago

> Perhaps the property is specialised for the interface so that clients can easily follow-their-nose.

I don't think we necessarily MUST have interface-specific properties, if we assume that interfaces are self-descriptive after dereferencing (which is the case for SPARQL endpoints via the SPARQL SD, and for TPF/QPF via Hydra). As long as you have the URL of an interface, this should be sufficient (for example, it is sufficient for Comunica).

However, specialised properties would make sense if we want to avoid clients having to dereference those interfaces before detecting their capabilities. But this might lead us quite far, perhaps even to the point that we will have to duplicate all information in the SPARQL SD and Hydra controls for TPF/QPF.

Furthermore, since such descriptions (SPARQL SD, Hydra, ...) usually remain static, they can be cached by clients, so I don't think this would involve too much processing overhead.

elf-pavlik commented 2 years ago

> > It may not be appropriate to interpret those interfaces as another (type of a) storage. My understanding was that they are alternative (or additional) endpoints, services, and communication options offered by the storage.
>
> Yep, I agree with that view. With LDP also being one type of interface among these.

:+1:

I think starting with something very generic like void:Dataset makes sense. More specific descriptions of the interfaces for accessing it could be added later.

etc.

No matter how the dataset is accessed, we are still dealing with the same set of quads.

csarven commented 2 years ago

> Solid flavored LDP

I don't believe that association is accurate and find it to be potentially misleading. The Solid Protocol and LDP are compatible in some aspects; however, neither is an extension of the other. I suggest that we take care in how we communicate that. There are existing issues on this, but we need to move some of the information towards the spec (or to another document).