solid / user-stories

A repository to submit user stories

As a user I want to see the collection of my photos from Spain, for 2019, that were from work trips so I can create an album for the company blog. #38

Open pmcb55 opened 5 years ago

pmcb55 commented 5 years ago

Regardless of how it is executed, the query can be expressed in SPARQL as follows:

PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?date ?image WHERE {
  ?image a schema:ImageObject .
  ?image schema:dateCreated ?date .
  FILTER(YEAR(?date) = 2019)
  ?image schema:contentLocation wd:Q29 .  # wd:Q29 is Spain
  # ?image schema:creator <myWebID> .     # (do I need this?)
}

A really naive approach could try to execute this query given a seed, a source, or neither. (A source is less specific than a seed in that it refers only to a domain name; crawling won't go outside that domain, so it is more constrained.) If given a seed, it could potentially crawl and traverse the entire Web. If given neither, it looks at all the IRIs in the query itself and uses those as seeds.

GET WebID <= follow link(s) to your public/private storage <= crawl your entire Pod, but do not follow links referencing resources outside your Pod (by comparing IRI domains, since the original query states 'photos in my Pod'), executing the query as we crawl. Each response is parsed as triples, and any triple that can contribute to the query answer is kept, even though a given photo might only match 3 of the 4 query patterns above. We need to keep such partial matches because a subsequent resource (perhaps not a photo at all) might state the 4th query pattern for that photo.

We also need to maintain a record of all IRIs visited to prevent infinite loops.
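Below is a minimal TypeScript sketch of this crawl loop, under stated assumptions: `parseTriples` and `matchesQueryPatterns` are hypothetical helpers (no real library is implied), and the Pod boundary is approximated by comparing IRI hostnames.

```typescript
// Hypothetical helpers, assumed for this sketch only.
declare function parseTriples(turtle: string): Triple[];
declare function matchesQueryPatterns(t: Triple): boolean;

type Triple = { subject: string; predicate: string; object: string };

async function crawlPod(seed: string): Promise<Triple[]> {
  const podHost = new URL(seed).hostname;
  const visited = new Set<string>();   // record of all IRIs visited
  const frontier: string[] = [seed];
  const kept: Triple[] = [];           // triples that may contribute to the answer

  while (frontier.length > 0) {
    const iri = frontier.pop()!;
    if (visited.has(iri)) continue;    // prevents infinite loops
    visited.add(iri);

    const res = await fetch(iri, { headers: { Accept: "text/turtle" } });
    for (const t of parseTriples(await res.text())) {
      // Keep partial matches: a later resource may supply the missing pattern.
      if (matchesQueryPatterns(t)) kept.push(t);

      // Follow links, but only within the Pod's own domain.
      if (t.object.startsWith("http") && new URL(t.object).hostname === podHost) {
        frontier.push(t.object);
      }
    }
  }
  return kept;
}
```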

pmcb55 commented 5 years ago
  1. Server has a TPF endpoint

Here the client simply processes the query on its side (following the TPF algorithm) and hits that endpoint. The idea is that each Pod exposes a TPF endpoint; the Pod profile itself could contain the hypermedia controls for TPF.

(A SPARQL interface is an endpoint: I fire 'commands' at it and it acts like an RPC. TPF is different in that it's a collection of resources. TPF has no endpoint; it just has fragments, and each response is an RDF document made up of data, metadata and controls. See the sketch after this list.)

  2. A SPARQL endpoint is also an option.

  3. A shape-specific interface, in between SPARQL and TPF, where the client breaks the query down into shapes it understands and fires these at the server.
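To make the TPF option concrete, here is a hedged TypeScript sketch of a single fragment request. The `/fragments` path and the `subject`/`predicate`/`object` parameters are assumptions for illustration; a real client would read the actual URI template from the hypermedia controls in each fragment.

```typescript
// Sketch only: path and parameter names are assumed, not discovered.
async function fetchFragment(
  podBase: string,
  pattern: { subject?: string; predicate?: string; object?: string }
): Promise<string> {
  const url = new URL("/fragments", podBase);
  for (const [key, value] of Object.entries(pattern)) {
    if (value) url.searchParams.set(key, value);
  }
  // The response is an RDF document with data, metadata (e.g. counts)
  // and hypermedia controls for issuing further requests.
  const res = await fetch(url, { headers: { Accept: "text/turtle" } });
  return res.text();
}

// e.g. every triple typing a resource as a schema:ImageObject:
// fetchFragment("https://alice.example", {
//   predicate: "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
//   object: "http://schema.org/ImageObject",
// });
```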

kjetilk commented 5 years ago

Indeed, SPARQL relies on an out-of-band language definition, and therefore cannot be RESTful, whereas TPF has a complete in-band definition of its interface. That is a strength of TPF, but the out-of-band definition is quite inevitable for something as complex as SPARQL.

Let me braindump something I have been thinking about regarding SPARQL endpoints and quad patterns at different levels. One key problem with SPARQL endpoints is that the language is so expressive that you can do nasty things to the server with it, like a denial of service, and there are open research questions around this. One approach could be simply to limit the amount of data an engine would query.

So, potentially, every resource could be a SPARQL endpoint... Quite simply, you would just let the engine query over the data described in the information resource, and that would be it; the data might then be small enough that the expressivity is OK. The resource would simply be able to respond to SPARQL media types.
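As a sketch of what that could look like, assuming a resource accepted SPARQL Protocol requests directly (the media types are the standard SPARQL Protocol ones; the URL is illustrative, and no current Solid server behaviour is implied):

```typescript
// Sketch: POST a query straight to an information resource, scoped to
// only the (small) data that resource describes. URL is illustrative.
async function queryResource(resourceUrl: string, sparql: string) {
  const res = await fetch(resourceUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: sparql,
  });
  return res.json();
}
```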

Next, I prefer to think about this in terms of the RDF Dataset section of the SPARQL spec. In such a situation, the information resource would be the default graph, and this could also be the assumption of a client. You could also use the FROM and FROM NAMED clauses to include other resources in the RDF Dataset, and those could be implemented on the client too.
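For example, a dataset clause over hypothetical Pod resources could look like this (the URLs and schema.org terms are illustrative):

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?image ?date
FROM <https://alice.example/photos/2019.ttl>        # forms the default graph
FROM NAMED <https://alice.example/photos/meta.ttl>  # queryable via GRAPH
WHERE {
  ?image schema:dateCreated ?date .                 # matched in the default graph
  GRAPH <https://alice.example/photos/meta.ttl> {
    ?image schema:contentLocation ?place .          # matched in the named graph
  }
}
```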

IOW, we use the RDF Dataset machinery from the SPARQL spec within a pod to control which parts of the pod are used in the query. That can be fully compatible with query answering done on the client too, e.g. over TPF (where the graph name is the resource IRI without query parameters, barring possible URI aliasing problems), if the RDF Dataset definition is given in the Solid spec. The spec could define what the default graph of a certain endpoint is and which resources may be added to the RDF Dataset as named graphs. Naturally, that would be limited to the pod, so if you go beyond the pod you would need federation mechanisms like SERVICE or more advanced source selection.
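A sketch of that federation step, using the public Wikidata endpoint as the remote service (the local pattern is illustrative):

```sparql
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?image ?placeLabel WHERE {
  ?image schema:contentLocation ?place .         # evaluated within the pod
  SERVICE <https://query.wikidata.org/sparql> {  # delegated beyond the pod
    ?place rdfs:label ?placeLabel .
    FILTER(LANG(?placeLabel) = "en")
  }
}
```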

The next step is that each container is a SPARQL endpoint, where the resources it contains are merged to form the default graph. Again, you could do things with named graphs.

Then, the whole pod could have a SPARQL endpoint, where the default graph is the RDF merge of all the data. But then, we might run into problems because of the expressivity.

I think we should be looking into what we can do with named graphs, not only to limit what the endpoint has to consider in query evaluation, but also for partitioning. Since anyone can say anything about anything, we might not want to allow them to say just anything on your pod; and if they did, and there is conflicting information, you might want to be able to partition the data so that only data we have verified is included for a certain application.
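A sketch of that partitioning, where an application restricts itself to a hypothetical 'verified' named graph instead of the full RDF merge (the graph name is illustrative):

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?image ?date WHERE {
  # Only statements the pod owner has verified are in scope;
  # conflicting claims from other sources are simply not considered.
  GRAPH <https://alice.example/graphs/verified> {
    ?image a schema:ImageObject ;
           schema:dateCreated ?date .
  }
}
```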

Mitzi-Laszlo commented 5 years ago

Added to user-story readme

megoth commented 4 years ago

Opening this as part of moving all user stories into issues again.