pmcb55 opened this issue 5 years ago
Here the client simply processes the query on its side (following the TPF algorithm) and hits that endpoint. The idea is that each Pod exposes a TPF endpoint. The Pod profile could itself contain the hypermedia controls for TPF.
(A SPARQL interface is an endpoint, but I fire 'commands' at it and it acts like an RPC. TPF is different in that it's a collection of resources. TPF has no endpoint - it just has fragments. But the response is an RDF document made up of data, metadata and controls.)
A SPARQL endpoint is also an option.
A shape-specific interface sits in between SPARQL and TPF: the client breaks the query down into shapes it understands, and fires these at the server.
Indeed, SPARQL relies on an out-of-band language definition, and therefore cannot be RESTful, whereas TPF has a complete in-band definition of the interface. That's a strength of TPF, but relying on an out-of-band definition is quite inevitable for something as complex as SPARQL.
Let me braindump something I have been thinking about SPARQL endpoints and quad patterns on different levels. One key problem with SPARQL endpoints is that the language is so expressive you can do nasty things to the server with it, like a DoS, and there are open research questions around that. One approach could be merely to limit the amount of data an engine would query.
So, potentially, every resource could be a SPARQL endpoint... Quite simply, you'd just let the engine query over the data described in the information resource, and that would be it; the data might be small enough that this would be OK. The resource would simply be able to respond to SPARQL media types.
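As a minimal sketch of what that could look like (the IRIs and vocabulary here are invented for illustration), a client would send a query with the `application/sparql-query` media type directly to the resource IRI, and the default graph would be just that resource's own triples:

```sparql
# Illustrative only: sent directly to e.g. https://alice.example/profile/card
# with Content-Type: application/sparql-query; the default graph is the
# resource's own triples.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name WHERE {
  <https://alice.example/profile/card#me> foaf:name ?name .
}
```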
Next, I prefer to think about this in terms of the RDF Dataset section of the SPARQL spec. In such a situation, the information resource would be the default graph, and this could also be the assumption of a client. You could also use the RDF Dataset clauses `FROM` and `FROM NAMED` to include other resources in the RDF Dataset, and those could be implemented on the client too.
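For instance (the pod and resource IRIs below are made up), a query could pull additional pod resources into its RDF Dataset like this:

```sparql
# Illustrative: two more pod resources are added to the RDF Dataset,
# one into the default graph and one as a named graph.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friend ?name
FROM <https://alice.example/profile/card>
FROM NAMED <https://alice.example/contacts/work>
WHERE {
  <https://alice.example/profile/card#me> foaf:knows ?friend .
  GRAPH <https://alice.example/contacts/work> {
    ?friend foaf:name ?name .
  }
}
```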
IOW, we use the RDF Dataset machinery from the SPARQL spec within a pod to control which parts of the pod are used in the query. That can be fully compatible with query answering done on the client too, e.g. over TPF (where the graph name is the URI without query params, barring possible URI aliasing problems), if the RDF Dataset definition is given in the Solid spec. The spec could define what the default graph of a certain endpoint is and what resources may be added to the RDF Dataset as named graphs. Naturally, that would be limited to the pod, so if you go beyond the pod, you would need federation mechanisms like the `SERVICE` keyword or more advanced source selection.
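A federated query going beyond the pod might then look something like this (both endpoints are hypothetical):

```sparql
# Illustrative: the first pattern is evaluated against the pod, while the
# SERVICE block is delegated to another server's SPARQL endpoint.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friend ?blog WHERE {
  <https://alice.example/profile/card#me> foaf:knows ?friend .
  SERVICE <https://bob.example/sparql> {
    ?friend foaf:weblog ?blog .
  }
}
```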
The next step is that each container is a SPARQL endpoint, where the resources it contains are merged to form the default graph. Again, you could do things with named graphs.
Then, the whole pod could have a SPARQL endpoint, where the default graph is the RDF merge of all the data. But then, we might run into problems because of the expressivity.
I think we should be looking into what we can do with named graphs, not only to limit what the endpoint has to consider in query evaluation, but also for partitioning. Since anyone can say anything about anything, we might not want to allow them to say just anything on your pod; and if they did, and there's conflicting information, you might want to be able to partition the data so that a certain application only sees data that has been verified.
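As a sketch of that kind of partitioning (the graph names are invented), a pod-level endpoint could expose each resource as a named graph, and an application could restrict itself to a verified partition rather than the RDF merge of everything:

```sparql
# Illustrative: only the "verified" named graph is considered in the query,
# instead of the merge of all data in the pod.
PREFIX schema: <http://schema.org/>

SELECT ?event ?date WHERE {
  GRAPH <https://alice.example/verified/calendar> {
    ?event a schema:Event ;
           schema:startDate ?date .
  }
}
```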
Added to user-story readme
Opening this as part of moving all user stories into issues again.
The SPARQL query can be expressed as follows regardless:
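The original query is not reproduced in this excerpt; a plausible sketch for the "photos in my Pod" user story, with the four triple patterns referred to below, might look like this (the vocabulary and patterns are assumptions, not the actual query):

```sparql
# Hypothetical reconstruction only: four triple patterns describing photos,
# to be evaluated over the pod.
PREFIX schema: <http://schema.org/>

SELECT ?photo WHERE {
  ?photo a schema:Photograph ;                                    # pattern 1
         schema:creator <https://alice.example/profile/card#me> ; # pattern 2
         schema:dateCreated ?date ;                               # pattern 3
         schema:contentUrl ?url .                                 # pattern 4
}
```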
A really naive approach could try to execute this query without a seed (or a source, where a source is less specific in that it just refers to a domain name, and crawling won't go outside that domain, so it is more constrained). If given a seed, it will crawl and potentially traverse the entire Web. If given neither, it looks at all the IRIs in the query itself.
GET the WebID, follow link(s) to your public/private storage, then crawl your entire Pod, but do not follow links referencing anything outside your Pod (by comparing IRI domains, since the original query states 'photos in my Pod'), executing the query as we crawl. Each response is parsed as triples, and triples that can contribute to the query answer are kept (but one photo might only match 3 of the above 4 query patterns) - we need to keep them because a subsequent resource (perhaps not a photo at all) may state the 4th query pattern for that photo.
We also need to maintain a record of all IRIs visited to prevent infinite loops.