w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

CONSTRUCT FRAMED #48

Open RickMoynihan opened 5 years ago

RickMoynihan commented 5 years ago

Why?

I don't really understand the ideas raised in #39, but a smaller, seemingly related problem I've encountered many times is that handling raw RDF triples can be awkward. Often you want them framed into objects, so that you can process resource objects one at a time and know you have all the requested properties for each object. You often don't want to have to consume the entire stream of results and group all the triples yourself. If you have a library like RDF4J or Jena you can load your triples into a Model or memory store, but you may not always have such tools available.

Whilst databases are often already under load, they are frequently better placed than their clients to do this framing.

Proposed solution

It would be nice to be able to pass this burden onto the database in some circumstances, e.g. with a query like:

CONSTRUCT FRAMED 
{ ?s ?p ?o } 
WHERE 
{ ?s ?p ?o }

would return all matches framed into resource objects, e.g. a JSON-LD result stream grouped into resource objects: [{...}, {...}, {...}, {...}].

Such a proposal would require CONSTRUCT FRAMED queries to use a response format that can represent the framing, i.e. a frame-oriented format (something like JSON(-LD) or XML). Technically, fully beautified Turtle could also fulfil the requirement. However, Turtle is typically read in a triple-oriented rather than a resource-oriented manner, and the point is to guarantee that consumers can process each subject one at a time.

Previous work

Considerations for backward compatibility

It requires an additive change to syntax.

dydra commented 5 years ago

how is this different from sorting the where solution set by whatever variable(s) serve as subject(s)?

VladimirAlexiev commented 5 years ago

Hi @RickMoynihan! I believe that JSON-LD Frames are one of the useful ways of describing shapes (business objects), see https://github.com/w3c/EasierRDF/issues/64. I think they have fewer features than SHACL/ShEx, so I believe that frames could be generated from shapes, but not the other way around.

What I miss in your proposal is a reference to JSONLD Frames. When you say "framed" do you mean it in the same sense, or some other sense?

I believe you need to describe the shape of the result you want, as it's not obvious how far to traverse the graph and how to lay out the data. Suppose that s1 and s2 share a sub-object x (eg a nomenclature entry): should it be emitted in the frame of s1, or s2, or at top-level and only referenced from s1 and s2?

cc @azaroth42 @gkellogg @msporny

cygri commented 5 years ago

If you have a library like RDF4j or JENA [on the client,] you can load your triples into a Model, or memory store; but you may not always have such tools available.

If the client doesn't have an RDF library, then why not stick to SELECT where the client has control over the order? RDF graphs are unordered sets of triples. This issue seems to say “I want the result to be RDF but I don't want it to be RDF.”

RickMoynihan commented 5 years ago

how is this different from sorting the where solution set by whatever variable(s) serve as subject(s)?

@dydra You're right that it's very similar to doing that, but I think there's a big difference in practice. N-Triples as a format makes no promise that this sorting/grouping/framing has happened, so consumers can't rely on it. All N-Triples parsers are designed to give you data one triple at a time, not one subject at a time. If you know your triples are sorted because you also wrote the query, then you can certainly implement a grouping parser that frames resources for your application. However, the only framed format I know of right now is framed JSON-LD, though one could imagine many others.
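
A minimal Python sketch of the kind of grouping parser described above. It assumes the triples have already been parsed into plain (subject, predicate, object) string tuples and that they arrive ordered by subject. As noted, N-Triples makes no such promise, so a subject that reappears later simply produces a second group. The name group_by_subject is illustrative and does not come from any library.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def group_by_subject(triples: Iterable[Triple]) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Yield (subject, [(predicate, object), ...]) for each run of consecutive
    triples that share a subject.

    Assumes the input is already sorted/grouped by subject. If a subject
    reappears after another subject, it is emitted as a separate group,
    which is exactly the guarantee a plain triple format cannot give.
    """
    for subject, run in groupby(triples, key=lambda t: t[0]):
        yield subject, [(p, o) for _, p, o in run]
```

This design keeps memory use bounded by the largest single resource. The trade-off is that its correctness depends entirely on how the producer ordered the triples.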

The motivation for this proposal goes beyond SPARQL to the rest of the RDF ecosystem. I think there would be huge benefit in preserving the fact that an RDF graph is already framed, i.e. in JSON-LD it would be a serialised file with a profile of http://www.w3.org/ns/json-ld#framed. With N-Triples or other traditional triple formats such as Turtle or RDF/XML, you'd never know this.

I feel that for this sort of thing to be maximally useful it would need to be expressible in the query itself, not just as an Accept header on the request. CONSTRUCT FRAMED would imply a framed format as a response and mean that consumers can rely on a framed interpretation. Right now the only suitable format for this is (to my knowledge) JSON-LD with the framed profile set, though there could in principle be others.

RickMoynihan commented 5 years ago

@VladimirAlexiev

What I miss in your proposal is a reference to JSONLD Frames. When you say "framed" do you mean it in the same sense, or some other sense?

Yes, I co-opted "frame" from JSON-LD Framing, but I guess it also comes from the classical KR "frame language" view of the world, i.e. viewing objects as collections of properties rather than as sets of triples describing objects. I'm not claiming frame-oriented is better than triple-oriented, just that it also has its uses, and that RDF currently ignores this view entirely, with the sole exception of JSON-LD framing.

I believe you need to describe the shape of the result you want, as it's not obvious how far to traverse the graph and how to lay out the data. Suppose that s1 and s2 share a sub-object x (eg a nomenclature entry): should it be emitted in the frame of s1, or s2, or at top-level and only referenced from s1 and s2?

Well, I wasn't wanting to design a detailed proposal; that would be the job of the standards committee :-) I just want to express what I believe is an endemic usability problem in the RDF world, and highlight a relatively simple potential solution. However, I would imagine that framing one level deep would be enough at least 90% of the time to make people's lives easier, so the following:

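PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX :     <http://example.org/>   # hypothetical vocabulary for :age and :address

CONSTRUCT FRAMED {
  ?person rdfs:label ?name ;
          :age ?age ;
          :address ?address .
  ?address ?p ?o .
} WHERE {
  ?person a foaf:Person ;
          rdfs:label ?name ;          # bind ?name so the label triple is produced
          :age ?age ;
          :address ?address .
  ?address ?p ?o .
}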

would return something like this (pseudo JSON-LD):

{
  "@context": {},
  "@graph": [
    {"@id": "http://bob",
     "age": 21,
     "name": "Bob",
     "address": "http://bobs/address"},
    {"@id": "http://bobs/address",
     "street": "21 Manchester Road",
     "city": "Manchester"}
    # ....
  ]
}

One could imagine a much deeper integration that maps variables directly into JSON object templates. Indeed, I've written a SPARQL-like query library that maps variables from BGPs into a JSON-like tree template, though I'm not proposing anything quite that wild, as it would be a much, much bigger change.

RickMoynihan commented 5 years ago

This issue seems to say “I want the result to be RDF but I don't want it to be RDF.”

@cygri No, it says I want the results to be RDF and framed into objects. Is framed JSON-LD no longer RDF?

VladimirAlexiev commented 5 years ago

@RickMoynihan I think what you're asking for is already a standard part of JSON-LD.

If you read the JSONLD spec, I'm sure you can find a profile header or something to request one or another representation of the query result. Please share your findings here.

RickMoynihan commented 5 years ago

@RickMoynihan I think what you're asking for is already a standard part of JSON-LD.

Yes, JSON-LD can certainly frame things into JSON objects as I want. However, I think the intent for this kind of representation needs to be expressed in the query, not just as a content-negotiated header. Although I can currently ask an endpoint to return a CONSTRUCT as JSONLD by setting an accept header, I can't ask it in a standardised way to group/frame the triples into objects. Nor will my RDF library know to give me framed objects instead of triples.

Hence I believe this would need to be a SPARQL feature.

gkellogg commented 5 years ago

JSON-LD frames require the client to specify a specific frame which is like a program by example for structuring a flattened JSON-LD document, which is effectively a simple quads serialization. The concept might be extended to work in a query engine to construct the triples/quads to be framed, but that would require a bit more work.

We don’t (currently) have any notion of automatic framing, such as is performed by many Turtle serializers, and strictly specifying that might be challenging.

cygri commented 5 years ago

I still have trouble understanding what problem we are trying to address here. According to the issue description: “Handling raw RDF triples can at times be awkward.” Then why not use SELECT queries? No raw RDF triples, and the results can be organised by subject.

There is also Jena's JSON { ... } WHERE { ... } feature. It delivers vanilla JSON (not JSON-LD). Output is always an array of objects, with no way of nesting further objects/arrays below, but that may be enough. Example query:

PREFIX purl: <http://purl.org/dc/elements/1.1/>

JSON {
  "author": ?author,
  "title": ?title
}
WHERE {
  ?book purl:creator ?author .
  ?book purl:title ?title .
  FILTER (?author = 'J.K. Rowling')
}

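
For comparison, here is a rough Python sketch of the shape of output described for Jena's JSON { ... } form: one flat object per solution, built from a {key: variable} template. It is not Jena's implementation. It assumes the solutions arrive as SPARQL 1.1 JSON results bindings, and the helper name json_template is hypothetical. Keeping only each term's lexical value is exactly the step that drops IRIs, datatypes and language tags.

```python
from typing import Any, Dict, List

Binding = Dict[str, Dict[str, str]]  # variable name -> SPARQL JSON results term


def json_template(template: Dict[str, str], bindings: List[Binding]) -> List[Dict[str, Any]]:
    """Instantiate a flat {json_key: variable_name} template once per solution.

    Only the lexical "value" of each term is kept (as a string), so the
    distinction between IRIs and literals, datatypes and language tags is lost.
    Variables left unbound in a solution are simply omitted from that object.
    """
    out: List[Dict[str, Any]] = []
    for row in bindings:
        out.append({key: row[var]["value"] for key, var in template.items() if var in row})
    return out
```
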
RickMoynihan commented 5 years ago

@gkellogg Thanks for chipping in. I only have limited knowledge of the intricacies of JSON-LD etc., which is why I didn't try in the initial proposal to specify how it would work or how it could be used. I'd assumed that the frame would not be provided by the user, but would instead be an internal detail of the database implementation. However, if a frame cannot express the kind of grouping a Turtle processor might do, that clearly rules out JSON-LD as a suitable technology for implementing this.

@cygri: Thanks for pointing me at Jena's JSON feature; I had no idea it existed. It appears to be quite close in spirit to what I am suggesting, though I had hoped we would not lose the expression of RDF types/URIs etc., and hence referenced JSON-LD as a candidate for how such a thing might be implemented.

I still have trouble understanding what problem we are trying to address here. According to the issue description: “Handling raw RDF triples can at times be awkward.” Then why not use SELECT queries? No raw RDF triples, and the results can be organised by subject.

Several reasons:

  1. SELECT queries force you to name your columns, and column names cannot be predicates/URIs, so you end up introducing a remapping, or inventing ad hoc schemes for how property keys relate to their values.
  2. SELECT queries return tables, not RDF. If you want a one-resource-per-row representation, you can't mix resource types with heterogeneous properties in a single query; the alternative is to drop the one-object-per-row requirement and normalise everything to s p o columns.
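
To illustrate point 2, here is a minimal Python sketch of the regrouping a client has to do itself when it falls back to SELECT ?s ?p ?o. It assumes the endpoint returns the standard SPARQL 1.1 Query Results JSON format and that the variables are named s, p and o. The function names are hypothetical. Mapping each term to a JSON-LD node reference or value object preserves the IRIs, datatypes and language tags that a flat table blurs.

```python
from typing import Any, Dict, List


def term_to_jsonld(term: Dict[str, str]) -> Dict[str, str]:
    """Convert one SPARQL 1.1 JSON results term into a JSON-LD node reference or value object."""
    kind = term["type"]
    if kind == "uri":
        return {"@id": term["value"]}
    if kind == "bnode":
        return {"@id": "_:" + term["value"]}
    # Literal (some older endpoints also emit "typed-literal").
    value: Dict[str, str] = {"@value": term["value"]}
    if "xml:lang" in term:
        value["@language"] = term["xml:lang"]
    elif "datatype" in term:
        value["@type"] = term["datatype"]
    return value


def spo_results_to_nodes(results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Regroup SELECT ?s ?p ?o bindings into one node object per subject.

    Subjects appear in first-seen order; predicate IRIs are used directly as keys.
    """
    nodes: Dict[str, Dict[str, Any]] = {}
    for row in results["results"]["bindings"]:
        subject = term_to_jsonld(row["s"])["@id"]  # subjects are IRIs or blank nodes
        node = nodes.setdefault(subject, {"@id": subject})
        node.setdefault(row["p"]["value"], []).append(term_to_jsonld(row["o"]))
    return list(nodes.values())
```
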

The use cases I had in mind are lightweight transformations, where you want to pull some data out of the database and perform some simple processing without having to use a Model/MemoryStore. In these situations I still want the RDF types and their properties/subjects, and I still want a representation that describes an RDF graph that could in principle be loaded back into the store.

Anyway, the fact that I'm arguing so much over this one means it's clearly controversial, or deemed not worth the effort. And of all the 1.2 features, it's not the one I'd most like to see. I'd much rather have #31 and #49, and I definitely find handling labels and priorities a pain (#13), though I can't think of a realistic implementation.

jindrichmynarz commented 5 years ago

I run a CONSTRUCT query followed by local JSON-LD Framing to achieve what you describe (e.g., curl and jsonld are a good combo).

You mention that you'd like to "pass this burden onto the database". This would make sense if the database were better suited to framing than your client application, so that performance savings (e.g., streaming response, lower bandwidth) could be achieved. What kinds of savings do you think can be made? I believe JSON-LD framing requires the data in memory, so the performance characteristics would be similar within a database and in a client application.

Regarding usability improvements: apart from the "burden" passed onto the database, you'd have to pass more information about how the framing should be done, i.e. communication overhead that compromises any usability improvements.

RickMoynihan commented 5 years ago

What kinds of savings do you think can be made? I believe JSON-LD requires data in memory and the performance characteristics within a database and in a client application.

Well, firstly, the database often runs on more hardware than your app, though admittedly you often also want to take load off your database. As always, there are trade-offs.

Secondly, a database can often sort and group efficiently, and it may also have mechanisms for terminating resource-hungry operations (timeouts, limits on memory usage for a single query, etc.) as well as for spilling to disk.

Regarding usability improvements, apart from the "burden" passed onto the database you'd have to pass more information about how should the framing be done; i.e. communication overhead compromising any usability improvements.

I would've hoped it would always just group on s, and then group on p within each resource, like a beautifying Turtle serializer might. Clearly we would not want to pass framing information in or out of band, unless we were to extend CONSTRUCT with much more complex templating features.
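
The "group on s, then on p" behaviour can be sketched in a few lines of Python. This sketch makes simplifying assumptions: terms are plain strings, IRIs are not distinguished from literals, and blank nodes are not embedded. It is not a specification of what CONSTRUCT FRAMED would do, and frame_by_subject is a hypothetical name. The output is shaped loosely like a flattened JSON-LD @graph, with one node object per subject.

```python
import json
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def frame_by_subject(triples: Iterable[Triple]) -> List[Dict[str, object]]:
    """Group triples by subject, then by predicate within each subject.

    Returns one node object per subject, in first-seen order:
    {"@id": subject, predicate: [object, ...], ...}.
    """
    nodes: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:
        values = nodes.setdefault(s, {}).setdefault(p, [])
        if o not in values:  # an RDF graph is a set, so drop duplicate triples
            values.append(o)
    return [{"@id": s, **props} for s, props in nodes.items()]


if __name__ == "__main__":
    example = [
        ("http://bob", "name", "Bob"),
        ("http://bobs/address", "city", "Manchester"),
        ("http://bob", "address", "http://bobs/address"),
    ]
    print(json.dumps({"@graph": frame_by_subject(example)}, indent=2))
```

Because every subject ends up in exactly one node object, a consumer can process resources one at a time. The cost is that the whole result must be buffered first, which is the work the proposal would move onto the database.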

VladimirAlexiev commented 5 years ago

@RickMoynihan I think you missed part of my reply: compacted and flattened do what you want. If you don't need a different shape, you don't need a Frame.

JSONLD by setting an accept header I can't ask it in a standardised way to group/frame the triples into objects

I think that's false, have you checked the jsonld spec? https://w3c.github.io/json-ld-syntax/#example-142-http-request-with-profile-requesting-a-compacted-document
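
For reference, here is a Python sketch of what such a content-negotiated request could look like against a SPARQL protocol endpoint, using only the standard library. The endpoint URL is a placeholder and construct_request is a hypothetical helper. The profile IRIs are the ones JSON-LD 1.1 defines for application/ld+json. Whether an endpoint honours the profile is up to the server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Profile IRIs associated with the application/ld+json media type in JSON-LD 1.1.
COMPACTED = "http://www.w3.org/ns/json-ld#compacted"
FLATTENED = "http://www.w3.org/ns/json-ld#flattened"
FRAMED = "http://www.w3.org/ns/json-ld#framed"


def construct_request(endpoint: str, query: str, profile: str = COMPACTED) -> Request:
    """Build (but do not send) a SPARQL protocol GET request for a CONSTRUCT
    query, asking for JSON-LD with the given profile via the Accept header.
    """
    url = endpoint + "?" + urlencode({"query": query})
    accept = f'application/ld+json;profile="{profile}"'
    return Request(url, headers={"Accept": accept}, method="GET")
```

Sending it would be a matter of passing the request to urllib.request.urlopen. Nothing in the request itself obliges the server to return a compacted or flattened document.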

cygri commented 5 years ago

@RickMoynihan What effect should CONSTRUCT FRAMED have if it is used with a result format other than JSON-LD? Say, Turtle and N-Triples. Would there be use cases for CONSTRUCT FRAMED with these formats?

Another tangent: Did you look at all into querying RDF with GraphQL? Here's a survey by @rubensworks. I don't know what the state of the art is with regard to updating RDF via GraphQL, but in theory, once you have a mapping between the two models established, it should be possible to send a query, get back vanilla JSON, modify the JSON, and send it back where it leads to RDF updates.

dydra commented 5 years ago

with respect to "... a standardised way to group/frame...", a significant phrase in that passage in the syntax document is

... using an application-specific default context

the json-ld 1.1 process deferred the issue, how to specify that in a request.

gkellogg commented 5 years ago

with respect to "... a standardised way to group/frame...", a significant phrase in that passage in the syntax document is

... using an application-specific default context

the json-ld 1.1 process deferred the issue, how to specify that in a request.

Yes, when requesting a framed result, it’s up to the service to determine the framing. We haven’t specified a mechanism for a client to specify a frame, but it would likely use a Link and/or Profile header. Also, it is outside our scope to specify service behavior.

RickMoynihan commented 5 years ago

Another tangent: Did you look at all into querying RDF with GraphQL?

@cygri Yes, my colleagues and I spent a lot of time designing, building and experimenting with that approach. However, it's a hard circle to square given GraphQL's limited type system, its lack of namespaces, and the fact that GraphQL schemas need to be generated before query time (or you break conformance with GraphQL, as Stardog does by dropping schemas altogether, which in my mind removes a lot of the point of doing it).

@RickMoynihan What effect should CONSTRUCT FRAMED have if it is used with a result format other than JSON-LD? Say, Turtle and N-Triples. Would there be use cases for CONSTRUCT FRAMED with these formats?

It's a good question. There would be, iff consumers, libraries and APIs know from the query type to group resources. However, I was imagining that a framed RDF format (or JSON-LD) would be the most useful, which admittedly might be beyond the scope of this CG.

RickMoynihan commented 5 years ago

@VladimirAlexiev

@RickMoynihan I think you missed part of my reply. compacted and flattened do what you want. If you don't need a different shape, you don't need a Frame.

Thanks. Yes that looks sufficient.

JSONLD by setting an accept header I can't ask it in a standardised way to group/frame the triples into objects

I think that's false, have you checked the jsonld spec? https://w3c.github.io/json-ld-syntax/#example-142-http-request-with-profile-requesting-a-compacted-document

OK, but we're talking about the SPARQL protocol here, right? I'd be surprised if any endpoint honored that. Mandating that endpoints honor it in SPARQL 1.2 might be sufficient for an implementation of this, though I can't shake the feeling that this needs to be represented in the query itself, as otherwise in-memory/in-process stores etc. won't be able to honor it in a standard way.

dydra commented 5 years ago

... but we're talking about the SPARQL protocol here right?

yes, but if an endpoint accepts the response media type, it is bound to follow its definition.

RickMoynihan commented 5 years ago

@dydra except honoring the profile parameter is not required:

The profile parameter MAY be used by clients to express their preferences in the content negotiation process.

https://w3c.github.io/json-ld-syntax/#iana-considerations

dydra commented 5 years ago

which is followed by

If the profile parameter is given, a server SHOULD return a document that honors the profiles in the list which are recognized by the server.

cygri commented 5 years ago

I can't shake the feeling that this needs to be represented in the query itself, as otherwise memory/in-process stores etc won't be able to honor it in a standard way.

You earlier said the main use case for this is clients that don't include an RDF library. If you have an entire SPARQL engine in-process, then surely there is a straightforward way to access the CONSTRUCT result as, for example, a Jena model, and serialise it to, for example, compacted JSON-LD.

Yes, I spent a lot of time with colleagues designing building and experimenting with that approach with colleagues.

Oooh, that looks very cool. Thanks for the pointer. (At TopQuadrant we have SHACL models for all our data, and use annotations on the SHACL constructs to define the GraphQL schema. This works quite well for our use cases.)

RickMoynihan commented 5 years ago

You earlier said the main use case for this is clients that don't include an RDF library. If you have an entire SPARQL engine in-process, then surely there is a straightforward way to access the CONSTRUCT result as, for example, a Jena model, and serialise it to, for example, compacted JSON-LD.

Yes, though I don't believe it's contradictory to expect that a feature which I'd proposed might warrant new SPARQL 1.2 syntax would also work with the same semantics when operating in-process, if only to honor the abstraction.

<OFFTOPIC>

Yes, I spent a lot of time with colleagues designing building and experimenting with that approach with colleagues.

Oooh, that looks very cool. Thanks for the pointer. (At TopQuadrant we have SHACL models for all our data, and use annotations on the SHACL constructs to define the GraphQL schema. This works quite well for our use cases.)

Yes, we've been thinking about adopting that same approach for a long time. However, when we started, SHACL had only just been finalised and IIRC there were very few implementations at the time that we could easily leverage. That project was largely just an experiment, though I'd like to rebuild it in a much more robust manner. </OFFTOPIC>

RickMoynihan commented 5 years ago

which is followed by

If the profile parameter is given, a server SHOULD return a document that honors the profiles in the list which are recognized by the server.

@dydra ... but this is a largely moot point as JSONLD isn't mandated as a response format for constructs by SPARQL 1.1 anyway.

I suppose this issue could be fixed well enough for my purposes if SPARQL 1.2 implementers were required to implement JSON-LD and to group the objects such that all subjects and properties were normalised into the tree. However, having looked some more at the specs for compacting/flattening documents, I'm not convinced implementers are required to group like this, though the playground examples appear to.

dydra commented 5 years ago

if this is a moot point, then i have lost track of what you intend this feature to accomplish.

RickMoynihan commented 5 years ago

@dydra forgive me if I'm misunderstanding you, but I read your point as suggesting the feature request is unnecessary because the JSONLD spec says if you support JSONLD an implementer should support profiles, and that a profile such as compact/flattened would let a consumer ask for the results to be grouped/framed into resource objects as I'd like.

I suggested this point on profiles is somewhat moot because JSON-LD isn't a required response format in SPARQL 1.1. Therefore, even if a JSON-LD profile would solve this issue, framing responses in the manner I suggested is not standardised, as I thought you were suggesting.

I do, however, agree with you that a combination of JSON-LD and a compacted/flattened profile may be what I'm after. However, although the JSON in the playground looks similar to what I'd like, I've not had time to digest the standards documents to confirm that compaction groups JSON-LD resource maps together by subject/predicate id.
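
A client could at least check for the property being discussed here. The Python sketch below uses a hypothetical helper, duplicate_subjects, that reports any @id appearing as more than one top-level node object in a document's @graph. It assumes the document uses a top-level @graph, and it ignores named graphs. JSON-LD's flattening algorithm works from a node map keyed by @id, so a flattened document is expected to pass this check. Compaction on its own is not assumed to merge nodes.

```python
from collections import Counter
from typing import Any, Dict, List


def duplicate_subjects(doc: Dict[str, Any]) -> List[str]:
    """Return the @id values that occur in more than one top-level node object of @graph.

    An empty list means every subject's properties were gathered into a single
    node object, i.e. the document is grouped by subject.
    """
    counts = Counter(node["@id"] for node in doc.get("@graph", []) if "@id" in node)
    return [node_id for node_id, n in counts.items() if n > 1]
```
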

dydra commented 5 years ago

if a combination of JSONLD and a compact/flattened profile does provide this capability, then would this be a matter of a protocol change rather than a language change?

gkellogg commented 5 years ago

Note that JSON-LD compact and framed profiles might be treated equivalently by a service in the absence of any explicit context or frame. The JSON-LD CG may be the group to work on protocol mechanisms to specify the context or frame along with the request, but some group such as a successor to the Linked Data Platform might be best for creating normative requirements.

The fact that JSON-LD is not a requirement for a CONSTRUCT representation likely is due to the fact that the SPARQL 1.1 spec predates JSON-LD 1.0.

Personally, I’d like to see better integration of JSON-LD in SPARQL, perhaps with a frame-like representation within something like CONSTRUCT.

(BTW, my attempt to join this CG is held up due to affiliation issues, which I should resolve in a week or so).

RickMoynihan commented 5 years ago

Personally, I’d like to see better integration of JSON-LD in SPARQL, perhaps with a frame-like representation within something like CONSTRUCT.

That's good to know @gkellogg. Were you thinking of something similar to Jena's JSON template extension (https://jena.apache.org/documentation/query/generate-json-from-sparql.html#query-syntax) that Richard shared?

I had thought such a thing would be useful too, and I've experimented with implementing a similar mechanism to map triples into arbitrary Clojure data structures, so I understand the desire for such a thing. It would be much more powerful than what I'm suggesting, though I felt it might be much too big a change for SPARQL 1.2, so I offered this as a partial solution.

VladimirAlexiev commented 5 years ago

@gkellogg

notion of automatic framing, such as performed by many Turtle serializers

That's simple, they do Concise Bounded Description: collect all statements of a Subject, and embed blank nodes. Most DESCRIBE implementations do the same. Compare #39 which asks for more sophisticated ways to define a DESCRIBE response.
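
Here is a minimal Python sketch of the Concise Bounded Description-style grouping described here: start from one resource, collect its triples, and recurse into blank-node objects. It assumes terms are strings with blank nodes written as "_:" labels. It omits the reification part of the full CBD definition, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes use the N-Triples "_:label" syntax.
    return term.startswith("_:")


def concise_bounded_description(resource: str, triples: Iterable[Triple]) -> List[Triple]:
    """Collect every triple whose subject is `resource`, then recursively every
    triple whose subject is a blank node reached as an object along the way.

    Reification statements (part of the full CBD definition) are ignored.
    """
    graph = list(triples)
    result: List[Triple] = []
    seen: Set[str] = set()
    todo = [resource]
    while todo:
        subject = todo.pop()
        if subject in seen:  # guards against blank-node cycles
            continue
        seen.add(subject)
        for s, p, o in graph:
            if s == subject:
                result.append((s, p, o))
                if is_bnode(o):
                    todo.append(o)
    return result
```

Named resources reached as objects are not expanded. That stopping rule is what keeps the description "bounded", and it is also why a shared sub-object such as the nomenclature entry mentioned earlier would only be referenced, not embedded.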

afs commented 5 years ago

@RickMoynihan wrote:

Thanks for pointing me at Jena's JSON feature; I had no idea it existed. It appears to be quite close in spirit to what I am suggesting, though I had hoped we would not lose the expression of RDF types/URIs etc., and hence referenced JSON-LD as a candidate for how such a thing might be implemented.

Yes, the preservation of RDF details has come up before. Similarly in GraphQL, sometimes getting the results without losing details would make the client's work easier, and other times it's "just give me JSON".

There is a converse issue as well: SPARQL JSON results as "plain JSON", without the RDF terms in all their detail, cf. the CSV format but for JSON.

Obviously not to require full JSON-LD processing, but some integration of JSON-LD would be good. That could be a "practice and experience" note if the machinery already exists and what is needed is roll-out.