Open VladimirAlexiev opened 3 years ago
@VladimirAlexiev, I found some examples in the specs but didn't find what you were referring to in the "Sample data" link above.
I think streaming JSON would be an excellent tool for long-running SPARQL results and line-oriented is a nice benefit. I guess this is a small step from current JSON results as they already require newlines to be escaped, right?
NDJSON is apparently also known as all of LDJSON, Line_Delimited_JSON, JSON_Lines, JSON_Streaming, JSONL, ndjson, NDJSON, and Newline_Delimited_JSON -- so this new thing could even be LD-JSON-LD!
Except that JSON-L (or JSONL) is definitely different from NDJSON... And I imagine there are other issues hiding behind the not-quite-synonym list above.
What is the (anticipated?) relationship between ND-JSON-LD (or NDJSON-LD) and JSON-LD (and 1.0, 1.1, etc.)?
Both JSON Lines and Newline Delimited JSON say they're also known by the other name, but as noted above these are different creatures. It's going to be necessary very quickly to clearly define which you're working with (and why not the other), as well as what may happen if the streams are crossed.
How and why is "Newline Delimited JSON-LD" (or is it "Linked Data in ND-JSON"?) related to the 1.2 update of SPARQL, which is the focus of this GitHub project?
It seems to me that ND-JSON-LD should be a distinct project, maybe associated with JSON-LD given their apparent close cousin relationship.
On Media Type...
x- Media Types are generally frowned on these days, for good reason. Which the NDJSON folk know, and haven't done much about (https://github.com/ndjson/ndjson-spec/issues/19, https://github.com/ndjson/ndjson-spec/issues/21).
Media Types with Multiple Suffixes is heading toward RFC status, and application/ld+json already exists, so you might consider application/nd+ld+json, possibly with a synonymous application/ld+nd+json (which would need the apparently stagnant NDJSON project to change from application/x-ndjson to application/nd+json).
If you don't want to pin hopes on Media Types with Multiple Suffixes, you might also consider application/ld+ndjson, and again pushing the NDJSON project to change from application/x-ndjson to application/ndjson...
Or leave the NDJSON project fallow as it stands, and consider application/ld+x-ndjson, which at least follows the general rules of Media Types, and parallels the existing application/ld+json.
This feels like a lot of frayed ends in search of a knot. That knot may be worthwhile, but I think it should be distinct from SPARQL 1.2.
Won't it be application/sparql-results+x-ndjson for SELECT results and application/ld+x-ndjson for CONSTRUCT/DESCRIBE?
From JSON-LD, application/ld+... is about RDF graphs and datasets, and ...+json the concrete syntax choice (cf. rdf+xml).
It would seem that the appropriate place for this effort would be the JSON-LD CG (AKA the JSON for Linking Data Community Group), although the JSON-LD WG remains as a maintenance group.
Also, note that the WG published the Streaming JSON-LD note, which addresses the need for a streaming serialization format, but in this case by imposing an order on object entries in the serialization; it is not a line format, per se.
At first glance, NDJSON-LD would seem to follow well given a context specified out-of-band, such as via a Link header. That would make it much the same as parsing an outer object containing @context and the values of @graph. Going beyond, an extension for supporting an @context at the top level, either as a URL or a one-line object, would be straightforward. Nothing would prevent an individual NDJSON line from including @context, either, unless there is some limitation on line length I didn't notice.
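The equivalence described above can be sketched roughly as follows. This is only an illustration, not part of any spec: the function name `ndjson_to_jsonld` is hypothetical, the context is assumed to have already been obtained out-of-band (for example from a Link header), and blank lines are skipped as an assumption.

```python
import json
from typing import Iterable, Optional


def ndjson_to_jsonld(lines: Iterable[str], context: Optional[object] = None) -> dict:
    """Wrap NDJSON lines in one outer object with @context and @graph.

    `context` is whatever was supplied out-of-band (a URL string or a
    context object); it is an input here, not something this sketch fetches.
    """
    graph = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Each line is one JSON object; a per-line @context is left in place.
        graph.append(json.loads(line))
    doc = {"@graph": graph}
    if context is not None:
        doc = {"@context": context, "@graph": graph}
    return doc
```

The resulting object could then be handed to an ordinary JSON-LD processor; whether that matches what an NDJSON-LD parser should do line by line is exactly the open question in this thread.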
@ericprud The sample data we have cited in our Jira looks like this:
{"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", "type": "MonetaryGrant", "id": "sg:grant.6616389",...\n
{"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", "type": "MonetaryGrant", "id": "sg:grant.6616214",...\n ...
It's probably here http://scigraph.downloads.uberresearch.com/archives/current/grants.tar.gz
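For readers unfamiliar with the format, here is a minimal sketch of reading such a file one record at a time. The helper name `iter_ndjson` is hypothetical, and because the sample lines above are truncated, the records below use made-up `example.org` context and `ex:` identifiers rather than the real SciGraph data.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of the stream."""
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed so a bad record can be located.
            raise ValueError(f"line {lineno}: invalid JSON: {exc.msg}") from exc


# Hypothetical stand-in for the truncated sample data.
sample = io.StringIO(
    '{"@context": "https://example.org/context.json", "type": "MonetaryGrant", "id": "ex:grant.1"}\n'
    '{"@context": "https://example.org/context.json", "type": "MonetaryGrant", "id": "ex:grant.2"}\n'
)
ids = [record["id"] for record in iter_ndjson(sample)]
```

Each record is available as soon as its line has been read, which is what makes the format attractive for large inputs.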
Right now we are considering NDJSON-LD for input, but you make a good point that a streaming sparql-results-json for SELECT output would also be useful.
In fact, CONSTRUCT output as NDJSON-LD is nontrivial: how would the processor know which triples to put on each line? How would it know which is the "main loop" of the query, or the "primary key" so to speak?
@TallTed thanks for the pointers to MIME developments!
@gkellogg thanks for the pointer to Streaming jsonld!
Pinging @wouterbeek here regarding NDJSON-LD, as he suggested it a while back here https://github.com/rubensworks/jsonld-streaming-parser.js/issues/64
There's a longish discussion of media subtypes containing '+' on media-types@ietf.org.
(I don't actually think nd+json is viable because people assume that +json means the resource matches RFC 4627, but folks can always relax their standards if they don't mind breaking some stuff.)
- sparql-results+json is streaming if the fields are in the right order ("head" before "results").
- Streaming a line format without Content-Length: means there can be silent truncation of results.
- Content-Length interacts with connection management, with some DoS potential from badly behaved clients.

These aren't reasons not to do it - they are things that should be noted in any design. Inside the enterprise is a different environment from the open web.
Just to note a real-world use case for newline delimited JSON-LD. For one application we developed, we index suitably framed JSON-LD documents in Elasticsearch where the documents are imported to Elasticsearch as NDJSON. That process uses a Jena model to gather RDF data from various sources (blackboard design pattern), then extracts and frames a sub-graph for resources of a given type.
Whilst it would be nice to be able to get some NDJSON-LD serialization as the result of a SPARQL query directly, I think it would be necessary to have some way to indicate a JSON-LD frame (rather than just a context as @gkellogg suggested) in order to guarantee consistent nesting/embedding in the JSON object structure.
Arguably for our usage the JSON-LD frame IS the query, a SPARQL query is not even needed.
Streaming a line format, used without the Content-length: and a line format, means there can be silent truncation of results.
@afs -- I would think that adding a specific termination marker to the syntax would avoid silent truncation without Content-length: -- and including the net line count in the termination marker (at which point, it should be trivially known) would prevent errors from missing lines, though it wouldn't give any good way to recover from such, other than repeating the request and running a diff on the two streams if the second also had some drop-outs...
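A rough sketch of that idea, purely for illustration: the marker shape `{"end": true, "count": N}` and both function names are invented here and are not part of NDJSON or any proposal in this thread. A real design would also need a marker that cannot collide with a legitimate data record.

```python
import json
from typing import Iterable, Iterator


def write_with_trailer(records: Iterable[dict]) -> Iterator[str]:
    """Emit one JSON line per record, then a hypothetical termination marker."""
    count = 0
    for record in records:
        count += 1
        yield json.dumps(record) + "\n"
    # The count is trivially known once the last record has been written.
    yield json.dumps({"end": True, "count": count}) + "\n"


def read_with_trailer(lines: Iterable[str]) -> Iterator[dict]:
    """Yield records as they arrive; raise if the stream is truncated or short."""
    seen = 0
    done = False
    for line in lines:
        if done:
            raise ValueError("data after termination marker")
        obj = json.loads(line)
        if isinstance(obj, dict) and obj.get("end") is True:
            if obj.get("count") != seen:
                raise ValueError(f"expected {obj.get('count')} lines, got {seen}")
            done = True
        else:
            seen += 1
            yield obj
    if not done:
        raise ValueError("stream truncated: no termination marker")
```

Because the reader is a generator, a consumer has already processed the earlier records by the time a problem is detected, which matches the point above that detection does not by itself give a way to recover.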
Content-Length is understood by HTTP/1.1 libraries and is used by them to reuse connections.
A trailer as protocol-level termination and including end-transfer information would be a good thing. It does not completely replace Content-Length though.
There is of course HTTP/2 - new protocol work ought to be an abstract design that exploits HTTP/2 features, can also be targeted at other transfer layers, for example, streaming gRPC. HTTP/1.1 may not be able to expose all of that design though improvements like early termination can be fitted.
@jaw111 thanks for the input!
necessary to have some way to indicate a JSON-LD frame
Yes, unless you have #39, #48, #73, #128 :-)
the JSON-LD frame IS the query
I think you're talking GraphQL here :-)
I think you're talking GraphQL here :-)
I was not able to come to terms with GraphQL-LD, still prefer SPARQL.
There is definitely some overlap between JSON-LD frames and GraphQL.
Just a note that @butaloto is working to upgrade our NDJSONLD implementation https://github.com/eclipse-rdf4j/rdf4j/issues/2840 to JSONLD 1.1
Why?
Newline-delimited JSON (line-oriented JSON) is often used in preference to JSON because it is streamable and can be processed with line-oriented tools (e.g. grep).
Previous work
Proposed solution
application/x-ld+ndjson (derived from the existing MIME type for JSON-LD, application/ld+json, and the MIME type of Newline Delimited JSON, application/x-ndjson)
Considerations for backward compatibility
None?