ndjson / ndjson-spec

Specification

Relation to JSON Text Sequences standard #25

Closed: letmaik closed this issue 2 years ago

letmaik commented 9 years ago

What is the relation to the JSON Text Sequences RFC (RFC 7464)? It seems to be getting picked up by some projects, e.g. GeoJSON.

Zorgatone commented 7 years ago

I didn't know about that standard. It seems a lot of people are trying to define a standard for the same purpose: a stream of JSON objects spread over multiple lines. I've also seen application/x-json-stream and application/x-jsonlines. We should stick to only one of these.

Zorgatone commented 7 years ago

I haven't looked into any parser or tried to implement one. I'm wondering, though, how you'd handle JSON objects/arrays/strings containing the newline character. Surely you can't naively read line by line and feed each line to a normal JSON parser; that would break.

EDIT: Never mind, I think they only handle minified JSON (i.e. without newlines between tokens), and newlines inside strings are escaped.
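A minimal Python sketch of what the EDIT describes, using only the standard json module: json.dumps without an indent argument emits a single line, and a newline inside a string comes out as the two characters `\` and `n`. Each record therefore occupies exactly one line, and a reader can safely split on line breaks. The helper names write_ndjson and read_ndjson are hypothetical, chosen for illustration.

```python
import io
import json


def write_ndjson(records, out):
    """Write each record as one compact JSON text followed by a newline."""
    for record in records:
        # json.dumps without indent never emits a raw newline: a newline
        # inside a string is escaped as the two characters "\" and "n".
        out.write(json.dumps(record) + "\n")


def read_ndjson(inp):
    """Parse one JSON text per line, skipping blank lines."""
    for line in inp:
        line = line.strip()
        if line:
            yield json.loads(line)


records = [
    {"name": "Alice", "comment": "hello\nworld"},
    {"name": "Bob", "comment": "hello\nagain"},
]

buf = io.StringIO()
write_ndjson(records, buf)
text = buf.getvalue()
print(text.count("\n"))  # 2: one line break per record, none inside strings
print(list(read_ndjson(io.StringIO(text))) == records)  # True
```

The round trip works without any custom escaping because the JSON serializer already guarantees that no raw line feed appears inside a string value.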

Zorgatone commented 7 years ago

There's also this one: http://jsonlines.org/

kyeotic commented 5 years ago

I see JSON Text Sequences as having one major advantage and one minor disadvantage.

The advantage is that you can safely stream JSON whose values contain newlines, which NDJSON cannot do. This is major because string values containing newlines are valid JSON and fairly common on the web, so NDJSON clients need to first replace all the newlines with some pre-arranged character before sending, and then replace them back when parsing.

The disadvantage is that it's harder for a human to read the content. This is minor because it's still possible for a human to read, and also because I don't think the wire format needs to match the storage format.

I don't see why anyone would pick NDJSON over JSON Text Sequences given these.
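For comparison, a minimal Python sketch of JSON Text Sequences framing, assuming the scheme described in RFC 7464: every JSON text is preceded by the ASCII Record Separator (0x1E) and followed by a line feed. A reader splits on RS rather than on newlines, so each text may be pretty-printed. The function names are hypothetical, and the sketch omits the RFC's rules for recovering from truncated or malformed texts.

```python
import json

RS = "\x1e"  # ASCII Record Separator, placed before every JSON text


def encode_json_seq(values, indent=None):
    """Frame each value as RS + JSON text + LF."""
    return "".join(RS + json.dumps(v, indent=indent) + "\n" for v in values)


def decode_json_seq(data):
    """Split on RS and parse each non-empty chunk as one JSON text."""
    values = []
    for chunk in data.split(RS):
        if chunk.strip():  # skip the empty chunk before the first RS
            # json.loads accepts the trailing LF because it is JSON whitespace
            values.append(json.loads(chunk))
    return values


values = [{"name": "Alice", "tags": ["a", "b"]}, 42, "Bob"]
wire = encode_json_seq(values, indent=2)  # pretty-printed: raw newlines inside texts
print(decode_json_seq(wire) == values)  # True
```

Because RS never appears in serialized JSON, the framing tolerates newlines anywhere in a text; the cost is a non-printing control character in the stream, which is the readability trade-off mentioned above.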

clue commented 5 years ago

Quoting the comment above: "This is major because string values containing newlines are valid JSON."

According to RFC 7159, control characters (which include newlines) must be escaped inside strings, i.e. a valid string would be "hello\nworld", whereas a literal newline character in a string would be invalid JSON.

In other words, strings containing escaped newlines are supported in JSON just like they're supported in NDJSON and JSON Text Sequences.
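A small Python check of this escaping rule, using the standard json module (the helper parses is hypothetical). The first text contains a backslash followed by n, the second a raw line feed inside the string; a strict decoder accepts only the first.

```python
import json

escaped = '{"comment": "hello\\nworld"}'  # backslash + "n" in the serialized text
literal = '{"comment": "hello\nworld"}'   # raw line feed inside the string


def parses(text):
    """Return True if text is accepted by json.loads in its default strict mode."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(escaped))  # True
print(parses(literal))  # False: unescaped control character in a string
print(repr(json.loads(escaped)["comment"]))  # 'hello\nworld', a real newline after decoding
```

Python's json.loads can be made lenient with strict=False, which accepts the raw line feed; that is a library extension, not standard JSON.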

On top of this, JSON also allows insignificant whitespace around structural characters (this is commonly referred to as "pretty printing"). While JSON Text Sequences allows actual newlines here as well, NDJSON does not.

I understand where you're coming from and agree that this may not cover all possible use cases. That being said, from my personal, professional experience I would argue that this is actually much less of a problem than it might appear at first. Many of the applications where a streaming format like NDJSON makes sense use JSON values without any insignificant whitespace to reduce bandwidth.
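If a producer starts from pretty-printed JSON, turning it into an NDJSON line is a matter of parsing and re-serializing compactly. A minimal Python sketch, where the helper name to_ndjson_line is hypothetical:

```python
import json

pretty = """{
    "name": "Alice",
    "comment": "hello\\nworld"
}"""


def to_ndjson_line(json_text):
    """Re-serialize any JSON text as a single compact line."""
    value = json.loads(json_text)
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    return json.dumps(value, separators=(",", ":"))


line = to_ndjson_line(pretty)
print(line)  # {"name":"Alice","comment":"hello\nworld"}
print(len(pretty.encode()), "->", len(line.encode()), "bytes")
```

Only insignificant whitespace is removed; the escaped newline inside the string survives unchanged, and the output is also smaller on the wire.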

I've been working with implementations of NDJSON and JSON Text Sequences for PHP and also did a quick comparison in my blog: https://www.lueck.tv/2018/introducing-reactphp-ndjson. My main takeaway is that they're largely interchangeable. JSON Text Sequences has the benefit of being a standard, but who knows if NDJSON will catch up (#21)…

kyeotic commented 5 years ago

The front page of the repo says

The JSON texts MUST NOT contain newlines or carriage returns.

which contradicts what you said.

clue commented 5 years ago

@tyrsius A "JSON text" is the serialized value, for example an object {"name":"Alice"}, but also just primitive values like 42, null and strings like "Bob". See also https://tools.ietf.org/html/rfc7159#section-2 for more details.
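Since a JSON text can be any value, an NDJSON stream may mix objects and primitives, one per line. A short Python illustration with the standard json module:

```python
import json

ndjson = '{"name":"Alice"}\n42\nnull\n"Bob"\n'

# One JSON text per line; the empty string after the final "\n" is skipped.
# json.loads would also tolerate a trailing "\r", since it is JSON whitespace.
values = [json.loads(line) for line in ndjson.split("\n") if line]
print(values)  # [{'name': 'Alice'}, 42, None, 'Bob']
```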

You're right that the NDJSON spec says that each JSON text MUST NOT contain newlines and carriage returns. However, JSON already mandates that newlines in strings MUST be escaped like the previous example ("hello\nworld"). This means that JSON can support newlines in strings just fine; they just need to be escaped. Likewise, NDJSON also supports escaped newlines in string values.

What NDJSON does not allow is insignificant whitespace containing newlines. For example, the following JSON text is valid JSON, but not valid NDJSON because it spans multiple lines:

{
    "name": "Alice"
}

To reiterate, the following is valid NDJSON:

{"name":"Alice","comment":"hello\nworld"}
{"name":"Bob","comment":"hello\nagain"}
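The distinction above can be tested mechanically: a document is NDJSON only if every line is a complete JSON text by itself. A minimal Python sketch; the function is_valid_ndjson is hypothetical, and skipping blank lines is a lenient choice made here for illustration.

```python
import json


def is_valid_ndjson(text):
    """Return True if every non-blank line parses as a JSON text on its own."""
    for line in text.split("\n"):
        if not line.strip():
            continue  # lenient: ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return False
    return True


pretty = '{\n    "name": "Alice"\n}\n'
compact = (
    '{"name":"Alice","comment":"hello\\nworld"}\n'
    '{"name":"Bob","comment":"hello\\nagain"}\n'
)

print(json.loads(pretty))        # {'name': 'Alice'}: valid JSON...
print(is_valid_ndjson(pretty))   # False: ...but the line "{" alone is not a JSON text
print(is_valid_ndjson(compact))  # True
```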
kyeotic commented 5 years ago

I guess I'm not clear on what the distinction is. The escape sequence is how an n becomes a "newline" in any string in JavaScript/JSON. If you have a string that contains newlines, those newlines are \n in the string. The fact that they might also be in a JSON VALUE can only be determined if you parse the string as JSON first.

How could an NDJSON streaming parser tell the difference between a "newline that starts a new line" and a "newline in a JSON string value"?

In other words, how could an NDJSON client tell the difference between these two?

1.

{"name":"Alice","comment":"hello\nworld"}
{"name":"Bob","comment":"hello\nagain"}

2.

{"name":"Alice","comment":"hello
world"}
{"name":"Bob","comment":"hello
again"}
millette commented 5 years ago

@tyrsius

{"name":"Alice","comment":"hello
world"}

is invalid json and invalid ndjson. The newline must be escaped (written out as \n).

Whereas

{"name":"Alice",
"comment":"helloworld"}

is valid json, but invalid ndjson.

Hope that helps.
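To make the answer concrete for a streaming client: in valid NDJSON an escaped newline travels as the two bytes 0x5C 0x6E (`\` and `n`), so a raw 0x0A byte can only be a record delimiter. A streaming parser can therefore buffer bytes and cut at 0x0A without understanding JSON at all. A minimal Python sketch; the class NdjsonDecoder and its feed method are hypothetical, and data left without a final newline simply stays buffered.

```python
import json


class NdjsonDecoder:
    """Incremental NDJSON decoder: feed() arbitrary byte chunks, get parsed values back."""

    def __init__(self):
        self._buffer = b""

    @property
    def pending(self):
        """Bytes received but not yet terminated by a newline."""
        return self._buffer

    def feed(self, chunk):
        self._buffer += chunk
        values = []
        # A 0x0A byte can only be a delimiter: newlines inside strings are escaped,
        # and UTF-8 never uses 0x0A inside a multi-byte character.
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            if line.strip():
                values.append(json.loads(line))  # json.loads accepts UTF-8 bytes
        return values


stream = (
    b'{"name":"Alice","comment":"hello\\nworld"}\n'
    b'{"name":"Bob","comment":"hello\\nagain"}\n'
)
decoder = NdjsonDecoder()
results = []
for i in range(0, len(stream), 7):  # deliberately awkward 7-byte chunks
    results.extend(decoder.feed(stream[i:i + 7]))
print(results[0]["comment"] == "hello\nworld")  # True
```

Chunks may split a record anywhere, even between the `\` and the `n` of an escape, because nothing is parsed until a real line feed arrives.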