wardi / jsonlines

Documentation for the JSON Lines text file format
http://jsonlines.org

Link to (new) Wikipedia 'JSON Streaming' article #2

Open timbunce opened 10 years ago

timbunce commented 10 years ago

I started writing an issue suggesting that you link to the Line Delimited JSON article on Wikipedia, and perhaps help to clean it up a little.

The more I looked at it, however, the more I realized that it wasn't a good foundation.

So I ended up writing a new Wikipedia article myself: JSON Streaming.

I think it's much more informative and balanced (naturally). I'd be grateful if you'd review it and, if you're happy with the content, link to it from jsonlines.org. If you spot something that needs changing or adding, please go ahead and edit the article yourself. In fact doing that anyway, even in some small way, will help the article when the Wikipedians get around to reviewing it.

wardi commented 10 years ago

@timbunce under Applications of concatenated JSON I just see a bunch of JSON libraries. What actual applications are using it?

I ask because concatenated JSON seems a little silly to me. If you want pretty-printed JSON and you use a streaming JSON parser, why not just stream a big JSON list (you shouldn't need a new format at all)?

timbunce commented 10 years ago

Applications isn't a good title for that section. Got a better suggestion?

why not just stream a big JSON list

(By 'list' I assume you don't mean wrapping the objects in a JSON array [ ... ].)

Concatenated JSON isn't a new format. It's just giving a name to streaming JSON without any delimiter at all:

$ echo '{"some":"thing\n"}[42]{"may":{"include":"nested","objects":["and","arrays"]}}' | jq .
{
  "some": "thing\n"
}
[
  42
]
{
  "may": {
    "include": "nested",
    "objects": [
      "and",
      "arrays"
    ]
  }
}

Does that clarify it?

wardi commented 10 years ago

Yes, I mean wrapping the objects in a JSON array. Can a streaming JSON parser not give you one element at a time?

timbunce commented 10 years ago

I've changed Applications to Applications and Tools.

The need for the artificial [ at the start is a problem. Imagine a publish-subscribe model such as ZeroMQ: there's no simple way to add the artificial [ on connection. The JSON objects will simply start streaming in.

A good streaming JSON parser ought to be able to handle concatenated JSON, or be tricked into it by resetting the parser state when each top-level object is completed.
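As a rough illustration of "resetting the parser state when each top-level object is completed", here is a minimal sketch using Python's standard json module. The helper name iter_concatenated is hypothetical, and the sketch assumes the whole stream is already in memory as one string; a real subscriber would have to buffer incoming chunks.

```python
import json

def iter_concatenated(text):
    """Yield each top-level JSON value from a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos >= end:
            break
        # Decode one value starting at pos; the returned index is where
        # that value ended, which is where the next one begins.
        value, pos = decoder.raw_decode(text, pos)
        yield value

stream = '{"some":"thing\\n"}[42]{"may":{"include":"nested","objects":["and","arrays"]}}'
values = list(iter_concatenated(stream))
```

Each call to raw_decode starts from a clean state at the given offset, so no delimiter between values is needed.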

wardi commented 10 years ago

So to me it feels like a hack that's specific to certain JSON encoders/parsers. If you can't figure out the framing without parsing the content, it's not really framing.
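To make the framing point concrete, here is a small sketch, assuming line-delimited input where every record sits on its own line. The records can be separated by splitting on the newline character alone, before any JSON parsing happens; the variable names are only illustrative.

```python
import json

# Two records, one per line. The "\\n" inside the first record is an
# escaped newline within a JSON string, not a raw line break.
data = '{"some":"thing\\n"}\n[42]\n'

# Framing: split on raw newlines without looking at the content.
frames = [line for line in data.split("\n") if line]

# Parsing: each frame is handed to the JSON parser independently.
records = [json.loads(frame) for frame in frames]
```

With concatenated JSON there is no such separator, so the boundaries are only known once the parser has consumed each value.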

wardi commented 10 years ago

Also, if we're talking about streaming, why do we care about having something pretty-printed? That can be handled on the receiving end if someone is interested.

If we're talking about a format suitable for editing, it needs to be a complete file anyway, so a big JSON array seems to fit better.

timbunce commented 10 years ago

The stream may already be pretty-printed and out of the reader's control. The Wikipedia page is simply aiming to explain the two main forms of JSON Streaming. Is there anything you'd like to see added or changed?

wardi commented 10 years ago

@timbunce yeah, just some real-world examples of people using the pretty-printed form. To me, pretty-printed concatenated JSON seems like a really hard format to deal with.

timbunce commented 10 years ago

I don't think people would choose to use that form for data processing if they have a choice. I've dealt with cases where I have a pile of files with pretty-printed JSON in each. Being able to just cat *.json | jq ... is great. And cat *.json | jq -c . is sufficient to turn the JSON back into 'jsonlines' form.
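For anyone without jq to hand, the same conversion can be approximated in Python. This is only a sketch: the function name to_json_lines is made up for illustration, it reads the whole input as one string, and its output may differ from jq's in details such as number formatting.

```python
import json

def to_json_lines(text):
    """Re-emit concatenated (possibly pretty-printed) JSON as one value per line."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    end = len(text)
    while pos < end:
        # Skip whitespace between top-level values.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        # Compact separators keep each value on a single line.
        lines.append(json.dumps(value, separators=(",", ":"), ensure_ascii=False))
    return "".join(line + "\n" for line in lines)

pretty = '{\n  "a": 1\n}\n[\n  2\n]\n'
compact = to_json_lines(pretty)
```

Newlines inside string values are written as the escape \n by json.dumps, so every output line holds exactly one complete value.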