wardi / jsonlines

Documentation for the JSON Lines text file format
http://jsonlines.org

Why? (explain it on website) #74

Open gabrielgortabns opened 1 year ago

gabrielgortabns commented 1 year ago

JSON:

[
   "a",
   { "b": "c" },
   [ "d" ]
]

JSONL

   "a"
   { "b": "c" }
   [ "d" ]

What is the difference, apart from the additional [ and ] in JSON?

wardi commented 1 year ago

@GabrielGorta thank you for asking, here are a few reasons:

We could add some more examples like these to the site.

gabrielgortabns commented 1 year ago

Well, this explanation would be fine to mention on the website.

Also, JSON can be formatted so that each line holds exactly one value from the array, so practically all the reasons you mentioned would be possible with plain JSON, with no need for parsing... Of course, that's true only with that specific formatting. E.g.

[
   { "a": "b" }
]

is fine, but in the case of:

[
    {
        "a": "b"
    }
]

is a problem. In JSONL it's not, because it's always one JSON value per line, but this needs to be explained on the website.
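
To make that point concrete, here is a minimal Python sketch using only the standard json module. The sample strings and the helper name parse_lines are made up for illustration. Parsing line by line works for JSONL but fails on the pretty-printed JSON array:

```python
import json

# Sample inputs (illustrative only).
jsonl_text = '"a"\n{ "b": "c" }\n[ "d" ]\n'
pretty_json = '[\n    {\n        "a": "b"\n    }\n]\n'

def parse_lines(text):
    """Parse each non-empty line as an independent JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# JSONL: every line is a complete JSON value on its own.
records = parse_lines(jsonl_text)

# Pretty-printed JSON: the first line is just "[", which is not valid JSON.
try:
    parse_lines(pretty_json)
    pretty_ok = True
except json.JSONDecodeError:
    pretty_ok = False
```

Here records holds the three values from the JSONL text, while pretty_ok ends up False because a lone "[" cannot be parsed by itself.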

GabenGar commented 1 year ago

JSON is not a streamable format; formatting it for human consumption has nothing to do with it.

polarathene commented 1 year ago

> [
>    { "a": "b" }
> ]
>
> is fine

Not fine.

You have a single element in that array; as soon as you add another, you need a comma between them, and the last element must not be followed by one. JSON5 and JSONC tolerate a trailing comma, I think, but the values would still need to be wrapped in an array.

asciinema uses JSONL for .cast recordings for example since it can just append each new frame to a file output.
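
A rough Python sketch of that append pattern, assuming nothing beyond the standard library. The file name, the helper append_record, and the frame values are hypothetical and not taken from asciinema itself:

```python
import json
import os
import tempfile

def append_record(path, record):
    """Append one JSON value as a single line; existing lines are untouched."""
    with open(path, "a", encoding="utf-8") as f:
        # json.dumps emits no raw newlines by default, so one record == one line.
        f.write(json.dumps(record) + "\n")

# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "frames.jsonl")

append_record(path, [0.5, "o", "hello"])
append_record(path, [1.2, "o", "world"])

# The file is valid JSONL after every append; read it back line by line.
with open(path, encoding="utf-8") as f:
    frames = [json.loads(line) for line in f]
```

With a JSON array, each append would instead have to rewrite or patch the closing ] and manage the comma before the new element.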

gabrielgortabns commented 1 year ago

And what is the problem with a trailing comma? You can simply remove it when selecting a specific line (or JSON value), and you still have valid JSON. E.g. in a terminal, when I grep, I just pipe the output to a function that removes the trailing comma (if there is one) from the string, before it even reaches the parser. Removing the last character from a string is not costly at all.

I don't see the difference between using \n or ,?\n as the "separator": just do content.split(',?\n') and you get exactly the same result as with content.split('\n') for JSONL.

polarathene commented 1 year ago

Yes, but the reasons have been cited above already.

You can transform JSON to JSONL. The more practical approach is to use something like jq to parse the JSON array and output JSONL (https://stackoverflow.com/questions/42178636/how-to-use-jq-to-output-jsonl-one-independent-json-object-per-line). That would be more reliable than your split approach.
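
For illustration, the same conversion can be sketched in Python with the standard json module, which is roughly what jq -c '.[]' does. The function name json_array_to_jsonl is made up for this example:

```python
import json

def json_array_to_jsonl(text):
    """Parse a JSON array with a real parser and emit one element per line."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError("expected a top-level JSON array")
    return "".join(json.dumps(value) + "\n" for value in values)

# The pretty-printed array from the top of this thread.
source = '[\n   "a",\n   { "b": "c" },\n   [ "d" ]\n]'
jsonl = json_array_to_jsonl(source)
```

Because the input goes through a real parser, this works no matter how the array was formatted, which a text-level split cannot guarantee.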

The advantage of JSONL, as mentioned, is streaming: one program can output data as it's ready while another ingests it as it arrives, without blocking until the full document has been constructed (if it is ever "completed" at all, as with a file that's appended to frequently, e.g. logs).
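
A small Python sketch of the consuming side, assuming the producer writes one JSON value per line. The name iter_records is hypothetical, and io.StringIO merely stands in for a pipe or a growing log file:

```python
import io
import json

def iter_records(stream):
    """Yield one parsed value per non-empty line, as each line is read."""
    for line in stream:
        if line.strip():
            yield json.loads(line)

# Stand-in for a pipe, socket, or log file that is still being written.
stream = io.StringIO('{"event": "start"}\n{"event": "tick"}\n')

reader = iter_records(stream)
first = next(reader)  # usable before the remaining lines have been parsed
rest = list(reader)   # drain whatever else has arrived
```

In real use the stream argument could be sys.stdin or an open file; the generator never needs the whole input in memory at once.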

GabenGar commented 1 year ago

> I don't see the difference between using \n or ,?\n as the "separator": just do content.split(',?\n') and you get exactly the same result as with content.split('\n') for JSONL.

content.split() assumes the whole content has already been read and loaded into memory. And every time the underlying file changes, you'll have to reparse and split it again. Since JSON requires parsing the entire document to assert its validity, reading it is at least an O(N) operation whose result can't be cached across changes, which makes it unsuitable for storing large collections.
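
One way to see the whole-document requirement is to simulate a file that is still being written. This is only a sketch with made-up data and a hypothetical helper parse_complete_lines, using Python's standard json module:

```python
import json

# Both inputs are cut off partway through the third record.
partial_json = '[{"n": 1}, {"n": 2}, {"n"'
partial_jsonl = '{"n": 1}\n{"n": 2}\n{"n"'

def parse_complete_lines(text):
    """Parse only newline-terminated lines; skip a trailing partial line."""
    complete, _, _partial = text.rpartition("\n")
    return [json.loads(line) for line in complete.split("\n") if line.strip()]

# JSON: nothing is usable until the closing "]" arrives.
try:
    json.loads(partial_json)
    json_usable = True
except json.JSONDecodeError:
    json_usable = False

# JSONL: every finished line is already a valid record.
jsonl_records = parse_complete_lines(partial_jsonl)
```

The truncated JSON array is rejected as a whole, while the first two JSONL records are recovered without waiting for the rest.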