brontide closed this issue 7 years ago
The JSON Lines author, like the NDJSON author, is against comments:
I don't like blank lines (or comment lines as the ndj format has) because it makes referencing objects/lines ambiguous: What does "I want the 3rd record" mean when the 2nd is blank?
I'm on both sides of the fence here: comments are useful, but then they get in the way later. Attempting a constructive answer, would an extension/convention such as
{"_comment_for_file":"This comment is about the whole file"}
{"_comment_for_next":"This is a comment about the following record"}
be a useful substitute? The burden of filtering then lands on the data consumers in related applications, instead of... everyone who consumes .jsonl files.
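For anyone weighing this convention, here is a minimal sketch of what the consumer-side filtering might look like in Python. The key names come from the example above; the function name `iter_data_records` and the rule "a record is a comment only if every key is a comment key" are my own assumptions, not anything the JSON Lines spec defines.

```python
import json
from typing import Any, Iterable, Iterator

# Key names taken from the proposal above; treating them as reserved
# is an assumption of this sketch, not part of any spec.
COMMENT_KEYS = {"_comment_for_file", "_comment_for_next"}


def iter_data_records(lines: Iterable[str]) -> Iterator[Any]:
    """Yield parsed records, dropping comment-only objects."""
    for line in lines:
        record = json.loads(line)
        # A record counts as a comment only if it is a non-empty object
        # whose keys are all comment keys. An empty object {} is kept.
        if isinstance(record, dict) and record and set(record) <= COMMENT_KEYS:
            continue
        yield record
```

Every line is still a valid JSON value, so generic .jsonl tools keep working; only applications that know the convention need the extra check.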
I'm still in favor of comments but I see both sides here. The problem with record-based comments is that you are polluting the records with metadata that is not naturally part of the data stream.
Looking to other human-readable formats like yaml, we can see that "records" do not include comments. The computational complexity of seeking to a specific record should be no different between the various options being discussed, since every character must be evaluated either way.
I'm not a fan of blank lines: they are neither fish nor fowl, and allowing them just feels wrong since it could hide some errors. That said, it's a real edge case, and I wouldn't oppose it if others included it.
CONS
PRO
Honestly, I think this is the key question: is jsonl strictly an m2m streaming format, or is it meant to be both computer and human comprehensible? The answer should moot this back and forth.
Parsing simplicity and unix text line-oriented tool compatibility are the strengths of this format. By limiting newlines to separate records we've already decided that readability is the lower priority.
Blank lines and comment lines just make parsing more complicated in a format that isn't going to be human-friendly to begin with.
@brontide so yes, I see jsonl as an m2m format, or a machine-to-human-familiar-with-unix-tools format :-) not in the realm of human-friendly formats like yaml (which I love)
Heck, I haven't been able to stand hand-writing even normal JSON for ages, let alone JSON without newlines and indentations. I might be strange though...
@wardi That's fair, thanks for the clarification. I will work on changing my workflow to remove comments since they are a "nice to have" and not a necessity.
I would like to see a convention that any line starting with a # be silently skipped. We use it internally to indicate a successful conversion of files.
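A rough sketch of a reader that follows this convention, in Python. To be clear, this is not part of the JSON Lines spec, and the function name `iter_records_skipping_hash` is made up for illustration. It only treats a line as a comment when # is the very first character, which is unambiguous because no valid JSON value begins with #.

```python
import json
from typing import Any, Iterable, Iterator


def iter_records_skipping_hash(lines: Iterable[str]) -> Iterator[Any]:
    """Yield parsed records, silently skipping lines that start with '#'."""
    for line in lines:
        # Only a leading '#' marks a comment line; a '#' inside a JSON
        # string is ordinary data and is left alone.
        if line.startswith("#"):
            continue
        yield json.loads(line)
```

Note that a strict .jsonl parser that does not know about the convention will fail on the # lines, which is the trade-off discussed earlier in the thread.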