wardi / jsonlines

Documentation for the JSON Lines text file format
http://jsonlines.org

Comment convention #23

Closed brontide closed 7 years ago

brontide commented 7 years ago

I would like to see a convention that any line starting with a # be silently skipped. We use it internally to indicate a successful conversion of files.

# This is a comment about my records
{"Foo": 1 }
{"bar": "is life", "URI": "data:;base64;jksjd" }
# This is the end of my file

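On the reading side, the proposed convention could look roughly like the sketch below. It is only an illustration, assuming Python: the function name `iter_records_skipping_comments` is hypothetical, and skipping `#` lines is the convention being proposed here, not something the JSON Lines format defines. The `URI` key is quoted in the sample so that the line parses as JSON.

```python
import json
from typing import Any, Iterable, Iterator


def iter_records_skipping_comments(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed record per line, silently skipping '#' lines.

    Hypothetical helper: this implements the proposed convention,
    which is NOT part of the JSON Lines format.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # proposed convention: treat the line as a comment
        yield json.loads(line)


sample = [
    "# This is a comment about my records\n",
    '{"Foo": 1 }\n',
    '{"bar": "is life", "URI": "data:;base64;jksjd" }\n',
    "# This is the end of my file\n",
]
records = list(iter_records_skipping_comments(sample))
```

A reader that does not know about the convention would instead fail on the first line, since `#` cannot start a JSON value.
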
hugovk commented 7 years ago

The JSON Lines author, like the NDJSON author, is against comments:

I don't like blank lines (or comment lines as the ndj format has) because it makes referencing objects/lines ambiguous: What does "I want the 3rd record" mean when the 2nd is blank?

https://github.com/ndjson/ndjson-spec/issues/8
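
The ambiguity in that quote can be made concrete with a small sketch. This is an illustration only, assuming Python; the helper names `nth_line` and `nth_record` and the sample data are made up for the example.

```python
import json

strict = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'
with_blank = '{"id": 1}\n\n{"id": 2}\n{"id": 3}\n'


def nth_line(text: str, n: int) -> str:
    """Return the n-th physical line (1-based)."""
    return text.split("\n")[n - 1]


def nth_record(text: str, n: int) -> dict:
    """Return the n-th record (1-based), ignoring blank lines."""
    non_blank = [line for line in text.split("\n") if line.strip()]
    return json.loads(non_blank[n - 1])


# Without blank lines, "line 3" and "record 3" are the same thing.
same = json.loads(nth_line(strict, 3)) == nth_record(strict, 3)

# With a blank second line, they point at different objects.
third_line = json.loads(nth_line(with_blank, 3))
third_record = nth_record(with_blank, 3)
```

In the second file, line 3 holds the object with `id` 2 while the third record is the object with `id` 3, so "the 3rd record" depends on whether the reader counts skipped lines.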

mcast commented 7 years ago

I'm on both sides of the fence here: comments are useful, but then they get in the way later. Attempting a constructive answer, would an extension/convention such as

{"_comment_for_file":"This comment is about the whole file"}
{"_comment_for_next":"This is a comment about the following record"}

be a useful substitute? The burden of filtering then lands on the data consumers in related applications, instead of... everyone who consumes .jsonl files.
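
A consumer that opts in to this convention might filter the comment records out along the lines of the sketch below. It assumes Python; the two key names come from the example above, while `COMMENT_KEYS` and `iter_data_records` are hypothetical names chosen for illustration.

```python
import json
from typing import Any, Iterable, Iterator

# Keys from the suggested convention; not part of the JSON Lines format.
COMMENT_KEYS = {"_comment_for_file", "_comment_for_next"}


def iter_data_records(lines: Iterable[str]) -> Iterator[Any]:
    """Yield parsed records, dropping objects that carry a comment key."""
    for line in lines:
        record = json.loads(line)
        if isinstance(record, dict) and not COMMENT_KEYS.isdisjoint(record):
            continue  # comment record: only this consumer knows to skip it
        yield record


sample = [
    '{"_comment_for_file":"This comment is about the whole file"}\n',
    '{"_comment_for_next":"This is a comment about the following record"}\n',
    '{"Foo": 1}\n',
]
data = list(iter_data_records(sample))
```

Every line is still valid JSON, so a generic reader keeps working and simply sees the comment objects as ordinary records.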

brontide commented 7 years ago

I'm still in favor of comments, but I see both sides here. The problem with record-based comments is that you are polluting the records with metadata that is not naturally part of the data stream.

Looking at other human-readable formats like yaml, we can see that "records" do not include comments. The computational complexity of seeking to a specific record should also be no different between the options being discussed, since every character must be evaluated either way.

I'm not a fan of blank lines, since they are neither fish nor fowl; allowing them just feels wrong because it could hide some errors. That said, it's a real edge case, and I wouldn't oppose it if others included it.

CONS

PRO

Honestly, I think this is the key thing: is jsonl strictly a m2m streaming format, or is it meant to be both computer and human comprehensible? The answer should moot this back and forth.

wardi commented 7 years ago

Parsing simplicity and compatibility with line-oriented unix text tools are the strengths of this format. By limiting newlines to separate records we've already decided that readability is the lower priority.

Blank lines and comment lines just make parsing more complicated in a format that isn't going to be human-friendly to begin with.
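
To illustrate how little a strict reader needs, here is a minimal sketch, assuming Python; `parse_jsonl` is a hypothetical name, not an API from any library. Every line is one record, so there are no skip rules, and a blank or comment line surfaces as a parse error instead of being hidden.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse strict JSON Lines: exactly one JSON value per line."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the newline after the last record leaves an empty tail
    return [json.loads(line) for line in lines]


records = parse_jsonl('{"a": 1}\n{"b": 2}\n')

# A blank line is not silently accepted; json.loads rejects it.
try:
    parse_jsonl('{"a": 1}\n\n{"b": 2}\n')
    blank_line_rejected = False
except json.JSONDecodeError:
    blank_line_rejected = True
```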

wardi commented 7 years ago

@brontide so yes, I see jsonl as a m2m format, or a machine-to-human-familiar-with-unix-tools format :-) not in the realm of human-friendly formats like yaml (which I love)

Heck, I haven't been able to stand hand-writing even normal JSON for ages, let alone JSON without newlines and indentations. I might be strange though...

brontide commented 7 years ago

@wardi That's fair, thanks for the clarification. I will work on changing my workflow to remove comments since they are a "nice to have" and not a necessity.