Exposition format: Specify whitespace handling in text format

beorn7 commented 8 years ago

So far, the text format allows whitespace liberally. This was done to make generation of metrics in text format easy. The assumption was that there are way more generator implementations than parser implementations.

However, some implementers of parsers requested to have stricter rules for whitespace to make parser implementation easier.

We need to come up with a decision before carving the format into stone.

beorn7 commented 8 years ago

Just FTR: My personal opinion is still that we should make generation as easy as possible and keep being liberal with whitespace. I don't even see how parser implementation gets so much more complicated with liberal whitespace.

RichiH commented 8 years ago

I agree with @beorn7. I suggest exporters/libraries SHOULD NOT print more than one whitespace behind each other, while consumers MUST accept multiples and MAY print a warning/info message.

brian-brazil commented 8 years ago

Allowing whitespace e.g. between the metric name and the { requires more FSM states and transitions, and in no way makes it easier to generate the text format. Similarly after the {, before the =, after the = etc.

brian-brazil commented 7 years ago

By contrast, allowing a trailing comma in the labels and empty {} both make writing generators easier as both avoid a special case.

RichiH commented 7 years ago

This is actually a good point. The failure to allow a trailing comma is my main gripe with JSON.

brian-brazil commented 7 years ago

We already expanded the format to allow the trailing comma.

beorn7 commented 7 years ago

One of the main reasons to introduce the text format was to make it easy to generate it on the fly with a shell script or other not very powerful tools to generate a text string. By being tolerant with whitespace, the user, who might not be very familiar with the format, doesn't have to waste brain cycles to know where whitespace is required or disallowed. Also, the tool might insist on adding whitespace between components (e.g. fmt.Println() in Go).

brian-brazil commented 7 years ago

I've not seen any client library or other custom exposition format put extra whitespace in the places we allow it, so empirically it doesn't help generation. On the other hand this doubles the complexity of a parser.

I've yet to come across a language that doesn't offer a way to control whitespace. Almost everything modern has a form of sprintf, and the rest have a way to do it.

If this is to be an IEFT standard there will be far more than the existing 2 parser implementations, and the chances of implementors not implementing this is high as noone actually uses this whitespace. It's better to have a smaller simpler standard.

beorn7 commented 7 years ago

I've not seen any client library or other custom exposition format put extra whitespace in the places we allow it, so empirically it doesn't help generation.

I'm not talking about client libraries. Obviously, it's easy to teach libraries to follow restricted rules. I'm talking about ad-hoc generation of lines of text format.

On the other hand this doubles the complexity of a parser.

That's a bold claim. The Go implementation treats whitespace correctly. I have my doubts it would become half as complex if we removed that feature from it.

But even if it's true, I estimate about ten parser implementations to exist worldwide, but there will be thousands of occasions where users create text-format lines ad hoc. So I still believe it's worth it.

It's better to have a smaller simpler standard.

Meticulously specifying where which whitespace is allowed, required, or disallowed makes the standard more complicated. Also, most languages allow arbitrary whitespace between components. We don't have to go back to FORTRAN77 for the sake of simplifying parser implementation. It will make it more difficult to understand the standard.

brian-brazil commented 7 years ago

I'm talking about ad-hoc generation of lines of text format.

I included that in my statement. I've seen none of them that had extra whitespace.

That's a bold claim.

It roughly doubles the number of parser states. It'd halve the complexity of the Python parser not to have to support extra whitespace.

Meticulously specifying where which whitespace is allowed, required, or disallowed makes the standard more complicated.

I disagree, we need to specify that anyway and anywhere something is optional makes it more complicated and less likely to be followed.

Saying that the format is just

name{label_name="value"} 0
name{label_name="value",} 0
name{label_name="value",name2="value2"} 0
name{,} 0
name{} 0
name 0
name 0 0

and not

name {label_name="value",name2="value2"} 0
name{ label_name="value",name2="value2"} 0
name{label_name ="value",name2="value2"} 0
name{label_name= "value",name2="value2"} 0
name{label_name="value" ,name2="value2"} 0
name{label_name="value", name2="value2"} 0
name{label_name="value",name2="value2" } 0
etc.

makes it simpler.

beorn7 commented 7 years ago

I've seen none of them that had extra whitespace.

I have seen many. Especially blanks after the comma between label pairs. In fact, every machine at SoundCloud has at least one file for the text file exporter with that kind of whitespace usage.

It'd halve the complexity of the Python parser not to have to support extra whitespace.

As said, even if that's the case, I'd vote for allowing whitespace for the reasons already stated.

About the specification: Specifications don't happen by example. Allowing whitespace essentially adds one sentence to the spec. In practice, it won't make it longer because, for the non-whiteespace spec, you probably want to explicitly state that no spurious whitespace is allowed (as it is so common to allow it).

brian-brazil commented 7 years ago

Based on other RFCs I'd expect the spec to be via EBNF, with examples (as EBNF while precise is hard for humans to interpret).

beorn7 commented 7 years ago

Specifying optional whitespace in EBNF is actually quite verbose. So for shortness of EBNF, I'd even buy the argument that optional whitespace bloats the spec.

However, as you say, EBNF is not what humans will see as relevant for the perceived complexity of the spec. Humans will perceive it as complex if they have to unlearn that whitespace between tokens is fine.

brian-brazil commented 7 years ago

I'm not following your argument here. The EBNF form is a "pure" mathematical form of the spec, so if that's bloated then the spec is bloated as that's what implementors actually have to implement to be correct.

I'd be okay with a space after the commas.

Humans will perceive it as complex if they have to unlearn that whitespace between tokens is fine.

I don't think we should presume anything of people reading the spec.

beorn7 commented 7 years ago

The EBNF form is a "pure" mathematical form of the spec, so if that's bloated then the spec is bloated as that's what implementors actually have to implement to be correct.

The fact that it is "pure" doesn't mean it's the shortest possible description.

But I anyway don't think it is relevant to come up with a spec that is describable in the theoretically shortest form.

I think it is relevant to come up with a spec that is easy to understand and memorize for humans.

I'd be okay with a space after the commas.

In which case we would introduce a special case, which would indeed "bloat" the spec in both senses, the mathematical "pure" one and the one concerning the cognitive load for humans.

I don't think we should presume anything of people reading the spec.

Then we will have a hard time formulating a spec that is readable by humans.

But we are losing ourselves in fairly irrelevant details. A slightly more or less complex description of the format is hardly of any relevance for the actual decision we have to come up with here.

grobie commented 7 years ago

In general I believe we should prioritize ease of creating a text format representation over the ease of writing a parser. A parser is written once and while it might be annoying having to consume extra whitespace, I don't think it's a big deal. Ensuring the correctness of a parser is easy by providing a reference file implementors can check against.

I have seen people putting whitespaces anywhere inside of curly braces. I expect peole to do the same in ad-hoc generated outputs.

Though, the only thing I actually feel strongly about is allowing a a single optional whitespace after a comma.

juliusv commented 7 years ago

Leaning towards allowing whitespace. Only not a big fan of allowing it between the metric name and {...}, but then again, PromQL also allows that...

prometheus / docs

Exposition format: Specify whitespace handling in text format #543