google / go-jsonnet

Apache License 2.0
1.63k stars 235 forks source link

inconsistent/invalid parsing behavior on block strings #680

Closed nicklan closed 1 year ago

nicklan commented 1 year ago

the parser seems to be overly lenient about what W's size can be for block strings (reference: https://jsonnet.org/ref/spec.html)

both of these files parse successfully:

File 1:

|||

   test
  |||,
   ||| x
|||

File 2:

|||

   test
   |||,
   ||| x
|||

but at most File 2 should.

It's a bit hard to see what's going on, but in each case we start with ||| followed by a newline and then a line with two spaces, and then a line with 3 spaces and test.

About starting a text block, the spec says: "The next non-blank line must be prefixed with some non-zero length whitespace W." This has two possible interpretations: a. W is 3 if "non-blank" means "doesn't contain only whitespace (this is how I would read it) b. W is 2 if "non-blank" means "isn't only a newline"

Either way, File 1 should never parse correctly.

If a. is the correct interpretation then only File 2 should parse since in that case line 4 still has a 3 space indent and doesn't end the block.

If b. is correct, then neither should parse. File 1 should end the block on line 5 where it starts with 3 spaces and then |||, and File 2 should end the block on line 4 where it starts with 3 spaces.

sparkprime commented 1 year ago

"blank line" was intended to mean "line with no characters at all" so the spec is ambiguous there.

sparkprime commented 1 year ago

So the reason these are working is that the indentation level is chosen as w = 2 (actually since we have to handle a mix of tabs and spaces it's expressed as w = " ") and then that's stripped off every line until you see a ||| without w on the front.

sparkprime commented 1 year ago

I'll write your examples again but with . for the space character and not . for arbitrary text:

|||
..
.. test
..|||,
.. ||| x
|||
|||
..
.. test
.. |||,
.. ||| x
|||
sparkprime commented 1 year ago

What's bad about this is the number of spaces on the first line (even though the line only has spaces on it) will change the amount of indentation that is stripped off of the remaining lines. But this is what meaningful whitespace is all about, I'm not sure anything can be done about that.

nicklan commented 1 year ago

Thanks for the reply.

(edit: previous text here was incorrect, removing)

nicklan commented 1 year ago

Ohh, I think actually I understand now. The spec is really difficult to read here. But basically because those lines begin with .., they don't end the block.

The block only ends if it has LESS or DIFFERENT whitespace at the start. Is that correct?

sparkprime commented 1 year ago

Yeah that's right. I will update the spec and it's probably worth having some examples in the language reference too.

nicklan commented 1 year ago

Got it, thanks. Feel free to close this then, or maybe only after you update the spec ;)

nicklan commented 1 year ago

One more thing to note, the ending behavior should probably also specify "ends at the first NON-BLANK subsequent line that does not begin with W", since adding "blank" lines in the middle of a block seems to work fine. (or that's a bug if not intended)

sparkprime commented 1 year ago

True, and you can also have blank lines at the end I think (just before the final |||)

sbarzowski commented 1 year ago

I suppose we could clarify the spec here.

sparkprime commented 1 year ago

https://github.com/google/jsonnet/pull/1061