kdl-org / kdl

the kdl document language specifications
https://kdl.dev

Merge KDL v2 #286

Open zkat opened 2 years ago

zkat commented 2 years ago

Here it is! The long-awaited KDL v2, which is where we go ahead and make a handful of technically-breaking changes to address some corner cases we've run into over the past year while KDL has been getting implemented in a bunch of languages by various people.

I'd love to get feedback on what we have slated, and whether there's anything else we should definitely include when this goes out.

zkat commented 7 months ago

I just tagged another draft of the spec.

I think the biggest pending item for the new spec is addressing the way we do multiline string dedenting. I think we've all agreed that the way that's currently described is not what we want. So the question is: do we want to do what JS does ("common indentation"), or dedent by the indentation of the closing "?

Any strong opinions? I'm personally leaning towards the latter, since it would allow multiline strings which are themselves indented, rather than always stripping indentation even when you might want it (which JS would do).
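To make the two options concrete, here is a minimal Python sketch. It is not taken from the spec or from any KDL implementation: the function names are made up for illustration, `raw` stands for the text between the quotes, only `\n` is treated as a newline, and the closing line is assumed to contain only whitespace.

```python
def split_body(raw: str):
    """Split the text between the quotes into content lines and the
    whitespace that precedes the closing quote (hypothetical helper)."""
    lines = raw.split("\n")
    # lines[0] is whatever follows the opening quote (expected to be empty),
    # lines[-1] is the indentation in front of the closing quote.
    return lines[1:-1], lines[-1]


def dedent_by_closing_quote(raw: str) -> str:
    """Strip exactly the closing line's indentation from every content line."""
    body, prefix = split_body(raw)
    out = []
    for line in body:
        if line.strip() == "":
            out.append("")  # whitespace-only lines become empty lines
        elif line.startswith(prefix):
            out.append(line[len(prefix):])
        else:
            raise ValueError("line is indented less than the closing quote")
    return "\n".join(out)


def dedent_common(raw: str) -> str:
    """Strip the smallest indentation shared by all non-blank content lines."""
    body, _ = split_body(raw)
    indents = [len(l) - len(l.lstrip(" \t")) for l in body if l.strip()]
    n = min(indents, default=0)
    return "\n".join(l[n:] if l.strip() else "" for l in body)


raw = "\n        foo\n        bar\n    "
print(repr(dedent_by_closing_quote(raw)))  # '    foo\n    bar'
print(repr(dedent_common(raw)))            # 'foo\nbar'
```

With the closing quote indented four spaces and the content indented eight, the closing-quote rule keeps four spaces of indentation in the value, while common-indentation stripping removes all of it.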

tabatkins commented 7 months ago

Well, in JS you can keep indentation at a particular level by explicitly escaping the first whitespace you want to preserve, on one of your lines. That is,

var oneTabInIt = String.dedent`
    \tfoo
        bar
        baz
    `;

But that does still look awkward, I'll admit. I think "dedent according to the final line" is reasonable.

So, just to be clear:

Right?

IceDragon200 commented 7 months ago

I'm in favour of the final quote designating the multiline indentation: it's a straightforward rule to understand and removes most of the guesswork (most modern editors highlight indentation columns, for example, making it easy to quickly scan and tell how deep an indentation will actually be when parsed).

tabatkins commented 7 months ago

[Image: Screenshot from 2024-02-06 15-30-39]

zkat commented 7 months ago

@tabatkins that sounds good to me, yes. I like it being a clear syntax error.

zkat commented 7 months ago

Alright, draft 3 is out.

I think this is a good time to start recommending that existing implementations start working on support for 2.0. Any thoughts? I don't think there's anything missing at this point, except maybe we can add #371 later once it's hashed out (but that's just for the schema spec).

Thanks for all the work so far, everyone! We're almost at the finish line! This has been in the works for like... 2 years? 1.5 years?

IceDragon200 commented 7 months ago

Kuddle has been updated; I was just waiting for the spec to finalize before merging it into main/master. It's also essentially what I've been using to test the spec.

Do we have an issue tracking v2 implementations yet? It would give everyone a good idea of what's available for testing, and maybe the community can find bugs we didn't notice, and so forth.

zkat commented 7 months ago

@IceDragon200 https://github.com/kdl-org/kdl/issues/372 this is the epic for tracking implementation support

zkat commented 7 months ago

I was trying to think of what sort of changes this would entail as far as consumer impact, and I was pleasantly surprised that all of our changes seem to be syntactic: the data model didn't change at all, so anyone who uses an existing KDL library should be able to use the exact same API calls in the same way, except the parsers would be different.

zkat commented 7 months ago

Alright, I've flipped this PR to "ready to review". Time to see what kinds of comments we get from the larger community :)

zkat commented 7 months ago

Alright, draft 4 is out, with #inf/#-inf/#nan support and a grammar fix!

danini-the-panini commented 7 months ago

Should the multiline_string_indented test case have an extra newline in the input and the expected?

danini-the-panini commented 7 months ago

Same for multiline_raw_string_indented, the expected file has an extra newline

danini-the-panini commented 7 months ago

Should ESCLINE be allowed around = ? e.g.

node foo=\
  "bar"
node foo\
  ="bar"
node foo=\ // lorem ipsum
  "bar"

etc.

Perhaps we should add some examples?

eilvelia commented 6 months ago

How exactly do multi-line strings interact with whitespace escapes (\)? Perhaps that should be clarified in the SPEC?

That is, are these allowed?

  "
  foo \
 bar
  baz
  "

  "
  foo
  bar\
  "

zkat commented 5 months ago

@Bannerets I've made some changes to clarify this. wdyt?

zkat commented 5 months ago

🤔 maybe it should be the other way around: it might be easier to implement to have escapes process first, and then do multiline stuff. It might also make more sense, too.

But if you're using multiline escapes in multiline strings, you're asking for trouble anyway.

eilvelia commented 5 months ago

I think interpreting escapes after dedenting seems intuitive, in the sense that the string is parsed exactly as if the indentation were just not in the source code. Quickly checking multi-line strings in Elixir, it looks like the string is dedented before escapes are processed, and, e.g., \n doesn't need any extra indentation space after it to work (Elixir also appends a final newline). If escapes are interpreted first, it may also look weird that \s can be used in place of indentation (and mixed with spaces, making the literal string not look evenly indented). Functions like Python's textwrap.dedent can remove indentation only after the escaped string has been parsed, and that feels more like a deficiency. As far as the JavaScript proposal goes, IIRC it takes the String.raw... form of the string, so escapes should not be interpreted. In the end, I support escapes after dedenting.
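The difference between the two orderings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spec's algorithm: the helper names are hypothetical, only a small subset of escapes (`\n`, `\t`, `\s`, `\\`, `\"`) is handled, whitespace escapes are left out entirely, and indentation is taken from the closing line.

```python
import re

# Simplified escape table (assumption: a subset of what the spec defines).
ESCAPES = {"n": "\n", "t": "\t", "s": " ", "\\": "\\", '"': '"'}


def resolve_escapes(text: str) -> str:
    """Replace each backslash escape with the character it stands for."""
    return re.sub(r"\\(.)", lambda m: ESCAPES[m.group(1)], text)


def dedent(raw: str) -> str:
    """Strip the closing line's indentation from every content line."""
    lines = raw.split("\n")
    body, prefix = lines[1:-1], lines[-1]
    out = []
    for line in body:
        if not line.strip():
            out.append("")
        elif line.startswith(prefix):
            out.append(line[len(prefix):])
        else:
            raise ValueError("line is not indented as far as the closing quote")
    return "\n".join(out)


def dedent_then_escape(raw: str) -> str:
    return resolve_escapes(dedent(raw))


def escape_then_dedent(raw: str) -> str:
    return dedent(resolve_escapes(raw))


# A line containing the two characters backslash + n:
with_newline_escape = "\n  foo\\nbar\n  "
print(repr(dedent_then_escape(with_newline_escape)))  # 'foo\nbar'

# A line whose indentation is one space plus the escape \s:
with_space_escape = "\n  foo\n \\sbar\n  "
print(repr(escape_then_dedent(with_space_escape)))  # 'foo\nbar'
```

In this sketch, `escape_then_dedent(with_newline_escape)` raises `ValueError`, because the newline produced by `\n` starts a line with no indentation, and `dedent_then_escape(with_space_escape)` raises `ValueError`, because `\s` does not count as indentation until it has been resolved.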

The kdlua implementation expands escapes before dedenting, I think.

zkat commented 5 months ago

ahhh yes. I see your point. I'll change it back to be dedent-before-escape.

zkat commented 5 months ago

There, that's done :)

tjol commented 3 months ago

Implementing the multi-line string and whitespace escaping rules is proving quite subtle.

When processing a Multi-line String, implementations MUST resolve all whitespace escapes after dedenting the string.

This sounds simple enough: if there are newlines in the string, I check that the indentation is consistent and remove it. Then, I handle the various backslash escapes.

That should take care of illegal strings like

  "
  foo \
bar
  baz
  "

(from the spec), and legal strings like

"
    Hello
    \
         World
    "

(which is equal to "Hello\nWorld", if I'm understanding this correctly)

This algorithm does not work for this example in the spec:

    "Hello\n\
    World"

Before considering the \ escaping the newline, this looks very much like a syntax error: there is a newline in the string, but there is no initial or final newline. I believe the formal grammar also prohibits this string.

The spec prose appears to have a solution to this conundrum: (emphasis mine)

When a Quoted or Raw String spans multiple lines with literal, non-escaped Newlines, it follows a special multi-line syntax ...

So if all newlines in the string are escaped, it is not a multi-line string? To my mind that should imply that escaped newlines are not newlines for the purposes of dedenting, and contradicts the rule that dedenting comes before backslash escapes. Overall, not very satisfying.

I would suggest:

Edit: I've added a PR - #391

tjol commented 3 months ago

I wonder if the more intuitive way for strings to work would be:

  1. remove escaped whitespace
  2. dedent multi-line string
  3. resolve other backslash escapes

This should have the same result as the #391 rule for all strings valid under that rule, but also accept more cases with escaped newlines.
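The three steps above can be sketched as follows. This is a rough Python illustration, not code from #391 or from any KDL implementation: all function names are hypothetical, only `\n` is treated as a literal newline, and the escape table is a small assumed subset.

```python
import re

# Simplified escape table (assumption: a subset of what the spec defines).
ESCAPES = {"n": "\n", "t": "\t", "s": " ", "\\": "\\", '"': '"'}


def remove_escaped_whitespace(text: str) -> str:
    """Step 1: drop every backslash that is followed by whitespace, together
    with that whole run of whitespace (newlines included)."""
    def repl(m):
        # Escapes are consumed pairwise, so an escaped backslash followed by
        # a space is left alone.
        return "" if m.group(0)[1].isspace() else m.group(0)
    return re.sub(r"\\(?:\s+|.)", repl, text)


def dedent(text: str) -> str:
    """Step 2: strip the closing line's indentation from every content line."""
    lines = text.split("\n")
    first, body, prefix = lines[0], lines[1:-1], lines[-1]
    if first != "" or prefix.strip() != "":
        raise ValueError("multi-line string must start and end with a newline")
    out = []
    for line in body:
        if not line.strip():
            out.append("")
        elif line.startswith(prefix):
            out.append(line[len(prefix):])
        else:
            raise ValueError("line is not indented as far as the closing quote")
    return "\n".join(out)


def resolve_escapes(text: str) -> str:
    """Step 3: resolve the remaining backslash escapes."""
    return re.sub(r"\\(.)", lambda m: ESCAPES[m.group(1)], text)


def parse_string_body(raw: str) -> str:
    """Apply the three steps to the text found between the quotes."""
    text = remove_escaped_whitespace(raw)
    if "\n" in text:  # literal newlines remain: treat as multi-line
        text = dedent(text)
    return resolve_escapes(text)


# "Hello\n\<NL>    World" becomes a single-line string after step 1.
print(repr(parse_string_body("Hello\\n\\\n    World")))  # 'Hello\nWorld'
```

In this sketch the `Hello` / `\` / `World` multi-line example from earlier in the comment also yields `"Hello\nWorld"`. The `foo \` / `bar` example is accepted rather than rejected, since its escaped newline is gone before the indentation check runs, which is one of the extra cases this ordering lets through.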

zkat commented 3 months ago

@tjol sorry for the delay in responding:

I'm confused, what you're describing is definitely the intended behavior. You should still be able to write "multiline" strings by using whitespace escapes, they're just not going to be beholden to the multiline string rules, and can be easily detected by looking for the character sequence \<NL>.

But maybe allowing that is indeed too confusing and too painful to implement? So your suggestion means that whitespace escapes essentially no longer work unless you're in multiline string mode?

tjol commented 3 months ago

@zkat I think the way the spec is currently written – certainly the way I understood the 'letter of the law' while implementing – whitespace escapes basically can't escape newlines in single-line strings, yeah. But I agree that's probably not the way it should work. (Either way should be easy enough to implement)

I'll have another look at the wording and how to maybe clarify it (probably in a few days' time)

zkat commented 3 months ago

@tjol yeah it could definitely be improved. Thanks for being willing to take a look!!

tjol commented 3 months ago

Ok - alternative suggestion PR: #392