Open zkat opened 9 months ago
please consider pinging tree-sitter-kdl. thanks
What's the file marker that differentiates v1 from v2? File extension .kdl2? Inner top-line string kdl v=2? Something else?
There's no file marker, but the data model is exactly the same--it should be very easy to heuristically detect whether you're looking at 1.0 or 2.0 and react accordingly if you're working on a dual parser, and the resulting data will be the same in both cases.
What is the heuristic in a non-code syntax file to tell apart a syntax error of #invalid_node_in_a_v2_file vs a #valid_node_in_a_v1_file? I've only seen extensions and first line matches
I'd say that's an oversight of the v2 version. Why not have something like /-v 2 or /-kdl v=2?
you detect which one parses and go from there, basically :)
So, if you run into #invalid_node_in_a_v2_file, you set your parser to 1.0 mode, and if anything else in the file disagrees, it's a syntax error. Ditto the other way around.
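For implementations that can afford to parse a document twice, the "detect which one parses" approach might look roughly like the Python sketch below. The names parse_v1, parse_v2, and KdlSyntaxError are hypothetical stand-ins and not the API of any real KDL library; only the control flow matters here.

```python
from typing import Any, Callable, Optional, Tuple


class KdlSyntaxError(Exception):
    """Hypothetical error type that the stand-in parsers are assumed to raise."""


Parser = Callable[[str], Any]


def detect_version(
    text: str, parse_v1: Parser, parse_v2: Parser
) -> Tuple[Optional[int], Any]:
    """Try both parsers and report which KDL version accepted the text.

    Returns (version, document). The version is 1 or 2 when exactly one
    parser accepts the text, and None when both do.
    """
    accepted = {}
    for version, parse in ((2, parse_v2), (1, parse_v1)):
        try:
            accepted[version] = parse(text)
        except KdlSyntaxError:
            continue

    if not accepted:
        raise KdlSyntaxError("document is not valid KDL 1.0 or KDL 2.0")
    if len(accepted) == 2:
        # Ambiguous: both versions accept the text. Returning the v2 reading
        # is an arbitrary choice made for this sketch.
        return None, accepted[2]

    version = next(iter(accepted))
    return version, accepted[version]
```

A caller would pass in whatever v1 and v2 parse functions it has available and treat a None version as "could be either".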
Personally, I'm just going to support 2.0 for parsers I maintain going forward instead of trying to support both versions. Folks can use older versions of the parser to use 1.0 until they're ready to migrate.
When discussing it, we came to the conclusion that the burden of always having to tag your 2.0 files wasn't worth it when KDL is still relatively low usage and we expect 2.0 to quickly become the dominant format going forward.
I can't really use that trick since this isn't a parser in a programming language where you have that kind of flexibility
also you can have both v1 and v2 errors, so this rule won't help you determine the version
the burden of always having to tag your 2.0
That's fine, put the burden on the v1 users! Or use a file extension. Also, after KDL becomes (world)dominant there might be v3! This future-proofs it.
So you'd have a rule like: the file is assumed to be of the latest version unless specifically tagged or
That's fine, put the burden on the v1 users!
We can't. v1 users are already not tagging their content as v1. And adding something now for users to produce v1 content going forward isn't worth anyone's time; they should be writing v2, as that's what most parsers will be expected to (likely exclusively) accept going forward.
Also, after KDL becomes (world)dominant there might be v3!
And if there are, we can figure out compat at that point.
Note that "make v1 documents tag themselves, v2 can be assumed" is not compatible with a future v3 either; it'll mean that v3 also has required tagging (since a missing tag would indicate v2). So we don't need to worry about it right now; if a v3 ever appears, it'll need to be tagged anyway, so we can introduce the tag then.
the file is assumed to be of the latest version unless specifically tagged
This absolutely does not work, fwiw, unless the language is carefully designed to be back/forward-compatible, and losing data is acceptable. (CSS, for example, has this model.) A data format can't really do this.
We can't. v1 users are already not tagging their content as v1.
And as long as they don't encounter any v2 requirements, they can continue to do so.
And adding something now for users to produce v1 content going forward isn't worth anyone's time; they should be writing v2
And what about all v1s that have already been written? I don't get it, why should there be no mechanism to differentiate them?
And if there are, we can figure out compat at that point.
then it'd be much more painful due to higher scale
This absolutely does not work
Well, you could add "or the parser has heuristically determined the version". My point is about a SPEC'ed mechanism for an explicit (optional) signal so that
you detect which one parses and go from there, basically :)
This is not possible. There are files valid both in v1 and v2 that produce different data.
Example from my app:
enum "Item" editor="enum" port="hollow" {
    generic "some" "Item"
    const "none" null
}
If I understand the v2 change correctly, this will change meaning with v2, resulting in the const item having the second argument be a string instead of null. My app (a node-based configuration generator) will parse it just fine without any errors, but it will then propagate into other parts of my app workflow and result in the app output having "null" strings all over the place instead of the expected null. I am lucky since my app is still in development and I'm the only user, but apps with any measurable user base are going to encounter silent troubles like this.
No, v2 makes null an illegal string, so this will actually be a syntax error, telling you to use #null instead. It would otherwise be a perfectly valid v2 document.
The same goes for the other keywords: #true and #false are both prefixed now as well.
Likewise, there's the very rare "gotcha" where you might have a node name called #null; that would be a syntax error in v2.
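To illustrate the keyword change for bare tokens in value position, here is a small Python sketch. The function name and the returned labels are made up for illustration. It also assumes, based on the v1 grammar rather than anything stated in this thread, that v1 rejects a bare #null as a value.

```python
# Bare keywords as written in each version (value position only).
V1_KEYWORDS = {"null", "true", "false"}
V2_KEYWORDS = {"#null", "#true", "#false"}


def classify_value(token: str) -> str:
    """Say which KDL version(s) a bare value token points to.

    Returns "v1-only", "v2-only", or "either". Only the three keywords
    discussed above are handled; everything else is reported as "either".
    """
    if token in V1_KEYWORDS:
        # v2 treats these as illegal bare strings, so they are a syntax error there.
        return "v1-only"
    if token in V2_KEYWORDS:
        # Assumption: v1 has no bare-identifier values, so it rejects these.
        return "v2-only"
    return "either"
```

With this, the const "none" null line from the example would be flagged as v1-only instead of silently turning into a string.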
Scanning again through the changelog, the vast majority of changes make v2 more strict or use incompatible-with-v1 syntax, so it would be an error in either one.
The only change I can think of that could "silently" change the data without being a syntax error in v1 or v2 is the new multiline string indentation stripping, where the following string means two different things whether you're in v1 or v2:
node "
 indented string
 "
in v1, this yields:
node "\n indented string\n "
and in v2:
node "indented string"
There's no way to distinguish which of these is v1 and v2.
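The two readings can be sketched in Python as follows. This is a simplified take on the indentation stripping described above, not the spec's exact algorithm: it assumes the raw text between the quotes starts with a newline, that the last line is whitespace-only and defines the prefix to strip, and it ignores escapes and non-LF newlines. The function names are hypothetical.

```python
def v1_string_value(raw: str) -> str:
    """v1 reading: the text between the quotes is kept verbatim."""
    return raw


def v2_string_value(raw: str) -> str:
    """v2 reading (simplified): drop the first and last lines and dedent the rest."""
    lines = raw.split("\n")
    if len(lines) == 1:
        return raw  # single-line string, nothing to strip

    first, *body, last = lines
    if first != "" or last.strip() != "":
        raise ValueError("multiline string must start with a newline "
                         "and end with a whitespace-only line")

    stripped = []
    for line in body:
        if line.strip() == "":
            stripped.append("")  # whitespace-only lines become empty
        elif line.startswith(last):
            stripped.append(line[len(last):])  # remove the shared prefix
        else:
            raise ValueError("line does not start with the closing line's indentation")
    return "\n".join(stripped)


raw = "\n indented string\n "
print(repr(v1_string_value(raw)))  # '\n indented string\n '
print(repr(v2_string_value(raw)))  # 'indented string'
```

Both calls succeed on the same input, which is why this case cannot be caught as a syntax error in either version.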
I see, thanks for the clarification. As long as no v1 document can change meaning in v2 without syntax error, migrations would be much smoother.
As ckdl is a single-pass streaming parser (and I want to keep it that way), I couldn't do an "if KDLv2 gives a syntax error, retry with the v1 parser" loop. Still, it was pretty straightforward to build a parser that starts out agnostic about the KDL version until the first time it sees a construct that is only valid in one version or the other.
The (experimental/draft) hybrid parser in ckdl processes all valid KDLv2 documents correctly and (currently) only compromises on KDLv1 compatibility in a few cases (that I know of), if they occur before any v1-only construct:
#null, #inf, etc. (this could be fixed with some effort)
I really wouldn't expect other hybrid parser implementations to have a compatibility story worse than this.
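The "agnostic until the first version-specific construct" idea can be sketched in Python as a small latch. This is not how ckdl is implemented (ckdl is written in C and its internals are not shown in this thread); the class and error names are hypothetical, and the tokenizer that produces the hints is assumed to exist elsewhere.

```python
from typing import Optional


class VersionConflict(ValueError):
    """Raised when a construct contradicts the version already detected."""


class VersionLatch:
    """Tracks the detected KDL version during a single streaming pass."""

    def __init__(self) -> None:
        self.version: Optional[int] = None  # None means still agnostic

    def observe(self, hint: Optional[int]) -> None:
        """Record one construct.

        hint is 1 or 2 for a construct valid only in that version,
        or None for a construct that both versions accept.
        """
        if hint is None:
            return
        if self.version is None:
            self.version = hint  # first version-specific construct wins
        elif self.version != hint:
            raise VersionConflict(
                f"v{hint}-only construct in a document already detected as v{self.version}"
            )
```

The parser would call observe() once per construct as it streams through the input, so no second pass over the document is needed.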
This issue is for tracking full test suite compliance for KDL implementations that support the new KDL 2.0.0 spec.
As of 2024-02-07's 2.0.0-draft.3, the recommendation is now for implementations to start implementing 2.0 and submitting comments for any trouble they might run into while implementing it. Once we have enough satisfied implementors, we will start the process of releasing the final 2.0 spec!
Implementations: