What are the transforms applied to data in the YAML parser?

sirosen commented 2 years ago

I'm reading the code which was introduced in #93 and seems to have been refined several times (e.g. #114) since then. My goal is to reimplement this same transform for a tool which I maintain (the check-jsonschema hook for pre-commit) so that it can process pipelines with expressions in them at least as well as the language-service here does.

However, as I dig into the expression handling, I'm not clear about what some of the transformation is actually doing. e.g. This case for an expression with a null value is hard to understand.

Is there any kind of internal documentation -- or just plain old advice -- which the maintainers can offer me on this? Any help understanding the rewrites done by the parser would be very useful.

In case it's of interest, my python implementation can be seen here.

winstliu commented 2 years ago

Looks like I should have added a comment there, because it's not clear to me either...however, my guess is to avoid throwing on an incomplete expression that is still being written out, like

- ${{ if eq(whatever, true) }}:
  # nothing here yet

# or

- ${{ if eq(whatever, true) }}

Any help understanding the rewrites done by the parser would be very useful.

I wish I could help you a bit more, but I remember the crux of the rewriting logic to be "well, the expressions are just an meta-layer over a valid schema, so if I can determine what type of YAML object the expression is, then I should be able to silently recurse past it and attach its children to its parent". You'll notice that the node (parent) parameter stays constant throughout all the recursing as we walk through and discard all the expressions before getting to the "real stuff".

sirosen commented 2 years ago

my guess is to avoid throwing on an incomplete expression that is still being written out, like ...

Even though we're not certain, this helps me a great deal. Since my use case is analysis during git pre-commit, I don't have some of the same concerns as a language server. I'll try to be alert to any handling which is specific to partially written expressions or blocks.

I wish I could help you a bit more ... You'll notice that the node (parent) parameter stays constant throughout all the recursing as we walk through and discard all the expressions before getting to the "real stuff".

👍 No worries at all, this was useful to me! And thank you for taking the time to explain as much as is possible. I understood that part of the parser, but was concerned that there might be other significant "gotchas" involved.

The only thing I still have a question about is a part of addPropertiesToObjectNode:

There are nested ifs here: https://github.com/microsoft/azure-pipelines-language-server/blob/71b20f92874c02dfe82ad2cc2dcc7fa64996be91/language-service/src/parser/yamlParser.ts#L191 If the first clause is satisfied (found an expression in an object), but the content of that expression is not an object, the content appears to be dropped. Am I reading that correctly?

So something like...

foo:
  ${{ if eq(true, true) }}:
    bar: baz
  fizz: buzz

---
# becomes
foo:
  bar: baz
  fizz: buzz

but

foo:
  ${{ if eq(true, true) }}:
    - bar: baz
  fizz: buzz

---
# becomes
foo:
  fizz: buzz

Is that right?

I'm wondering if it would be better (at least, in my case) to reject such a thing as invalid. Do we know of valid cases which look like this, or is this a soft failure mode in which the language-server is choosing not to report anything rather than risking a false negative?

sirosen commented 2 years ago

As of v0.11.0, check-jsonschema now implements a version of the transformation in this repo, but with some stricter handling of unexpected or unclear inputs. Link: https://github.com/sirosen/check-jsonschema/blob/0.11.0/src/check_jsonschema/transforms/azure_pipelines.py

I'm going to self-close, although there are still things which bother me (like the YAML-1.0-esque handling of off and on). I'm happy to discuss this further if it's interesting to compare against an implementation in another language.

winstliu commented 2 years ago

Darn, so sorry I missed your previous message! I must have read it and then forgot to mark it as unread to follow up.

Is that right?

Yeah, I think that's right, since that's invalid YAML. We would (should) report that on the UI side in azure-pipelines-vscode when it fails to validate against the schema, so in your use case it does sound appropriate to reject it.

Not sure about off/on handling -- I'd expect if the schema accepts it, then we have to account for it as well.

microsoft / azure-pipelines-language-server

What are the transforms applied to data in the YAML parser? #119