microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
162.13k stars 28.52k forks source link

Consider adopting a new yaml grammar: Built in YAML grammar left-strips the first character of scalars that are not surrounded by quotes #180523

Closed EarthmanT closed 1 week ago

EarthmanT commented 1 year ago

Does this issue occur when all extensions are disabled?: Yes

Steps to Reproduce:

  1. Disable all extensions.
  2. Create a file with the extension *.yaml.
  3. Start typing some YAML. For example:
foo:
  bar: baz
qux:
  - quxx
  - quxxx
  1. Enable Scope Inspector
  2. Select any of the words in the file.

Expected behavior: The entire word should show up in the scope inspector window.

Actual behavior: However, the word is missing the first character. For example "baz", will show up as "az". Selecting to the left of "baz', we see only "b".

Changing the file type fixes the behavior. Use any other file type, "txt" or "json", etc. Even including inline JSON in the YAML file, and surrounding the words with quotes fixes the behavior. This appears to be an issue with the json->yaml tree in the grammar. bug bug2 bug3 bug4

VSCodeTriageBot commented 1 year ago

Thanks for creating this issue! It looks like you may be using an old version of VS Code, the latest stable release is 1.77.3. Please try upgrading to the latest version and checking whether this issue remains.

Happy Coding!

EarthmanT commented 1 year ago

@VSCodeTriageBot Issue also happens in 1.77.3.

alexr00 commented 1 year ago

Interesting. @EarthmanT is this causing a problem for you?

EarthmanT commented 1 year ago

@alexr00 I suppose there's a chance that it's not related to any issues that I'm experiencing. I suspect it is causing word completion in my language server to match based starting on the 2nd character of most tokens.

alexr00 commented 1 year ago

We currently get our yaml grammar from https://github.com/textmate/yaml.tmbundle, but it looks to be unmaintained. Repurposing this issue to track adopting a new yaml grammar.

alexr00 commented 8 months ago

Tried out the grammar at https://github.com/redhat-developer/vscode-yaml/tree/main/syntaxes, but it has the same problem.

Cthuflu commented 5 months ago

Following the advice written in the yaml.tmLanguage.json grammar, I had submitted a fix for this in the textmate/yaml.tmbundle repo: https://github.com/textmate/yaml.tmbundle/pull/45

RedCMD commented 3 months ago

Hello @alexr00 I was thinking of maintaining the Yaml grammar myself (forking it) fix all the bugs, add missing features etc

what would be some things I need to look out for/need to do? licenses etc

I also see that Tree-sitter is coming along how long would my fork get used before effectively becoming obsolete? I did notice that the Yaml Tree-sitter grammars seem unmaintaned https://github.com/microsoft/vscode/issues/207416#issuecomment-1991675411 would you still use my TextMate grammar when Tree-sitter arrives?

https://yaml.org/spec/1.2.2/ seems comprehensive

alexr00 commented 3 months ago

Hi @RedCMD, we have no timeline for retiring TextMate support, so I expect a new TextMate grammar for yaml would be used for as long as you maintain it and it is better/better maintained than the tree-sitter grammar. This is just a guess, but 3+ years seems like a safe guess.

Each time we switch a built in language to a tree-sitter grammar there will likely be noticeable syntax highlighting differences, so we will likely stick with an existing, well-maintained, TextMate grammar over a tree-sitter grammar for languages where such a TextMate grammar exists.

As you noticed in https://github.com/microsoft/vscode/issues/207416#issuecomment-1991675411, I have concerns about how well the tree-sitter grammar is maintained, so I would definitely be interested in trying out a yaml TextMate grammar that you fork. For adopting tree-sitter grammars, I will start first with the ones that we'll get the biggest gain from, which from https://github.com/microsoft/vscode/issues/207416 is HTML, XML, and Groovy, before moving on to those that are less certain (yaml and ini).

For the license, if you use MIT then VS Code will be good to use it. As for things to look out for, we do have integration tests for grammars in this repo, but you do need to follow all the steps to build the project then run yarn test-extension -l vscode-colorize-tests so if you find a way to make tests in your repo it would likely be simpler, though I do think there's at least one grammar author that runs our tests.

RedCMD commented 3 months ago

cool cool I guess I'll just message when I think I'm ready 😊 https://github.com/RedCMD/YAML-Syntax-Highlighter/issues/1

alexr00 commented 3 months ago

Awesome. Yes, please just message when you think it's ready!

RedCMD commented 2 months ago

I may have gone a little overboard instead of forking tmbundle, I kinda just rewrote the entire grammar the structure of the old grammar couldn't support all the weird rules that YAML has requiring a redesign

there's still a few little bugs tho they are exceptions of exceptions and were also bugs in the old grammar anyway

I haven't set up a automated test yet but I have 85kb in test files tho I'll prob have to sort them into valid and invalid YAML code

do you know some extensions that have automated testing?

I've just ran the yarn test-extension -l vscode-colorize-tests command with my extension enabled it fails on the test.yaml file

1 failing
AssertionError [ERR_ASSERTION]: Expected values to be strictly deep-equal:

image because it contains multiple lines of invalid YAML code

(the other YAML files do pass)

alexr00 commented 2 months ago

@RedCMD is this the right grammar to use? https://github.com/RedCMD/YAML-Syntax-Highlighter/blob/main/syntaxes/yaml.tmLanguage.json

RedCMD commented 2 months ago

@alexr00 that's the main file yes

I'm also just waiting on the outcome of the indentation onEnterRules saga https://github.com/microsoft/vscode/issues/209519

Pre-existing grammar issues has a list of the existing bugs I've fixed/made sure not to reopen

alexr00 commented 2 months ago

I took a look, and I agree that based on the VS Code tests your grammar looks good! There's one place where I see an obvious difference. In your grammar you have the scope string.unquoted.other.out.yaml instead of string.unquoted.plain.out.yaml. Is there a reason for this difference? I can compensate for the change in the built in VS Code themes, but any extension themes that rely on the old value might be broken.

alexr00 commented 2 months ago

RE the onEnterRules saga https://github.com/microsoft/vscode/issues/209519, I don't know that I would wait for that before adopting the new grammar in insiders, since we don't currently have any onEnterRules for YAML built in.

RedCMD commented 2 months ago

string.unquoted.other.out.yaml vs string.unquoted.plain.out.yaml

that's my mistake I must of mashed together the TextMate scope names string.quoted.other and string.unquoted I can change it back to string.unquoted.plain.out.yaml it seems more fitting to

onEnterRules can be added in anytime. no big deal

alexr00 commented 2 months ago

For automated testing, it looks like there are some automated tests for shellscript here: https://github.com/jeff-hykin/better-shell-syntax/blob/6d0bc37a6b8023a5fddf75bd2b4eb1e1f962e4c2/commands/project/test#L3-L7

RedCMD commented 2 months ago

@alexr00 should I move the 1.2.tmLanguage.json grammar into the yaml.tmLanguage.json file?

it would probably help keep compatibility with old bots copying the single file directly from VSCode (when they should be copying all 5 files now) or does it not really matter?

RedCMD commented 2 months ago

just fixed the string.unquoted.plain.out.yaml issue https://github.com/RedCMD/YAML-Syntax-Highlighter/commit/287c71aeb0773759497822b5e5ce4bdc4d5ef2aa

alexr00 commented 2 months ago

@alexr00 should I move the 1.2.tmLanguage.json grammar into the yaml.tmLanguage.json file?

it would probably help keep compatibility with old bots copying the single file directly from VSCode (when they should be copying all 5 files now) or does it not really matter?

Doesn't matter, I've already updated the script here: https://github.com/microsoft/vscode/pull/219833

alexr00 commented 2 months ago

My usual approach is to keep new grammars in VS Code insiders for a couple months before releasing them to stable. This means I'll just revert back to the old grammar before release for the July release. We'll see how many issues we get before the August release and assess if it's time to fully commit to the new grammar in time for the August release.

alexr00 commented 2 months ago

Grammar switchover is in! We plan to release stable today, which means the new grammar should go out in insiders tomorrow.

RedCMD commented 1 month ago

Currently folding block scalars show up as red

image

>1
 folding block-scalar
 shows up as red

because VSCode hasn't copied the "unbalancedBracketScopes" section https://github.com/microsoft/vscode/blob/d4c164e7b85364cfdc24b4db23858abe2ff7194b/extensions/yaml/package.json#L75-L79

"unbalancedBracketScopes": [
    "invalid.illegal",
    "storage.type.tag.shorthand.yaml",
    "keyword.control.flow"
]

and for some reason the indentation is whack? tabs vs spaces https://github.com/microsoft/vscode/blob/d4c164e7b85364cfdc24b4db23858abe2ff7194b/extensions/yaml/package.json#L53-L80

alexr00 commented 1 week ago

Let's release this in the upcoming release.

alexr00 commented 1 week ago

Marking as verified because the colorization tests pass.