Some characters are not highlighted as part of a token

rzk-lang / vscode-rzk

Visual Studio Code Extension(s) for Rzk proof assistant.

https://marketplace.visualstudio.com/items?itemName=NikolaiKudasovfizruk.rzk-1-experimental-highlighting

Other

8 stars 1 forks source link

Some characters are not highlighted as part of a token #10

Closed fredrik-bakke closed 1 year ago

fredrik-bakke commented 1 year ago

Although the subscript one character ₁ is not highlighted as part of the preceding token, the typechecker considers it part of it. This is because \b matches here when it shouldn't:

A couple of other characters with the same behavior are -+*.

To fix this, I suggest writing custom patterns for matching word boundaries instead of using \b. As a start, here are the ones I use for Agda code:

(?<=^|\\{\\!|\\{-\\#|[\\s.;{}()@\"]) for left boundaries
(?=$|\\!\\}|[\\s.;{}()@\"]) for right boundaries

You probably don't want the hole or pragma parts, and also, maybe you don't want commas and some other characters in tokens. Hence I'll suggest the following:

(?<=^|[\\s.,;{}()<>@\"\\\\_|]) for left boundaries
(?=$|[\\s.,;{}()<>@\"\\\\_|]) for right boundaries

fredrik-bakke commented 1 year ago

While on topic, you seem to have defined capital sigmas as special symbols as well. Can these not be part of a token? If so I can't say I agree with that decision. For instance, we use them all the time in tokens in agda-unimath.

fizruk commented 1 year ago

Sigma can be a part of a token, please, refer to the LBNF file Syntax.cf in the rzk repository for the source of truth. Unfortunately, I can't always translate this properly into other syntax highlighting configs.

fizruk commented 1 year ago

In particular, see this line:

https://github.com/fizruk/rzk/blob/936c15a37f52d77fac436558b5fbc18e31ea4e40/rzk/src/Language/Rzk/Syntax.cf#L6

It means that a token (for a variable) currently can contain anything except the following symbols:

\ ; , # \ " ] [ ) ( } { > < | `\t`\n \r

Additionally, symbols - ? ! . are not allowed to as the first symbol of a token.

fredrik-bakke commented 1 year ago

Thanks for the explanation!