tajmone commented 2 years ago

There are multiple problems with how comparison operators are currently captured (refer to syntax_test_operators_comparison.pb on my fork), both in terms of wrong scopes and of broken ligature support:

[x] Wrong Scope — they are captured as keyword.comparison instead of keyword.operator.comparison. Instead of the obsolete (and incomplete, opinionated) documentation on Scope Naming, see how the native syntaxes actually capture these operators in other languages, as well as good packages do. Other useful references:
- ScopeNamingGuidelines repository by Sublime Text.
- Sublime Syntax Dashboard — covers scopes for non-mainstream syntaxes too.
- https://github.com/tushortz/scopes — 161 Programming language scopes.
- metalanguage ScopeList
[x] Composite Operators Split Tokens — most of the two-character operators are not captured as a single token but as two separate tokens, which prevents font ligatures from being displayed correctly.

This needs to be verified manually by visual inspection, since the syntax tester can't distinguish between two separate adjacent tokens with the same scope and a single captured token. To check this, paste the following code into a PB file in ST:
```
If x >= y ; Expected ligature: >=

If x => y ; Expected ligature: =>

If x <= y ; Expected ligature: <=

If x <> y ; Expected ligature: <>
```
You should notice a difference between the actual operator and the way it's represented in the comment next to it: the comment shows the font ligature, because there the characters are not split into separate tokens.

If you're not seeing the ligatures properly, ensure that you are using a good code font that supports ligatures (try FiraCode, one of the best for coders), and that you have correctly enabled ligatures in the ST settings (globally or for the PB syntax specifically). Ligature settings might depend on the OS being used, but usually the default settings should work fine, supporting ligatures out of the box (it wasn't so in ST3 though). Depending on the specific language, and which ligatures it supports, you might have to tweak the ligature settings to restrict the applied groups (clig, liga and calt).

But right now, the problem is that ligatures are showing correctly in the comments but not in the operators themselves, due to split-capturing which doesn't preserve them as a single token.
[ ] Assignment = in Composite Operators — The = in composite operators is always captured as keyword.operator.assignment. When the capturing RegEx is fixed, and composite operators are captured in the correct way and with the right matching precedence, this should be automatically solved.
[ ] Assignment vs Comparison — The = operator is currently being always captured as keyword.operator.assignment, even in contexts where it's actually used for comparison.

This one is going to be trickier to solve, since it depends on context and not the capturing RegEx. We'll need to be able to tweak the syntax to become context-aware so that it can capture it as a keyword.operator.comparison in the right contexts.

But this is going to require strong test coverage, to ensure we don't break the syntax. Also, since ST4 introduces better debugging info when inspecting scopes, we should replace anonymous contexts with named contexts, so we can better trace the path leading to the various scoping results.

Unfortunately PureBasic doesn't use == as a comparison operator, like most modern languages do, instead it uses = for both assignment and comparison — a bad design choice, and one which we'll have to cope with, since it doesn't look like the new PB6 is fixing this.

I should be able to fix all the scoping problems described above, and ensure that composite operators are captured as a single token. The RegEx from my old Sublime PureBasic package should work fine, but I can also reuse various RegExs which I wrote for syntax highlighters, and I have all the required test files lying around on my hard disk. Just give me some time and I should come up with a PR that fixes these.

As for the "Assignment vs Comparison" problem with =, I will create a dedicated branch to fix it, since it's probably going to require a multi-step approach, and requires extensive testing to make sure that all the correct comparison contexts are covered, without breaking anything. I'll update you when I have set-up a dev branch for this.

p4t5h3 commented 2 years ago

Composite Operators Split Tokens

I am using JetBrains Mono and I just did not notice that ligatures were not applied. Good catch. That would be a nice improvement. While writing this comment, I assume it is simply a matter of ordering the patterns the right way in the syntax definition.
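To illustrate the ordering point, here is a minimal Python sketch rather than the package's actual rules. Both patterns and the helper names (`SPLIT_ORDER`, `FIXED_ORDER`, `tokenize`, `per_char_scopes`) are hypothetical. It shows how a pattern that only tries single characters splits `>=` into two tokens, while listing the two-character operators first captures it as one. It also models why a per-character scope check cannot tell the two cases apart when both tokens carry the same scope; that model is an assumption for illustration, not a description of how the ST syntax tester is implemented.

```python
import re

# Hypothetical token patterns, for illustration only (not the package's rules).
SPLIT_ORDER = re.compile(r"<|>|=")              # only single characters are tried
FIXED_ORDER = re.compile(r"<>|<=|>=|=>|<|>|=")  # two-character operators come first


def tokenize(pattern, text):
    """Return (start, end, lexeme) for every operator match in text."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


def per_char_scopes(tokens, scope="keyword.operator.comparison"):
    """Map each character offset to its scope (a simplified model of a scope check)."""
    return {i: scope for start, end, _ in tokens for i in range(start, end)}


line = "If x >= y"
split = tokenize(SPLIT_ORDER, line)
fixed = tokenize(FIXED_ORDER, line)

print(split)  # [(5, 6, '>'), (6, 7, '=')]
print(fixed)  # [(5, 7, '>=')]

# Same scope on every character either way, so only the token boundaries differ.
print(per_char_scopes(split) == per_char_scopes(fixed))  # True
```

Regex alternation is tried left to right, so the longer alternatives have to precede the single characters they start with. The same principle applies to the order of match rules within a context.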

Assignment vs Comparison

That is something I thought about some time while I wrote the initial syntax definition. For now I do not see an advantage in making a difference here. Yes, it would be technically correct but what do we gain from that? I am not aware it improves the editing experience. I am hesitant because semantics go beyond what I consider the scope of a syntax definition.

tajmone commented 2 years ago

Yes, it would be technically correct but what do we gain from that?

Plug-ins. They could operate on the syntax based on correct semantics, e.g. for refactoring. If a plug-in is able to distinguish where variables are assigned vs compared it could allow a number of features, like jumping to definitions, etc. (whereas ST doesn't allow control over which Symbols are indexed for Goto Symbol).

But there are various PB language constructs that are going to be hard (if not impossible) to capture with ST's syntaxes, this being just one example, although I believe this one should be fixable to some degree by looking into expression contexts.

I do think there's a clear limit on the accuracy level achievable for PureBasic, i.e. without creating an LSP language server. With RegEx-based syntaxes it just becomes too hard to track nested contexts, and some of PB's language constructs don't make the job easier either (e.g. macros).

p4t5h3 commented 2 years ago

whereas ST doesn't allow control over which Symbols are indexed for Goto Symbol)

Do you mean the "Goto Symbol…" feature of Sublime Text? With the current syntax definition this works for declarations of, for example, structures.

tajmone commented 2 years ago

What I meant is that the scopes which allow "Goto Definition" are pre-defined/hard-coded in ST — functions, classes, TOC entries, and possibly one more scope (these are not even well documented) — and you can't decide which scopes should be assigned the "Goto Definition" functionality.

Unfortunately this forces many syntaxes to use incorrect scopes just to benefit from "Goto Definition", so you end up with packages scoping as functions or classes constructs which have nothing to do with them. So much for the "semantics scope guidelines" that the ST docs insist we should adhere to!

This is especially true for syntaxes which are not languages, e.g. markup syntaxes, where you end up scoping as functions and classes anything whose definition you'd like to be able to jump to (e.g. substitution macros/constants, etc.).

Take a BNF grammar, for example (which is the mother notation of all syntaxes), you definitely want to be able to jump to the terminal symbols definitions. You'll have to scope them as either functions or classes, which they are not.

The real problem (bad semantics aside) is that there is a limited number of scopes that can be used for this purpose, since they are hard-coded. Since you'll want to **at least** distinguish between different syntax elements by using different scopes, you ultimately end up with limited choices of symbols that will allow "Goto Definition".

You'll notice that on the ST forum one of the most recurrent newbie questions is how to enable Goto Definition for specific scopes. As mentioned, the documentation is vague about this and doesn't provide a list of the scopes for which "Goto Definition" is enabled; instead they are mentioned en passant in scattered places. Problematic aspects of the editor tend to be obscured in the docs.

Other editors (e.g. VSCode) allow users to control this via the API.

tajmone commented 2 years ago

Comparison Operators Fixed

Ok, in PR #47 I've fixed the problems with bi-char comparison operators being scoped as individual tokens, and now ligatures are shown properly.

I've also amended the scope of logical word operators from keyword.logical to keyword.operator.word, as suggested in the guidelines.

Some operators definitions didn't include the {{following_pointer}} variable in their definition, so following pointers were being captured as a multiply arithmetic operator; I've fixed that and added coverage tests for pointers following operators.

Assignment vs Comparison

Now that each operator group has its own named context, it should be easier to implement the distinction between assignment and comparison =, i.e. by excluding operators_assignment from inclusion inside evaluation expressions.
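The idea of excluding operators_assignment from evaluation expressions can be sketched in Python as follows. This is only a toy model under stated assumptions: apart from `operators_assignment`, every name here (`operators_comparison`, the `statement` and `evaluation` contexts, `CONDITION_KEYWORDS`, `scope_operators`) is hypothetical, the patterns are illustrative, and treating the rest of a line after `If`, `ElseIf`, `While` or `Until` as an evaluation expression is a deliberate simplification.

```python
import re

# Hypothetical rule groups keyed by context name; illustrative patterns only.
RULES = {
    "operators_comparison": [(r"<>|<=|>=|=>|<|>|=", "keyword.operator.comparison")],
    "operators_assignment": [(r"=", "keyword.operator.assignment")],
}

# Each named context lists the rule groups it includes.
CONTEXTS = {
    "statement": ["operators_assignment"],
    "evaluation": ["operators_comparison"],  # operators_assignment is excluded here
}

# Keywords assumed to open an evaluation expression that runs to the end of the line.
CONDITION_KEYWORDS = re.compile(r"\s*(If|ElseIf|While|Until)\b", re.IGNORECASE)


def scope_operators(line):
    """Return (lexeme, scope) pairs for the operators found on one line."""
    code = line.split(";", 1)[0]  # drop a trailing comment (ignores ';' in strings)
    context = "evaluation" if CONDITION_KEYWORDS.match(code) else "statement"
    rules = [rule for group in CONTEXTS[context] for rule in RULES[group]]
    # One named group per rule, so the matching rule can be looked up afterwards.
    combined = re.compile(
        "|".join(f"(?P<r{i}>{pattern})" for i, (pattern, _) in enumerate(rules))
    )
    return [(m.group(), rules[int(m.lastgroup[1:])][1]) for m in combined.finditer(code)]


print(scope_operators("x = y"))
# [('=', 'keyword.operator.assignment')]
print(scope_operators("If x = y ; comparison"))
# [('=', 'keyword.operator.comparison')]
```

Because the `evaluation` context never includes the assignment group, a bare `=` inside it can only match the comparison rule. A real syntax definition would need to push and pop such a context around expressions rather than decide per line.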

It should also render scope tracing easier, since the ST4 engine offers the new Context Backtrace feature when manually inspecting scopes — although right now I'm not able to see the named operator contexts with it.

I think the problem might be due to the presence of anonymous contexts being pushed/set in the path to operators, which (if I've understood correctly) might disable Context Backtrace functionality. The documentation doesn't really say much about this new feature:

While editing in Sublime Text, you can check what scopes have been applied to the text under the caret by pressing control+shift+p (Mac) or ctrl+alt+shift+p (Windows/Linux).

But from my own tests, I did notice that the feature seems to work only when no anonymous contexts are in the way. In any case, it would be nice to have the syntax backtrace all the contexts that lead to a specific scoping, since it will simplify tracking context paths as the syntax grows in complexity.

I also noticed that many of the current anonymous contexts push the same list of include: contexts, so they could be replaced by a named context that does that, which would reduce redundancy and render the syntax more readable. I'll experiment with it locally and let you know.

Renaming Operators Test Files

As mentioned in #43, I'd like to rename all the operators tests so that they begin with syntax_test_operators_, which will group them together in the directory listing order, making it much easier to select them all at once when needing to edit their tests, instead of having to sift through the entire file list to find them.

[ ] If it's OK with you, after PR #47 is merged I'd like to proceed with their renaming.

p4t5h3 commented 2 years ago

Some operators definitions didn't include the {{following_pointer}} variable in their definition, so following pointers were being captured as a multiply arithmetic operator; I've fixed that and added coverage tests for pointers following operators.

Good catch.

p4t5h3 / purebasic-language-for-sublime-text

Comparison Operators: Wrong Scopes and Broken Ligatures #44
