p4t5h3 / purebasic-language-for-sublime-text

PureBasic support for Sublime Text.
MIT License

Comparison Operators: Wrong Scopes and Broken Ligatures #44

Open tajmone opened 2 years ago

tajmone commented 2 years ago

There are multiple problems with how comparison operators are currently captured (refer to syntax_test_operators_comparison.pb on my fork), both in terms of wrong scopes and of broken ligature support:


I should be able to fix all the scoping problems described above, and ensure that composite operators are captured as a single token. The RegEx from my old Sublime PureBasic package should work fine, but I can also reuse various RegExs which I wrote for syntax highlighters, and I have all the required test files lying around on my hard disk. Just give me some time and I should come up with a PR that fixes these.

As for the "Assignment vs Comparison" problem with =, I will create a dedicated branch to fix it, since it's probably going to require a multi-step approach and extensive testing to make sure that all the correct comparison contexts are covered without breaking anything. I'll update you when I have set up a dev branch for this.

p4t5h3 commented 2 years ago

Composite Operators Split Tokens

I am using JetBrains Mono and I simply did not notice that ligatures were not applied. Good catch. That would be a nice improvement. As I write this comment, I assume it is simply a matter of ordering the patterns the right way in the syntax definition.
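To illustrate the ordering assumption, here is a minimal sketch in Python rather than in the package's actual sublime-syntax rules. The operator list and the names `GOOD`, `BAD` and `tokenize` are hypothetical, chosen only for this example. The sketch relies on regex alternation taking the first alternative that matches, so listing two-character operators before their one-character prefixes keeps `<=` and `<>` as single tokens.

```python
import re

# Two-character operators come first: alternation picks the first
# alternative that matches at the current position.
GOOD = re.compile(r"<>|<=|>=|<|>|=")

# Same operators, but the one-character forms come first, so "<=" is
# split into "<" followed by "=" (which also breaks font ligatures).
BAD = re.compile(r"<|>|=|<>|<=|>=")


def tokenize(pattern, source):
    """Return the operator tokens found in source, in order."""
    return [match.group(0) for match in pattern.finditer(source)]


line = "If a <= b Or c <> d"
print(tokenize(GOOD, line))  # ['<=', '<>']
print(tokenize(BAD, line))   # ['<', '=', '<', '>']
```

A real syntax definition would assign a scope to each match instead of returning strings, but the ordering concern would be the same.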

Assignment vs Comparison

That is something I thought about for some time while writing the initial syntax definition. For now I do not see an advantage in making a distinction here. Yes, it would be technically correct but what do we gain from that? I am not aware that it improves the editing experience. I am hesitant because semantics go beyond what I consider the scope of a syntax definition.

tajmone commented 2 years ago

Yes, it would be technically correct but what do we gain from that?

Plug-ins. They could operate on the syntax based on correct semantics, e.g. for refactoring. If a plug-in is able to distinguish where variables are assigned vs compared it could allow a number of features, like jumping to definitions, etc. (whereas ST doesn't allow control over which Symbols are indexed for Goto Symbol).

But there are various PB language constructs that are going to be hard (if not impossible) to capture with ST's syntaxes, this being just one example, although I believe this one should be fixable to some degree by looking into expression contexts.

I do think there's a clear limit on the accuracy level achievable for PureBasic, i.e. without creating an LSP language server. With RegEx-based syntaxes it just becomes too hard to track nested contexts, and some of PB's language constructs don't make the job easier either (e.g. macros).

p4t5h3 commented 2 years ago

whereas ST doesn't allow control over which Symbols are indexed for Goto Symbol

Do you mean the "Goto Symbol…" feature of Sublime Text? With the current syntax definition this works for declarations of, for example, structures.

tajmone commented 2 years ago

What I meant is that the scopes which allow "Goto Definition" are pre-defined/hard-coded in ST — functions, classes, TOC entries, and possibly one more scope (these are not even well documented) — and you can't decide which scopes should be assigned the "Goto Definition" functionality.

Unfortunately this forces many syntaxes to use incorrect scopes just to benefit from "Goto Definition", so you end up with packages that scope as functions or classes constructs which have nothing to do with them. So much for the "semantics scope guidelines" that the ST docs insist we should adhere to!

This is especially true for syntaxes which are not languages, e.g. markup syntaxes, where you end up scoping as functions and classes anything you'd like to be able to jump to its definition (e.g. substitution macros/constants, etc.).

Take a BNF grammar, for example (the mother notation of all syntaxes): you definitely want to be able to jump to the definitions of its non-terminal symbols. You'll have to scope them as either functions or classes, which they are not.

The real problem (bad semantics aside) is that there is a limited number of scopes that can be used for this purpose, since they are hard-coded. Since you'll want to **at least** distinguish between different syntax elements by using different scopes, you ultimately end up with limited choices of symbols that will allow "Goto Definition".

You'll notice that on the ST forum one of the most frequent newbie questions is how to enable Goto Definition for specific scopes. As mentioned, the documentation is vague about this and doesn't provide a list of the scopes for which "Goto Definition" is enabled; instead they are mentioned in passing in scattered places. Problematic aspects of the editor tend to be obscured in the docs.

Other editors (e.g. VSCode) allow users to control this via the API.

tajmone commented 2 years ago

Comparison Operators Fixed

Ok, in PR #47 I've fixed the problems with bi-char comparison operators being scoped as individual tokens, and now ligatures are shown properly.

I've also amended the scope of logical word operators from keyword.logical to keyword.operator.word, as suggested in the guidelines.

Some operator definitions didn't include the {{following_pointer}} variable, so pointers following those operators were being captured as a multiplication operator; I've fixed that and added coverage tests for pointers following operators.
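The pointer problem can be sketched outside the syntax definition as follows. This is not the package's `{{following_pointer}}` regex, whose definition is not shown in this thread. It is a small Python illustration built on one assumption: a `*` that appears where an operand is expected (at the start of an expression or right after an operator) introduces a pointer name, while a `*` after an operand is multiplication. All names here (`classify`, `POINTER`, and so on) are hypothetical, and only plain identifiers are handled.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")
POINTER = re.compile(r"\*[A-Za-z_]\w*")      # assumed shape of a pointer name
OPERATOR = re.compile(r"<>|<=|>=|[-+*/=<>]")


def classify(expr):
    """Label each token as 'variable', 'pointer' or 'operator'."""
    tokens = []
    pos = 0
    expect_operand = True  # True at the start and right after an operator
    while pos < len(expr):
        if expr[pos].isspace():
            pos += 1
            continue
        if expect_operand:
            # Try the pointer form first so "*p" is not read as "*" then "p".
            match = POINTER.match(expr, pos) or IDENT.match(expr, pos)
            if match is None:
                raise ValueError(f"operand expected at column {pos}")
            kind = "pointer" if match.group(0).startswith("*") else "variable"
            expect_operand = False
        else:
            match = OPERATOR.match(expr, pos)
            if match is None:
                raise ValueError(f"operator expected at column {pos}")
            kind = "operator"
            expect_operand = True
        tokens.append((kind, match.group(0)))
        pos = match.end()
    return tokens


print(classify("a = *p"))  # the "*" belongs to the pointer name
print(classify("a * b"))   # the "*" is an operator
```

The point of the sketch is only that the decision depends on what precedes the `*`, which is why every operator rule needs to account for a pointer that may follow it.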

Assignment vs Comparison

Now that each operator group has its own named context, it should be easier to implement the distinction between assignment and comparison =, i.e. by excluding operators_assignment from inclusion inside evaluation expressions.
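As a rough sketch of the idea, and not of how the syntax definition would actually implement it, the following Python function decides the role of `=` from the statement it appears in. The keyword list (`If`, `ElseIf`, `While`, `Until`), the assumption that keywords are case-insensitive, and the name `equals_role` are all assumptions made for this example. Strings, comments and multi-statement lines are deliberately ignored.

```python
import re

# Assumed set of keywords that open an evaluation expression.
EVALUATION_KEYWORDS = re.compile(r"\s*(If|ElseIf|While|Until)\b", re.IGNORECASE)

# A standalone "=", i.e. not part of "<=", ">=" or a similar composite.
EQUALS = re.compile(r"(?<![<>=])=(?![<>=])")


def equals_role(line):
    """Return 'comparison', 'assignment', or None if there is no standalone '='."""
    if EQUALS.search(line) is None:
        return None
    if EVALUATION_KEYWORDS.match(line):
        # Inside an evaluation expression "=" can only compare.
        return "comparison"
    return "assignment"


print(equals_role("If a = b"))   # comparison
print(equals_role("a = b"))      # assignment
print(equals_role("If a <= b"))  # None
```

In the syntax definition the equivalent of the keyword check would be the context that is active when `=` is reached, which is what excluding operators_assignment from evaluation expressions would achieve.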

It should also render scope tracing easier, since the ST4 engine offers the new Context Backtrace feature when manually inspecting scopes — although right now I'm not able to see the named operator contexts with it.

I think the problem might be due to the presence of anonymous contexts being pushed/set in the path to operators, which (if I've understood correctly) might disable Context Backtrace functionality. The documentation doesn't really say much about this new feature:

While editing in Sublime Text, you can check what scopes have been applied to the text under the caret by pressing control+shift+p (Mac) or ctrl+alt+shift+p (Windows/Linux).

But from my own tests, I did notice that the feature seems to work only when no anonymous contexts are in the way. Anyhow, it would be nice to have the syntax backtrace all the contexts that lead to a specific scoping, since that would simplify tracking context paths as the syntax grows in complexity.

Anyhow, I noticed that many of the current anonymous contexts push the same list of include: contexts, so they could be replaced by a named context that does that, which would reduce redundancy and render the syntax more readable. I'll experiment with it locally and let you know.

Renaming Operators Test Files

As mentioned in #43, I'd like to rename all the operator tests so that they begin with syntax_test_operators_, which will group them together in the directory listing, making it much easier to select them all at once when their tests need editing, instead of having to sift through the entire file list to find them.

p4t5h3 commented 2 years ago

Some operator definitions didn't include the {{following_pointer}} variable, so pointers following those operators were being captured as a multiplication operator; I've fixed that and added coverage tests for pointers following operators.

Good catch.