sublimehq / Packages

Syntax highlighting files shipped with Sublime Text and Sublime Merge
https://sublimetext.com
Other
2.95k stars 587 forks source link

[RFC] Is it useful to mark “the full path” of a function call as meta.function-call? #2575

Open Thom1729 opened 3 years ago

Thom1729 commented 3 years ago

From the Scope Naming Guidelines:

Function names, including the full path, and all parameters should receive the following scope. The name of the function or method should be variable.function, unless the function is scoped with support.function.

  • meta.function-call

According to the guidelines, the entirety of the following line of Python should be marked meta.function-call:

Foo.bar.baz(x, y, z=42);

There are three things I don't like about this.

1. It's not useful.

In the above example, a high-quality syntax will apply plenty of useful scopes:

All of these scopes may be useful for syntax highlighting and even for basic code intelligence features. But it's not clear that scoping the entire line meta.function-call adds anything of value. It doesn't make sense to highlight the entire line as a whole, but it's not necessary for highlighting the individual pieces (because more detailed scopes exist). And it's probably useless for code intelligence because of (2) and (3).

2. It's ambiguous.

The issue here is that “the full path” is supposed to be marked meta.function-call. It's clear how this applies to simple cases like the Python example above. It's not clear how it applies in more complex cases. In the following examples, what constitutes “the full path”?

These questions surely have answers — perhaps many answers. It's hard to come up with an obvious rule that would apply sensibly across a variety of languages.

3. It's really hard to implement.

Detecting whether a mere identifier is the name of a function being called can be tricky. In most languages, there may be a newline between the identifier and the open paren. In many languages, there may be type parameters between them as well. Nearly all syntaxes I've seen cope with this on a best-effort basis using a lookahead before the identifier. In TypeScript, it has to be exact or highlighting can break, and so that definition uses a fairly complex implementation using branching that's still not entirely satisfactory.

Scoping “the full path” is even harder than this. The JavaScript syntax uses a lookahead that probably works as often as not (as long as there are no newlines), but in TypeScript that lookahead definitely doesn't handle function type parameters. Even this fragile implementation costs over fifty lines of code.

As far as I know, none of the core syntaxes can reliably scope “the full path”. This makes the scope useless for code intelligence features; if you actually need that information, then you'll probably have to get it from a proper language server.


The reason this is on my mind is that I've been trying to reduce the use of complex lookaheads in the JavaScript syntax. Scoping call paths, even badly, requires a fair bit of code. I considered replacing the fragile lookahead with a branching implementation, but I fear that it would be both complicated and slow. The obvious alternative is to forget the call path and just scope the function name and arguments — but the scope naming guidelines are explicit that the entire call path should be scoped, and I wouldn't want to go against the guidelines without a consensus.

What does the community think? Is it genuinely useful to scope an entire call path as meta.function-call? If so, is there a sensible interpretation of what constitutes the call path across different languages?

jwortmann commented 3 years ago

But it's not clear that scoping the entire line meta.function-call adds anything of value.

Here is an example in Python where I use meta.function-call in my color scheme to distinguish between arguments in function definitions and keyword-arguments in function calls, because both use variable.parameter and I'd like to highlight them differently:

def foo(bar):
#       ^^^ meta.function.parameters.python variable.parameter.python

call(baz=42)
#    ^^^ meta.function-call.arguments.python variable.parameter.python
{
    "scope": "variable.parameter - meta.function-call - meta.annotation",
    "foreground": "var(orange)"
},

Theoretically I could use the recommended scope meta.function.parameters variable.parameter from the guidelines to only target function definitions, but several of the default syntaxes (Java, JavaScript, Go, Lua, C#, D, ...) and probably even more 3rd-party packages don't apply the meta.function.parameters scope, so in practice this doesn't work.

michaelblyons commented 3 years ago

@jwortmann I could be mistaken, but I don't think that's the issue here. call(baz=42) would have the same scopes it always has. I take this from

The entire argument list will probably be scoped in some way.

(Thom, please correct me if I'm wrong.)

# It's this (proposed):
foo.call(baz=42)
  # ^^^^^^^^^^^^ meta.function-call

# versus this (current):
foo.call(baz=42)
# ^^^^^^^^^^^^^^ meta.function-call

There's definitely still room for weirdness, though. This is legal in several languages:

foo[bar](baz=42)

though it's not too different from the bar()() Thom already highlighted.

jwortmann commented 3 years ago

Oops, you're right, I wrote my comment after reading the "It's not useful" section and only skimmed over the rest of the post, and wanted to give a counter example because I misinterpreted it as a claim to drop meta.function-call in general. 🙈

Sorry for the confusion then and please take my previous comment as irrelevant!

michaelblyons commented 3 years ago

I suspect that the trouble with Thom's suggestion, if any, is that some people want to have meta scopes for full path of identifiers (foo.bar[baz].biff) and another set of people (with some overlap) want function calls (call(...)) to be meta-scoped as well. The tricky bit, then, is agreeing on the boundaries, especially if both parties want to claim call.

keith-hall commented 3 years ago

From what I remember, meta.function-call is used in the function references lookup, so I wonder how that would be affected by not scoping the full path. In general, I agree with not scoping the full path though, it indeed seems more trouble than its worth.

Theoretically I could use the recommended scope meta.function.parameters variable.parameter from the guidelines to only target function definitions, but several of the default syntaxes (Java, JavaScript, Go, Lua, C#, D, ...) and probably even more 3rd-party packages don't apply the meta.function.parameters scope

We could/should get that fixed for the syntaxes in this repo. I'll volunteer to look at C#.

deathaxe commented 3 years ago

Indexer uses meta.function-call variable.function. Whether the fully qualified identifier is included into meta.function-call doesn't effect this assuming the path elements are scoped differently like variable.namespace, etc.

I've experimented with adding meta.path to nearly all fully qualified identifiers in my Java Rewrite. Including function calls or fully qualified variable access in function blocks would be one of the next steps probably.

So I can feel with @Thom1729's concerns and follow his arguments. Any identifier used in a path may consist of an arbritary complex expression, which needs to be parsed using several contexts, The following example shows a fully qualified function call from a generic function, which even may contain annotations which can look like fully qualified function calls as well.

class Foo<T> {

    public T func(int argc, String argv[]) {

        // Compliance with Scope Namging Guidelines

        Foo<T<? extends Bar @anno.anno(int p)[]> @anno.anno(int p)[]> @anno.anno(int p)[].bar(argc, argv)
//      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ meta.function-call.identifier meta.path
//                                                                                           ^^^^^^^^^^^^ meta.function-call.arguments

        // The current solution

        Foo<T<? extends Bar @anno.anno(int p)[]> @anno.anno(int p)[]> @anno.anno(int p)[].bar(argc, argv)
//                                                                                        ^^^ meta.function-call.identifier
//                                                                                           ^^^^^^^^^^^^ meta.function-call.arguments
    }
}

I managed to handle those situations by heavy usage of nested branching. The first branch-level decides whether a qualified or unqualified identifer is matched in order to add meta.path to qualified ones only. The second and third level are needed to distinguish package names and class identifiers as well as the function identifier, finally.

I guess I can make it work in Java. First experiments look promising with regards to performance impacts being negligible compared with what I have at the moment, but it adds quite a bunch of code and complexity again.

The downsides are:

  1. The syntax definition becomes quite complex and large.
  2. The heavy usage of branching causes parse times of those identifiers to increase by 5 (e.g.: 10k lines of @fully.qualified.ParentClass.InnerAnnotation 34ms -> 156ms),
  3. Depending on complexity of those arguments, we may need several branches to detect whether a fully qualified identifier is a function-call, member variable access or whatever - each try causing the lexer to rewind and throw away lots of work, just to parse it again in order to add another meta scope.

    @wbond did a good job in optimizing the engine to avoid/reduce performance impacts when working within such a heavy complex statement and the final result of Java is still faster then the original one (13ms vs. 30ms for the original test file), but that's probably caused by other heavy optimizations or the strategy change in parsing at all.

But yeah. I started with Java in Nov.2019 not being fully satisfied in Nov.2020 - with some smaller breaks to tweak Bash and CSS though.

| It's not clear how it applies in more complex cases

Any expression followed by an accessor (., ->, ::, ...) is a namespace element. For this statement to be true a syntax must consequently push into contexts for any kind of expression in exactly the way the compiler would handle it. That might easily become ambigous, complex and sometimes hardly possible without sematic information about tokens. While it works for Java I have a feeling it would fail in C/C++ or other syntaxes.

Lookaheads in general are no solution to handle such complex stuff. It needs branching to try to find the correct kind of expression. That's needed to handle the arbritary amount and structure of elements each identifier expression can consist of.

| It doesn't make sense to highlight the entire line as a whole,

Function calls can be part of a more complex statement which makes meta.function-call not to cover the whole line. So I guess this argument is of low value.

| It's really hard to implement.

Yes, yes and ... yes.


Omitting meta.function-call from the path part also means to not scope fully qualified function identifiers meta.path - another guideline?

meta.path could propably help finding boundaries of fully qualified identifiers when used in nested situations (see: annotations), but yes - who really needs it.

In general I tend to think meta.path and covering the whole identifier as meta.function-call adds a complexity which is not worth it and agree it may cause more trouble than use.

Thom1729 commented 3 years ago

To confirm: yes, in Foo.bar() I think it still makes sense to highlight bar() as meta.function-call. This is a lot less problematic than highlighting the entire path. In a case like Foo[bar](), only the parens would get meta.function-call. This should be quite reasonable to implement in most languages, even exactly, and I think it should provide most of the utility that the scope might offer.

Theoretically I could use the recommended scope meta.function.parameters variable.parameter from the guidelines to only target function definitions, but several of the default syntaxes (Java, JavaScript, Go, Lua, C#, D, ...) and probably even more 3rd-party packages don't apply the meta.function.parameters scope, so in practice this doesn't work.

In the case of JavaScript, this is because it uses a nonstandard meta.function.declaration scope for the benefit of the symbol list. Since the new symbol list functionality basically doesn't use that information anymore, this should be an easy fix. It's possible that the scope is missing from other languages for similar reasons.

From what I remember, meta.function-call is used in the function references lookup, so I wonder how that would be affected by not scoping the full path.

I think that as long as the function identifier and arguments get meta.function-call, then this should be no problem. I'm no expert on the references stuff, though.

@deathaxe Thanks for the example and performance info. I think this supports the intuition that exact parsing of paths is likely to have significant performance costs.

deathaxe commented 3 years ago

I guess detecting Foo[bar] as possible function identifier is one of the more simple parts here as you might need something like that for generics as well. It's just branching with the first branch handling Foo as variable.function followed by the item-access context and succeeding on ( after the last closing ]. The else branch would be the normal variable.other then. Index expressions are not too expensive normally.

I guess the most tricky, complicated and hence bloating part is trying to detect all the different kinds of elements in a fully qualified identifier. In Java you need to consume all lowercase path elements as package, switch to consuming classes as soon as the first uppercase identifier is hit and finally find the leave, which maybe a constant, variable or function, while keeping in mind each path element may be an array or generic.

As detecting the leave part happens on context the described algorithm needs to be copied for nearly each data type, which means heavy duplicating.