sublimehq / Packages

Syntax highlighting files shipped with Sublime Text and Sublime Merge
https://sublimetext.com
Other
2.95k stars 587 forks source link

[Scala syntax] Type Lambdas - special case handling for "αβ" #682

Closed jpz closed 7 years ago

jpz commented 7 years ago

Hi,

I see that characters "αβ" seem to be treated in a special manner over and above other symbols - I'm not clear why this is. Ideally they should just be regarded as regular symbols, no different to any other?

In addition, these type lambda functions seem to be deliberately highlighted as "comments." Is there good reason for this?

image

@djspiewak @wbond

wbond commented 7 years ago

@djspiewak Will have to answer this, as he is the resident Scala expert who has written the bulk of the current syntax.

djspiewak commented 7 years ago

@jpz Alpha and beta are treated specially only within types. The intent was (and is) to treat them specially only within type lambdas, but fixing that was a relatively low priority compared to the other stuff in the mode that needed to be sorted out. It would certainly be a worthwhile change now.

As for the special handling to begin with…  Sublime doesn't have a notion of parameterized contexts (nor do I think it should), which means that we can't do the semantically correct thing here, which is to pull out the symbols within the square brackets of the type lambda and only comment-ify those symbols. So instead, we just assume that alpha and beta are exclusively the domain of type lambdas, and we highlight them accordingly. This is accurate on most Scala code, though there are some notable exceptions (I would imagine Matryoshka has some slightly-off highlighting that would be fixed if alpha/beta were only scoped within type lambdas).

More generally, type lambdas themselves are highlighted as comments simply because they are pure boilerplate that serves absolutely no function other than to coax the compiler along. The construct itself is elided, both by the compiler and visually by the programmer, and really just serves as noise. The comment-scoped highlighting reflects the way that the construct is treated in practice when reading or writing code: syntactic noise around the type constructor within. I've thought several times about using different scopes here, but none of them really make more sense than comment.

Now that types are more sanely handled in general by the mode, we could remove the special scoping on type lambdas (and by extension, alpha/beta) if we wanted. The result would be a lot noisier in code that uses type lambdas, but it would be a better situation than it was when I first added the scoping. I'm personally in favor of keeping it.

jpz commented 7 years ago

Hi, is it the case that Scala the language has special treatment for alpha and beta characters within type lambdas, or is this a decision you have made on the basis of some of the arbitrary conventions within some libraries?

I'm not aware of any special treatment of these two characters by the scala compiler.

Thanks

djspiewak commented 7 years ago

There is no special treatment of those two characters. It's an arbitrary convention. Type lambdas as a whole are also an arbitrary convention and there is no special treatment of them in the compiler. They're also fairly universal. I only know of one programmer who both uses type lambdas extensively and doesn't (mostly) use lambda, alpha and beta, and that would be Paul Chiusano. Everyone else basically sticks with the convention.

jpz commented 7 years ago

This seems a bit opinionated to me to put in the syntax definition, given Intellij or Eclipse do not, and you are colourising arbitrary usage based on semantic relevance, rather than the grammar of the language - e.g. underemphasising/greying out some code to the same colour as "comment" - people that look at that offhand will not comprehend why it is greyed out.

It is the norm in many IDEs, for code which is redundant (e.g. redundant imports, unreachable code or unused variables) to be greyed out, and hint that it will not change program correctness to delete it - but this code in particular is not specifically redundant.

In this manner it overloads the meaning of "grey" to being something else again, that it previously was not. From a UX perspective, that can create confusion, as it's not discernable for an average user why some of the code is now coming up grey.

FichteFoll commented 7 years ago

I have never used Scala so I can't wage in with any sort of experiece, but type annotations in several other languages are not highlighted as comments (and for good reason, imo). Haskell has a pretty good type inference system but annotations are highlighted normally. Python doesn't do anything with annotations and gives the author complete freedom with them, although they are being recommeded as type annotations, as recently established with the addition of the typing module.

The screenshot in OP looks weird to me because of the ({ opening a "comment" scope that is interrupted by Foo[] (which itself has meaning outside of that comment) and is then continued until closed by }) (followed by some lambda). Visually, I don't have issues with alpha and beta within Foo[], unless they have a meaning. Since the comma is not scoped, I assume that Foo[a, β] is equivalent to Foo[,], but I don't know for sure.

Genrally I belive that having too much comment scopes inline is visually disturbing and should be avoided, unless they are actual comments and inserted by the programmer.

djspiewak commented 7 years ago

First off, the IDEs are not the gold standard for how Scala code should be highlighted. Neither of them highlight lambdas, either; does that mean we shouldn't? Neither of them make a distinction between class and "class you're inheriting from", but we do. Neither of them have semantic highlighting for pattern matches. And in fact, IntelliJ doesn't even use the same compiler as SBT or Eclipse! (a fact which has, at varying times, caused significant incompatibilities)

A better benchmark for whether or not (and how) something should be highlighted is a) what the construct means, and b) how it is used. Type lambdas – like comments – are code which is basically elided by the compiler. For example:

({ type λ[α, β] = EitherT[IO, α, β] })#λ

In the above, only the EitherT[IO, , ] is retained by the compiler. Everything else is ultimately discarded. You can see this more clearly with kind-projector, which is a compiler plugin that provides first-class syntax for this construct:

EitherT[IO, ?, ?]

This is equivalent to the type lambda, and is in fact much closer to what the compiler sees when it processes the first snippet. Thus, the construct means "just the inner partial-application of the type; elide the rest", and the construct is used as boilerplate to achieve that functionality.

So the comment scoping is justified, I feel. The only knock against it is the fact that I had to special-case it for the Greek characters, rather than identifying the construct in its more general form.

From a UX perspective, that can create confusion, as it's not discernable for an average user why some of the code is now coming up grey.

If the user is putting a type lambda into their code, they know why it is grayed out because they understand the construct. Type lambdas are a rather advanced feature of Scala that only a small minority of users actually touch, and they're not something that you would use by mistake while trying to solve something else or without understanding. This scoping only affects people who already understand that the compiler is eliding that code, and who have already trained their eyes to try to look past the boilerplate. All this scoping does is make it easier for those people to look past the boilerplate.

Haskell has a pretty good type inference system but annotations are highlighted normally.

And the same with Scala. Type lambdas are different than type annotations. Haskell doesn't have an analogous construct mostly because it doesn't need one, but what is being accomplished functionally is a partial application of a type constructor. Thus, ({ type λ[α, β] = EitherT[IO, α, β] })#λ in Scala is equivalent to Haskell's EitherT IO. So you see where the comment scoping comes from.

Since the comma is not scoped, I assume that Foo[a, β] is equivalent to Foo[,], but I don't know for sure.

It's actually equivalent to Foo[?, ?] (with the kind projector plugin), which is pretty close.

Genrally I belive that having too much comment scopes inline is visually disturbing and should be avoided, unless they are actual comments and inserted by the programmer.

In general, yes, but type lambdas are (again) not a common feature. And when you do happen to have code which is using them extensively, getting any help whatsoever in trimming down the visual noise is vastly appreciated.

jpz commented 7 years ago

First off, the IDEs are not the gold standard for how Scala code should be highlighted.

No, I would say the gold standard is more likely to be regarded as the formal grammar of the language.

And in fact, IntelliJ doesn't even use the same compiler as SBT or Eclipse! (a fact which has, at varying times, caused significant incompatibilities)

Yes I've seen it get very confused about Spray and the magnet pattern actually....

Neither of them highlight lambdas, either; does that mean we shouldn't?

Well, that might be a suggestion as to what might be regarded as correct! Unless we're contrarians of course! :bowtie:

If you ask my opinion - as a type lambda is simply a pattern of usage within the language and nothing specific to grammar - my opinion is, no we shouldn't give type lambda special treatment in the syntax colouring that is default shipped with SublimeText.

(It's still possible to publish a special-purpose syntax colouring plugin for Sublime, that goes beyond colouring just for syntax, and colours for these idiomatic implementation patterns.)

If the user is putting a type lambda into their code, they know why it is grayed out because they understand the construct.

Code is read more often than it is read. I can speak from first hand experience, I had no idea at all why this is specially coloured - and there is no way for one to understand why this would be coloured down to comment-grey, where other legal Scala syntax is coloured in accordance with rules of Scala syntax. Even after reading the grammar file, the only way to resolve why it was being coloured in this way was to ask here.

Even if I wrote type lambdas, I could easily be surprised to see them being coloured in a special-case manner by my editor, and wrestle to understand why this was occurring.

I had to special-case it for the Greek characters

Understood. If and only if something is arbitrarily named alpha, beta, lambda, does the special colouring kick in.

Type lambdas – like comments – are code which is basically elided by the compiler.

They are not elided in the same manner as comments - they are subject to compilation. They still need to agree, for instance, as they appear twice in the statement (on the LHS and RHS).

type lambdas are (again) not a common feature.

So why maintain a special colouring for them if they are not common, and one that only works if the developer happens to use special naming of tokens?

I find sporadic examples of type lambdas being described on blogs, etc - not written by Paul Chiusano - where there are instances of the pattern, but no use of alpha, beta, lambda tokens.

It strikes me as incorrect to encode this in syntax colouring, as it colouring the code in a special manner, outside the domain of Scala syntax. Perhaps it's worth asking Sublime maintainers what the requirements of the syntax highlighter is:

  1. to colour the formal syntax of the language, or
  2. to show emphasis and de-emphasis on the basis of idiomatic patterns of usage.

With strong respect to you, I don't mean to criticise - only to discuss a difference of opinion, and give honest feedback.

djspiewak commented 7 years ago

the formal grammar of the language

Scala doesn't have a formal grammar. That's part of the problem here. But above and beyond that, work has been done (in academia and industry) at various times over the decades on mechanically deriving syntax highlighting modes directly from formal grammars, and it turns out to be almost never what you want.

Part of the problem is obvious: the grammar of the language specifies nothing about what to do if the code is syntactically invalid, but editors deal primarily with partially-valid code by definition. The other part of the problem is less obvious, but still critical: grammars deal strictly with syntactic elements even when more semantic details can be inferred, since they serve the AST which will be processed by later phases.

So no, the formal grammar, even if it existed, would be a very poor gold standard for syntax highlighting.

Well, that might be a suggestion as to what might be regarded as correct!

Lambdas, not type lambdas. For example, IntelliJ provides no particular highlighting for the following expression:

x => x

I think it's desirable that we highlight this, and Sublime in fact does so.

a type lambda is simply a pattern of usage within the language and nothing specific to grammar

Sure, but so is something like extends Foo, and yet we give that special scoping. Symbolic types are also just a pattern of usage within the language, but they too have special scoping. Different types of numeric literals (e.g. hex vs decimal) may also be seen as a pattern of usage, since they both generate integers, and yet we also scope these distinctly.

The editor must go above and beyond what is specified by the raw syntax of the language, since the raw syntax is simply insufficient to specify the behavior that is desirable from a live editor.

Code is read more often than it is read. I can speak from first hand experience, I had no idea at all why this is specially coloured - and there is no way for one to understand why this would be coloured down to comment-grey, where other legal Scala syntax is coloured in accordance with rules of Scala syntax.

Did you encounter the construct in code, or in the syntax test? It's worth noting that the syntax test isn't even valid Scala (intentionally so), and so it isn't a reasonable representation of how you would see the construct in a real use.

They are not elided in the same manner as comments - they are subject to compilation.

So are comments, if you read the compiler.

So why maintain a special colouring for them if they are not common

Macro definitions are even less common, but we scope them specially.

I find sporadic examples of type lambdas being described on blogs, etc - not written by Paul Chiusano - where there are instances of the pattern, but no use of alpha, beta, lambda tokens.

Look at anything describing Scalaz before the conversion to kind-projector (which happened about a year and a half ago, iirc). Scalaz was easily the most prolific user of type lambdas, and it had a very strict convention of Greek characters. This convention was chosen to make type lambdas stand out as special weirdness, and it was mirrored by the majority of the "downstream" community. Once Scalaz converted over to kind-projector, much of the community also converted, meaning that the majority of the individuals (including myself) who used to adhere to the Greek convention are now simply using kind-projector rather than the type lambda syntax in any form.

For this reason, I'm a lot less attached to this syntax than I otherwise would be, since I simply don't use it anymore. But I still believe it is a valid application of scoping.

It strikes me as incorrect to encode this in syntax colouring, as it colouring the code in a special manner, outside the domain of Scala syntax.

If you define Scala syntax by strictly what is implemented by the Scala parser as written, then you also need to exclude XML literals, which are (partially) delegated to the SAX2 parser. You also cannot provide special highlighting for _. There are probably other examples, but those are the ones that spring to mind.

tl;dr: Syntax highlighting modes must always go beyond what the language specifies in its strictly syntactic domain. Weak semantics and convention must play a role.