Kiiyya commented 1 year ago

Improved Semantic Highlighting RFC

The current semantic highlighting doesn't do much:

"Properties" x.y are just based a syntactic rule (line 443).
We look up the TermInfo node in the InfoTree and check whether identifiers are variables, which get the variable semantic token type.
Highlight tactics as keywords.

And that's about it.

But Lean is uniquely equipped to do some amazing semantic highlighting, as all the information is already in the InfoTree, and you can go into MetaM using the same context that terms have been elaborated with. After a bit of playing around, I have arrived at this (very unpolished) proof of concept, just to give an idea of what is possible:

I had already created a thread on Zulip, and while it got lots of positive reactions, I didn't get a clear answer on how to proceed, so I hope I'm not doing anything wrong by opening an RFC. The contribution guide says an RFC is required before a PR. This would be my first contribution to Lean itself.

Considerations

Which tokens should get which token type and which token modifiers? Which of the following features should be implemented or not, and which are the priorities?

Four Major Kinds of Tokens

I am not sure which kinds of tokens are the most useful from a UX perspective to distinguish, but so far I have settled on a distinction between types and propositions, and then values vs proofs. This results in four major token kinds, which are determined using Meta.inferType twice:

expr : ... -> Type _ are types (.type token modifier).
expr : ... -> Prop are predicates (.prop).
expr : ... : ... -> Type are values (.value).
expr : ... : ... -> Prop are theorems (.proof).

Namespaces and Dot Notation

Currently, Lean.Expr gets highlighted as one token. I think it would be better if Lean. was highlighted as a namespace, and Expr as type. If the "namespace" is actually a type, for example in Person.name, then Person should get highlighted as a type, and .name as the function.

I think that given p : Person, p.name should have name highlighted as a field, not a function. But Person.name should be highlighted as a (projection) function, which it is. Implementing this would be non-trivial, probably requiring a fold over the x1, x2, ..., xn in x1.x2. ... .xn. It's probably not worth the effort, but would be cool.

We often write myList.length. In this case, should .length be highlighted as a function, since List.length : List α -> Nat, or as a non-function, since List.length xs : Nat? I think the latter makes more sense, but would be (much) harder to implement.

Tactics

How should tactics get highlighted?

Punctuation

So far only identifiers get highlighted, but Lean is quite notation-heavy. How should the ∀ and , in ∀x, 1 = 1 get highlighted? What about ->, =>, etc? What about the . in (. + 2) vs the . in (Vec . 3)?

Custom Notation and User Extensibility

Existing notation for some types such as =, ∀, ∃, × etc. do not get highlighted at all currently. I would expect them to get highlighted the same color as Eq, Prod, etc. I suppose it would be possible to simply hardcode sufficiently many special cases, but that feels wrong.

I think it could be useful to be able to highlight tokens based on nearly arbitrary (user-defineable?) rules. This would address the custom notation problem, as well as allow the following: For example, while working on a proof about real numbers, it might be useful to highlight variables x for which x < 0 exists in the context. Or perhaps a type A occurs frequently in our theory and means something important, and we want to highlight that, for example highlighting all tasks x : Task Nat. I imagine this would come in handy when screenshotting code for a presentation or giving a demo, where you want to highlight certain things.

I have not done any work on this front, and the extra complexity might not be worth it, or at least not currently.

How to handle `abbrev`?

What should abbrev Vec3f : Type := Vec Nat 3 be highlighted as? I would say the same color as Vec, but without the .func modifier. What should abbrev Vec3 (α : Type) : Type := Vec α 3 be highlighted as? What should abbrev Vec3 {α : Type} : Type := Vec α 3 be highlighted as?

Universes?

Currently Type and 0 in Type 0 get highlighted differently. I think it would make sense to highlight both and max, imax with the same token kind.

Numbers, Strings

It might be possible, due to OfNat or coercion or notation, to have 0 refer to a function or a type instead of the value. In that case, 0 should get highlighted accordingly.

Strings currently are only syntactically highlighted using textmate scopes, which causes the . in s!"hello {user.name}" to be highlighted as a string. Semantic token highlight has higher priority than textmate scopes, so we would probably need to make sure that every character inside {..} gets a semantic token. I have zero experience with textmate scopes, so I'm not sure if it can be fixed there.

Fork Default VSCode Theme?

The experience with existing themes should be "good enough", re-using existing LSP token types and modifiers. However, Lean is just different enough that we quickly hit limits. There will be new token types and new token modifiers.

I think the ideal solution is to offer a fork of the VSCode Dark+ and the VSCode Light+ themes and fine-tune them for Lean. This way, when e.g. giving a tutorial on Lean, you can point new users to the theme, which is a simple one-click install, and which can serve as a starting point for users wishing to customize it.

Alternatively, it would be rather easy to create a small token customization generator where users can just tweak some few base colors to make it fit into their existing theme.

The Implementation so far

So far, the declaration kind (whether it's a definition, constructor, inductive type, theorem, recursor, etc.) determines the token type, and its role (whether it's a type, predicate, proof, value, etc.) determines the token modifiers. If we want to re-use existing token types and modifiers as much as possible, it might make sense to introduce a simple internal-only token kinds (?) enum, and then let tokens have a set of these token kinds, which then get mapped to LSP token types and LSP token modifiers. Just for the sake of code cleanliness, as I've found it be a little awkward having to decide whether something is a token type (there can be only one per token!) or a token modifier.

Handling of auxDecls

When we encounter an fvar that is an auxDecl, we check the Environment (of the next command) for a constant named currentNamespace ++ fvar.userName. This feels very hacky, I wonder whether there's a better way available.

How LSP Semantic Tokens work

The LSP allows text ranges to have exactly one token type, and any amount of token modifiers, written tokentype.mod₁.mod₂.mod₃, which can be easily customized in your VSCode settings.json. Token modifiers are encoded as a bit-field, which presumably implicitly imposes some limit (32 modifiers? 64?). For example, variable.declaration.proof. It is possible and easy to introduce new token types and modifiers, no need to touch the VSCode extension. Just in case some readers don't know.

TODOs

[ ] Highlighting tactics. Maybe some tactics should have a different color than others?
[ ] Re-use existing LSP token types and modifiers.
[ ] Create Lean-tuned fork of VSCode Dark+ and Light+ themes.
[ ] Functions defined via where seem to break currently.
[ ] Add some option modifier to name in set_option name. Similar for import X, open X.

Julian commented 1 year ago

(This is awesome to discuss as far as I'm personally concerned, thanks for pushing it forward)

Will throw in an additional consideration which probably seems related -- perhaps some semantic highlighting of the checkpoint tactic is useful? There's this old Zulip thread that I was keeping an eye on with a comment from Leo saying:

In the future, it would be great to have UI support to mark the part we are working on.

Perhaps signaling such a thing as part of the semantic highlighting (by e.g. differentiating between tactics which are above the checkpoint) is also something worth thinking about.

To throw out another brainstorming idea, perhaps some Lean API that allows someone writing a tactic (or some other more general piece of syntax) a way of tweaking how it will be semantically highlighted will be useful as well.

Kha commented 1 year ago

Hey Max, just wanted to say we're aware of the RFC and appreciate your work! I believe we definitely want to land some version of this. Right now we're not actively working on the LSP and don't have the bandwidth to make design decisions on all those important points you mentioned above, but I hope we will be able to get back to this soon.

Kiiyya commented 1 year ago

I'm happy about the positive reception! I understand there is limited capacity, and am a little busy recently as well. But I am happy to work on this (and potentially other Lean tooling) eventually :).

I think the biggest decision to be made here is whether we want highlighting to be user-extensible. If yes, then all the above considerations just become a matter of converging to some sensible defaults. I refer to this Zulip comment to further highlight (pun intended) the usefulness of user-extensible highlight rules. And I haven't looked into highlighting in the infoview at all yet.

Admittedly I do not fully understand Julian's remarks, as I haven't used checkpoint before.

Kha commented 1 year ago

In the future, it would be great to have UI support to mark the part we are working on.

I mean, the UI already exists - that's what the margin bar is for.

But I am happy to work on this (and potentially other Lean tooling) eventually :).

Awesome!

I think the biggest decision to be made here is whether we want highlighting to be user-extensible. If yes, then all the above considerations just become a matter of converging to some sensible defaults.

Yes, I consider this RFC to be about the default style. Extensibility is good but could come later. The need to register new kinds in both the server and client/theme makes it a bit more onerous than other extension points.

leanprover / lean4

RFC: Improved Semantic Highlighting #2305

Improved Semantic Highlighting RFC

Considerations

Four Major Kinds of Tokens

Namespaces and Dot Notation

Tactics

Punctuation

Custom Notation and User Extensibility

How to handle `abbrev`?

Universes?

Numbers, Strings

Fork Default VSCode Theme?

The Implementation so far

Handling of auxDecls

How LSP Semantic Tokens work

TODOs

leanprover / lean4

RFC: Improved Semantic Highlighting #2305

Improved Semantic Highlighting RFC

Considerations

Four Major Kinds of Tokens

Namespaces and Dot Notation

Tactics

Punctuation

Custom Notation and User Extensibility

How to handle abbrev?

Universes?

Numbers, Strings

Fork Default VSCode Theme?

The Implementation so far

Handling of auxDecls

How LSP Semantic Tokens work

TODOs

How to handle `abbrev`?