add an extension point to run per-lang setups

greghendershott / racket-mode

Emacs major and minor modes for Racket: edit, REPL, check-syntax, debug, profile, packages, and more.

https://www.racket-mode.com/

GNU General Public License v3.0


add an extension point to run per-lang setups #661

Closed capfredf closed 1 year ago

capfredf commented 1 year ago

It works, but I think @greghendershott should have a better idea of how the code is organized.

Notes on the implementation:

A lang name is extracted textually in the Racket backend. I feel this piece of information should be provided by DrRacket or some related library eventually.

Also, it looks to me like per-language configuration is quite desirable.

greghendershott commented 1 year ago

Thanks for looking into this!

Although I haven't thought about it recently (the hash-lang branch has been around awhile!!) I remember having some questions about integration with Emacs overall. When would it be smart to follow "the Emacs way", or not?

One example is your use case: Files that don't use .rkt but you want to use racket-hash-lang-mode. Like .scrbl or .rhm. And maybe some additional Emacs config for each.

So typically the Emacs way is to use different major modes, and choose one using one of the ways described here. Like map each extension to an Emacs major mode, in auto-mode-alist. Or put ;; -*- mode: blah -*- on the first line (where blah means blah-mode). [There's even a way, with magic-mode-alist, to provide a function to look at the file contents and return the mode; it could look textually for #lang, IIUC?]

Also it's easy to define a simple little derived major mode, which can include things like (setq-local comment-start ___).

So I wonder if that's the way to go, here? i.e.:

Go ahead and define-derived-mode a scribble-mode and rhombus-mode -- but maybe since they already exist use fresh names like racket-hash-lang-scribble-mode and racket-hash-lang-rhombus-mode.
In the docs, mention these as predefined conveniences, as well as the general techniques people can do themselves (links above).
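
A minimal sketch of the "Emacs way" described above, for illustration only. The mode name my/racket-scribble-mode is hypothetical, and prog-mode is used as the parent only so the snippet loads in a stock Emacs; a real setup would presumably derive from a Racket Mode major mode instead.

```elisp
;; Hypothetical derived mode; not part of Racket Mode.
(define-derived-mode my/racket-scribble-mode prog-mode "Scribble*"
  "Illustrative major mode for Scribble files."
  ;; Scribble line comments start with "@;".
  (setq-local comment-start "@;")
  (setq-local comment-end ""))

;; Option 1: choose the mode from the file extension.
(add-to-list 'auto-mode-alist '("\\.scrbl\\'" . my/racket-scribble-mode))

;; Option 2: choose the mode from the start of the file contents.
;; The regexp is matched at the beginning of the buffer.
(add-to-list 'magic-mode-alist '("#lang scribble" . my/racket-scribble-mode))

;; Option 3 needs no Lisp: put a cookie on the first line of the file, e.g.
;;   @; -*- mode: my/racket-scribble -*-
```

With this approach the file extension or the file contents pick the major mode, and the Racket back end plays no part in that choice.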

This would mean that the Racket back end isn't driving determination of "the lang" in Emacs. Instead the file extension, or the file contents, drives the choice of a major mode, in the usual Emacs way. The question is whether that's good or bad.

Thoughts?

capfredf commented 1 year ago

Hi Greg,

Thanks for your spot-on comments.

Since Racket currently does not provide enough syntactic properties (e.g. indentation rules, highlighted keywords, and tokens for comments), I agree that the Emacs way is a bit more desirable, and using file extensions to choose the language mode would be simpler and more emacs-y than extracting what comes after #lang.

Ideally, I would like the elisp part of the Racket-mode to be a thin client and the Racket part to be a rich server that spits out everything. Apparently, we are not there yet.

I will follow the Emacs way and make a new scribble-mode using define-derived-mode and some code ported from the existing scribble-mode.

greghendershott commented 1 year ago

Looking at this again, I'm coming around to your original suggestion, mostly.

1. I do think it's fine to rely on Emacs auto-mode-alist as the mechanism to say, "Given some file extension, what Emacs major mode to use?" So for example all of .rkt, .scrbl, .rhm, etc. can use racket-mode, and the user's racket-mode-hook can enable the racket-hash-lang-mode minor mode. All these extensions are handled the same way with the hash langs.
2. Things like comment-start seem like a hole in the lang info spec. Whether it's DrRacket or Emacs or vscode, any editor that wants to offer comment/uncomment commands needs to know this. So stuff like this seems like something where Robby and Matthew and I could/should coordinate to add an info key.
3. Having said that, there may be miscellaneous config that a user wants to do based on the module language. Like in your example, you want M-q to fill-paragraph (not reindent) when in scribble lang. In that vein:
   - It turns out that a language's "info" function can support a 'module-language key, as discussed for the #:info option of syntax/module-reader. (It's possible that older langs might not use this, but maybe the best answer there is we submit PRs to update those?) We don't need to re-implement read-language with regexps.
   - I could define an Emacs hook, say racket-hash-lang-module-language, which is called with the mod lang value whenever that changes (from loading a file or from user editing). Users can add hook functions. This could be the point to do stuff like tweak M-q.

   So this could handle "all other" config. Even in cases where we think adding a new lang info key is the ultimate Right Way, this could help in the meantime.

Any thoughts?

I've been sketching this out and doing some initial testing, so I'm not asking you to update your PR.

greghendershott commented 1 year ago

I pushed a commit to the hash-lang branch. When you have a chance, let me know if it seems OK?

greghendershott commented 1 year ago

I pushed commit 2075184 which moves some stuff down to the back end with a view toward adding a new info key for langs to supply.

capfredf commented 1 year ago

@greghendershott Thank you. I will give it a shot and get back to you in a week or so.

greghendershott commented 1 year ago

This isn't a nudge; on the contrary it's a summary for when you do have time to catch up:

1. A lang can now supply a drracket:comment-delimiters info key. racket-hash-lang-mode will use this to set comment-xxx variables.

   https://github.com/racket/drracket/issues/634 tracks the progress. But even now, before my PRs for scribble and rhombus are merged to supply this, the Racket Mode back end supplies fallbacks for those.
2. A new command racket-mode-C-M-q-dwim is bound to C-M-q by default. Based on the lang lexer's token under point, it does a prog-indent-sexp or fill-paragraph or fill-comment.

   This has worked well for me so far editing a .scrbl file -- it fills in text sections, but indents in racketblock code examples. But let me know of any problems/omissions.
3. Although the previous points address your configuration motivation (IIUC), there is also the new racket-hash-lang-module-language-hook for other configuration.
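
As a rough sketch of how a user might use racket-hash-lang-module-language-hook: the function name my/racket-module-language-setup is hypothetical, and the snippet assumes the hook function receives the module language as a single argument that is a string (or nil when the lang does not report one). The exact strings each lang reports are also an assumption here.

```elisp
;; Hypothetical hook function; assumes Racket Mode is installed and
;; that the hook passes the module language as its only argument.
(defun my/racket-module-language-setup (module-language)
  "Apply buffer-local settings based on MODULE-LANGUAGE."
  (when (and (stringp module-language)
             ;; Assumed value shape, e.g. a name starting with "scribble".
             (string-prefix-p "scribble" module-language))
    ;; Prose-oriented settings for Scribble documents.
    (setq-local fill-column 72)
    (auto-fill-mode 1)))

(add-hook 'racket-hash-lang-module-language-hook
          #'my/racket-module-language-setup)
```

The settings are kept buffer-local, so they should not leak into buffers that use other langs.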

Finally I think I might go ahead and merge the hash-lang branch by the end of this week. (I might slap an "experimental" caveat in the docs. But these days it's probably better for this to live on the main branch, to get more use and improvement.)

capfredf commented 1 year ago

I haven't tried out the latest change yet. But out of curiosity, what do you have in mind when it comes to font locking?

greghendershott commented 1 year ago

That's an open ended question. :smile: A couple answers:

Apparently the set of tokens a lexer may return is open.
- If a relatively popular lang like rhombus or scribble adds a new token type, then I'll probably update this to look for that and map it to a specific face. e.g. I did this for rhombus "at" and "operator" tokens.
- At the same time, there should probably be a config var alist for users to add/override those choices. That's still TO-DO.
Overall, the font-lock is rather "plain" compared to classic racket-mode. Much like DrRacket. e.g. There aren't regexp rules to highlight popular functions, or variable names in let or define, etc. Of course that also means there aren't buggy corner cases, because regexps. My current thinking is:
- The lang lexer and racket-hash-lang-mode are about "syntactic" highlighting. By design that's somewhat basic... but also guaranteed to be correct.
- Sometimes people refer to "semantic" highlighting. This is much of the value-add from the "gaudy" regexp rules in classic Racket Mode. But here something like racket-xp-mode, based on check-syntax analysis, could do a good, and more-correct job. e.g. Highlight everything that's a variable. Or give font-lock-keyword-face to things imported from certain modules like racket/base. etc. I think that would give back much of the classic variety, but again with fewer regexp gotchas? Probably, but TBD.
And in fact this is another reason I'd like to merge to racket-hash-lang-mode. My other long-running project is a check-syntax db, "pdb". With those on separate branches, it's awkward to experiment with this mix of lexer highlighting and semantic highlighting.
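
The user-configurable map from token types to faces mentioned above (still a TO-DO at this point in the thread) could look roughly like the following. This is only a sketch: my/token-face-alist and my/token-face are hypothetical names, and the token symbols are assumed examples of what a lexer might return, not a list taken from Racket Mode.

```elisp
;; Hypothetical user option: lexer token type -> Emacs face.
(defvar my/token-face-alist
  '((comment  . font-lock-comment-face)
    (string   . font-lock-string-face)
    (constant . font-lock-constant-face)
    (operator . font-lock-builtin-face))
  "Alist mapping lexer token symbols to face symbols.")

(defun my/token-face (token-type)
  "Return the face for TOKEN-TYPE, or nil to leave the text unfontified."
  (alist-get token-type my/token-face-alist))

;; Users could add or override entries, e.g. for a new token type:
(setf (alist-get 'at my/token-face-alist) 'font-lock-keyword-face)
```

Because the set of token types is open, unknown tokens simply map to nil and keep the default appearance.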

That's my little brain dump. If you were actually asking some other, third question, please let me know. :smile:

greghendershott commented 1 year ago

New commit f314ae9 has racket-xp-mode contribute faces to text not already fontified by racket-hash-lang-mode -- specifically identifiers at binding definition and use sites. This gives a more colorful presentation (if desired), closer to "classic" racket-mode than to DrRacket.

Although I'm not 100.0% sure about all the details, this feels like the right basic approach.

On the Emacs side: It goes with the grain of Emacs modes -- a buffer has a major mode, optionally enhanced by one or more minor modes. A basic unit of user preference is the mode: changing the major mode for a buffer, and enabling/disabling minor modes on top of that. Furthermore there are customizable faces, and a customizable map of token types to faces.
On the Racket side: It corresponds to the division of labor between syntax/color-lexer for basic token coloring and drracket/check-syntax for "semantic" highlighting.