greghendershott / racket-mode

Emacs major and minor modes for Racket: edit, REPL, check-syntax, debug, profile, packages, and more.
https://www.racket-mode.com/
GNU General Public License v3.0

Indentation within at-expressions #152

Open greghendershott opened 9 years ago

greghendershott commented 9 years ago

Question from nha_ on #racket: how to keep racket-mode indentation from messing up at-expressions whose content isn't Lisp code or plain text, for example C code?

Below are some quick notes. At the moment this is more of a wiki entry and brain dump than an actionable issue.


It seems the best way to deal with this is to use mmm-mode. It allows a buffer to use other major modes for certain regions. (There are other Emacs modes that offer to do this, but mmm-mode is currently maintained and on MELPA.)

For example in this file, first M-x mmm-mode. Then select the region inside the curly brackets of the at-expression itself:

@foo{
     if (i < 0)
       x;
     else
       {
         x;
         y;
       }
       }

Then C-c % C-r, type c-mode and RET.

Now that region will be managed by c-mode for indentation, font-lock, and so on.


Is it tedious to mark these regions manually, every time you edit a buffer? Of course. mmm-mode supports defining classes that look for regions in a buffer that should use another mode, and do this automatically. It provides some predefined classes, such as one for JS within HTML. Of course it provides no predefined class for "c-mode within at-expressions".

How to make one? The "easy" class definitions use a pair of regexps for the begin and end tags. I don't think that will work for matching at-expressions. Example why: the closing regexp couldn't be }, because that could be a brace in the C code, not the one closing the at-expression. Instead, I think such a class would need to use the handler option that takes full control of the search. Such a handler could probably use an Emacs regexp like "@[^ {]+{\\(.\\|\n\\)+}". Because the middle portion, \\(.\\|\n\\)+, is greedy, it matches through all { and } pairs within the C code itself, up to the final }. However it's fragile: the match runs to the last } in the buffer, so it would overshoot if another } appeared later (a second at-expression, say), and a regexp like this cannot tell a brace inside a C comment from a real one.
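A small sketch of trying that regexp on an invented buffer holding two at-expressions, to see how far the greedy portion reaches. The function name and the buffer text are made up for illustration.

```elisp
;; Illustration only: the sample text and function name are invented.
(defun my/greedy-at-exp-match ()
  "Return what the greedy at-expression regexp matches in a sample buffer."
  (with-temp-buffer
    (insert "@foo{ if (x) { y; } }\n"
            "some Racket code\n"
            "@bar{ z; }\n")
    (goto-char (point-min))
    ;; Same regexp as in the text above; the middle group is greedy.
    (when (re-search-forward "@[^ {]+{\\(.\\|\n\\)+}" nil t)
      (match-string 0))))
```

Evaluating (my/greedy-at-exp-match) returns a single string that starts at @foo{ and ends at the } of @bar{ z; }, so both at-expressions and the Racket line between them come back as one match.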


Maybe racket-mode could provide a make-mmm-mode-at-expression-class function, that defines a mmm-mode class for a specific major mode inside an at-expression.

Great, but which major mode should be used? At-expressions aren't "tagged" with some ID about the contents.

At best, maybe some file-local variable could say which mode to use for at-expressions. That would be OK for the case where it's the same mode for the entire file. But mixing more than one sub-mode in the same file... I don't know.

And that's where I'm leaving this for now.

NHALX commented 9 years ago

mmm-add-classes also lets you specify functions, instead of regexps, for the front/back parameters.

As you mentioned, the block reader needs to be aware of the language context inside the at-exp to properly handle terminators hiding in comment blocks. The elisp function parse-partial-sexp handles comments and looks like it could work:

"The syntax table controls the interpretation of characters, so these functions can be used for Lisp expressions when in Lisp mode and for C expressions when in C mode. " (ftp://ftp.gnu.org/old-gnu/Manuals/elisp-manual-20-2.5/html_node/elisp_566.html)

At-exp could possibly be extended to include optional meta info describing the block contents. @C:foo{ ... }

greghendershott commented 9 years ago

Thanks for thinking about this more and following up.

> mmm-add-classes also lets you specify functions, instead of regexps, for the front/back parameters.

I noticed that, I just doubt an independent back matcher function could work reliably. I think it would need to be the "handler" option they mention, that parses the whole thing, because of languages that use }.

> As you mentioned, the block reader needs to be aware of the language context inside the at-exp to properly handle terminators hiding in comment blocks. The elisp function parse-partial-sexp handles comments and looks like it could work:
>
> "The syntax table controls the interpretation of characters, so these functions can be used for Lisp expressions when in Lisp mode and for C expressions when in C mode. " (ftp://ftp.gnu.org/old-gnu/Manuals/elisp-manual-20-2.5/html_node/elisp_566.html)

Yes -- good point! Once we know the major mode, we can use its syntax table to ignore { and } chars within both comments and strings. Good.
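A minimal sketch of that idea, assuming the sub-mode is already known to be C. It uses scan-lists rather than parse-partial-sexp, and the function name my/at-exp-close-brace is hypothetical. It also assumes that loading cc-mode is enough to define c-mode-syntax-table.

```elisp
(require 'cc-mode) ; assumption: this defines `c-mode-syntax-table'

;; Hypothetical helper, not part of racket-mode or mmm-mode.
(defun my/at-exp-close-brace (open-pos)
  "Return the position of the } matching the { at OPEN-POS, or nil.
Scan with C syntax so braces inside C comments and strings are skipped."
  (with-syntax-table c-mode-syntax-table
    (let ((parse-sexp-ignore-comments t))
      (condition-case nil
          ;; `scan-lists' returns the position just after the matching },
          ;; so step back one character to land on the brace itself.
          (1- (scan-lists open-pos 1 0))
        ;; Unbalanced text signals `scan-error'; report no match.
        (scan-error nil)))))
```

Called with the position of the { that opens the at-expression body, this should skip a } inside /* ... */ or inside a string literal and stop at the brace that really closes the at-expression.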

Of course, that requires knowing which major mode...

> At-exp could possibly be extended to include optional meta info describing the block contents. @C:foo{ ... }

Yes, something like that is the only idea I have right now.

It would probably be more helpful for the "language tag" to be the name of the Emacs mode, e.g. c-mode instead of C. That might make it simpler for us to make one mmm "class" extension that can handle all modes, since the mode name is right there.
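A rough sketch of how such a tag could be read, assuming the proposed (not existing) syntax @MODE:name{ ... } where MODE is the name of an Emacs major mode function. The function name my/at-exp-tagged-mode is hypothetical.

```elisp
;; Hypothetical helper for the proposed "@MODE:name{" tag syntax.
(defun my/at-exp-tagged-mode ()
  "Return the mode symbol named by a tagged at-expression at point, or nil.
For example, with point before \"@c-mode:foo{\" return `c-mode',
provided a function of that name is defined."
  (when (looking-at "@\\([^ :{]+\\):[^ {]+{")
    ;; `intern-soft' avoids creating symbols for unknown names.
    (let ((mode (intern-soft (match-string 1))))
      (and mode (fboundp mode) mode))))
```

With the mode name in the tag, one class definition could look up the sub-mode per region instead of hard-coding c-mode. An untagged @foo{ or a tag naming an undefined mode yields nil.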

NHALX commented 9 years ago

> I noticed that, I just doubt an independent back matcher function could work reliably. I think it would need to be the "handler" option they mention, that parses the whole thing, because of languages that use }.

Hmm, it seems to work OK. It handles the nested if/else block at marker 1 and the commented groups of } } } at marker 2, but something odd is happening at the end of the "set-wrap" block near the top of the image.

http://i.imgur.com/0lAQY2L.png

Keep in mind this is tested on the overloaded racket reader that converts top-level { } blocks to @begin/text{... \n}

;; .emacs: mmm-mode
(require 'mmm-mode)
(setq mmm-global-mode 'maybe)

;; uninteresting bit
(defun with-mode (new-mode f)
  (let ((reset
         (buffer-local-value 'major-mode (current-buffer))))
    (funcall new-mode)
    (funcall f)
    (funcall reset)))

;; hack to import c-mode-syntax-table - what is the right way?
(with-mode 'c-mode (lambda ())) 

;; relevant part 
(defun jmp:inner->out (stx-table limit)
  (setq parse-sexp-ignore-comments nil) ; needed?
  (with-syntax-table stx-table
    (parse-partial-sexp (point) limit -1
                        nil nil nil)))

(mmm-add-classes
 '((cspl15
    :submode c-mode
    :face mmm-declaration-submode-face
    :front "{"
    :back (lambda (limit)
            (jmp:inner->out c-mode-syntax-table limit)
            ;; set match data
            (looking-at "")))))

(mmm-add-mode-ext-class 'racket-mode ".\\.cc\\.rkt" 'cspl15)

greghendershott commented 11 months ago

Now that racket-hash-lang-mode is merged: in that mode we defer to the language for indentation.

Now another approach is possible. In something like:

#lang scribble/manual
@codeblock{
#lang rhombus
fun fib (n):
  cond
  | n == 0: 1
  | n == 1: 0
  | ~else: fib(n-1) + fib(n-2)
}

You could imagine that the drracket:indentation for scribble/manual could defer to that for rhombus within the codeblock.

Currently that doesn't happen. This is just a hand-wavy idea for a possible direction.

(And if that could/did work, you could also imagine the at-exp meta language could do similar. Maybe that's more complicated because it is a meta language.)