racket / drracket

DrRacket, IDE for Racket
http://www.racket-lang.org/
Other
444 stars 93 forks source link

Proposal: New info key `drracket:comments` #634

Open greghendershott opened 10 months ago

greghendershott commented 10 months ago

Things like DrRacket and Emacs offer commands to comment and un-comment a region.

Non-s-expression langs should be able to supply information for these commands, via a new info key named something like "drracket:comments".

Open to improvements, but initially I'll suggest the value is a list:

(list comment-start comment-end comment-padding)

where those three values as identical to three configuration variables in Emacs:

When a comment command makes a new comment, it inserts the value of ‘comment-start’ as an opening comment delimiter. It also inserts the value of ‘comment-end’ after point, as a closing comment delimiter. For example, in Lisp mode, ‘comment-start’ is ‘";"’ and ‘comment-end’ is ‘""’ (the empty string). In C mode, ‘comment-start’ is ‘"/ "’ and ‘comment-end’ is ‘" /"’.

The variable ‘comment-padding’ specifies a string that the commenting commands should insert between the comment delimiter(s) and the comment text. The default, ‘" "’, specifies a single space. Alternatively, the value can be a number, which specifies that number of spaces, or ‘nil’, which means no spaces at all.

Examples:

I'd suggest that when the key isn't supplied, it's OK to default to s-expression as the default. (Justification: There exist many more s-expression langs.)


Whatever the final form, I think the to-do list would be:

If agreed/confirmed, I would probably be able to make PRs easily for all but the last. (I'm not familiar with that code. I'm not sure about details like whether a "Racket" menu is where these should continue to live.)


Background: https://github.com/greghendershott/racket-mode/pull/661#issuecomment-1693769408

sorawee commented 10 months ago

In C, I expect the configuration to be ("/*" "*/" ""). Now, If I make a selection that spans multiple lines and hit comment, it probably should wrap /*and */ around the selection.

But in Python, if I make a selection that spans multiple lines and hit comment, it probably should add # on every line.

How do we distinguish these two scenarios?

rfindler commented 10 months ago

I like where this is heading and am happy to help out with DrRacket's comment/uncomment functionality (or whatever else seems useful!).

Some thoughts:

greghendershott commented 10 months ago

@sorawee

But in Rhombus, if I make a selection that spans multiple lines and hit comment, it probably should add # on every line.

I don't understand what you mean here? This claims that Rhombus has C-style comments?

@rfindler I think the Emacs attitude is that multi-line comments are handled as just a series of lines with single-line comment delimiters... but I'm not entirely sure.

However wrt the tool-agnostic proposal: I think it's fine to have a drracket:comments value include both "line-oriented" and (if different) "region-oriented" comment delimiters. And then it's up to the tool, plus whatever user choice config that tool enables. So e.g. some tools might do what (I imagine) Emacs does, and treat a region as just as series of individually commented lines -- whereas other tools might try to do something fancier for region comments... and this might be user configurable to support any personal/project/company coding preference/standard.

TL;DR maybe the value should be ~=

(list comment-start comment-end comment-spacing)
;; or
(list (list line-comment-start   line-comment-end   line-comment-spacing)
      (list region-comment-start region-comment-end region-comment-spacing))

Does that sound better?

sorawee commented 10 months ago

Sorry, I edited to Python precisely because I thought you might have this response :)

You can substitute it with any language that doesn't have block comment.

greghendershott commented 10 months ago

Oh no worries, I just genuinely didn't understand your point. :)

rfindler commented 10 months ago

@greghendershott What is the UI for Emacs when entering a comment? Does it get a region as argument and then break it up into lines and put these delimiters around each line?

greghendershott commented 10 months ago

@rfindler The Emacs UX varies depending on customization variables like comment-multi-line as well as whether you manually vs. auto-fill. (Mostly, I just use a comment-dwim command on the selected text to add/remove comments. Sometimes refill manually. But there are more options I don't use.)

I think the proposal's focus is: What info should a lang supply about comments?

I'm not a lang designer or historian, but, it seems like there are three possible kinds of comments:

  1. Line comments. End delimiter is newline.
  2. Region comments. Different start/end delimiters. Can span newlines.
  3. Expression comments (like Racket's #;).

Does that seem about right? Is that the minimal info a lang would need to convey?


p.s. When a lang supports both line and region comments, how to handle multi-line comments is user preference/style. (IIUC, orthogonal to this proposal.)

C++ examples I've seen all of... ``` // Line 1 // Line 2 ``` ``` /* Line 1 */ /* Line 2 */ ``` ``` /* Line 1 Line 2 */ ``` ``` /* Line 1 * Line 2 */ ```
samth commented 10 months ago

The last of those C++ examples could not be automatically generated from a preference and the spec you've described, and thus would have to be custom for a particular language, but that's probably ok.

rfindler commented 10 months ago

@greghendershott I agree generally with what you write but with one concern, namely that we should have some idea of what we're going to support so we can make sure that we can actually implement the UI that we want on the DrRacket side. It sounds like the Emacs side already has some support for taking some specific pieces of information and turning them into various UIs and I'm more than happy to go along with that (i.e., to make DrRacket follow Emacs's lead) but it seems important to make sure that we actually know what we're going to be getting so we don't wish we'd done something slightly differently somewhere along the way. So (just to make sure we're on the same page), I think I'm more in the "make sure I know what we're getting into" mode than "I want something different" mode, if that makes sense.


Right now, DrRacket has three commenty operations in the "Racket" menu:

Is that the plan? I guess the big question with that plan is whether or not we want to support comment/uncommenting via region comments where we don't do things line by line but instead put an open comment in one place and close comment in another place, possibly far away from each other.

I'm not sure we need special IDE support for #;-style comments but maybe I'm missing something there? Opinions welcome. :)

greghendershott commented 10 months ago

On the one hand: I'm inclined to say it's sufficient for the info key to supply all information only knowable by the lang. The end.

On the other hand: I agree it's worth thinking through some examples of how tools would actually use the information. If I sounded dismissive, sorry, it's only because...

On the third hand: A lot of the Emacs commenting code seems to be accreted over decades; I would love to avoid learning too much about it, if setting a few comment-xxx vars seems to suffice. :smile:


Setting aside the DrRacket comment box:

And... I think that's it?

I mean, if a lang has some notion of region comments, for the sake of completeness I think the info key should allow communicating that. But that doesn't mean all tools must use all that information to do all possible things.

So I don't think the info key per se would obligate DrRacket to add some new region commenting features. It would just make it possible to do so, someday/maybe... or never. If users haven't been clamoring for this, then it's probably not worth doing.

(Subjective anecdata opinion warning: Over the years, the line-comment style has grown more popular than region-comments. The latter seem used just for a piece of a line, in langs that lack expression comments?)


Having said all that, I wish we had a representative of e.g. vscode to speak up here, too.

sorawee commented 10 months ago

I think most (modern) IDEs/editors do not have comment and uncomment operations separately. Instead, there's only one operation, and the IDEs/editors automatically decide which mode it should be, based on whether the region is already commented.

rfindler commented 10 months ago

I see that Sublime Text does something like that, always looking at entire lines. So if you have a line that's partially commented out and hit the button, it becomes fully commented out. And if you select some region that includes multiple lines but only parts of the first and last line, then it comments/uncomments the entire lines. (Hope that this make sense -- it seems like the most sensible operations if you start from the premise that you're working with entire lines.) This was with the "toggle comment" menu item when in a python file. I see there is also a toggle-block-comment menu item, however.

greghendershott commented 10 months ago

The last of those C++ examples could not be automatically generated from a preference and the spec you've described, and thus would have to be custom for a particular language, but that's probably ok.

It looks like Emacs defines a fourth value, "comment-continue", for this.

By the way, another example I forgot:

/* Line 1 */
/* Line 2 */

Indeed this seems to be the default for Emacs c-mode (not c++-mode, which uses //).

greghendershott commented 10 months ago

This morning I started digging into Emacs' newcomment.el, about 1500 lines of Emacs Lisp. Not as crufty as I imagined. But non-trivial; reflecting the union of many langs and user preferences.

I'm still digesting. Among other things, trying to sort through which aspects are clearly lang-specified vs. clearly per-user preference/customization. And maybe some aspects that straddle both?


One quick observation: There are a few comment-xxx-function variables, too -- e.g. the comment-region command will call comment-region-function if supplied, else do its "normal" thing using comment-start and other string vars.

As an ultimate "escape" for scenarios and langs where a simple data-driven specification is insufficient, you can call a function to do the thing.

This suggests we might want a lang info key to be able to supply callable functions, as well.

rfindler commented 10 months ago

I feel pretty comfortable with the line-based approach and am happy to go ahead if everyone else is.

I'm a bit nervous about adding more callable functions into the lang-based configuration. From the Emacs perspective, one has to keep a connection alive so the state of the function is always there which may be a pain for GC reasons (or maybe it's fine?); from the DrRacket perspective, it is one more place where malicious code can cause trouble for the people (although there are a lot of these already, alas). Maybe I shouldn't worry about this that much....

greghendershott commented 10 months ago
  1. I'd be fine with naming the info key something like drracket:comment-delimiters, and keeping the value data-driven.

    After all, if someday it turns out some lang actually needs a function-driven approach:

    • We could add a new info key for that, later.

    • There's always a fallback via tool configuration. When a lang supports the module-language info key, and a tool reflects that up to users for use in configuration (as I'm doing): A user can config their tool so that when the mod lang is foo-lang, they set special Emacs Lisp comment-xxx-function values... or use some DrRacket script config... or whatever extension mechanism the tool provides.

    So I'm not too worried about the data-driven approach being simplistic. Even if that turns out to be the case (which I'm not even sure is likely), users can address it until/unless a new info key addresses it.

  2. As for the data-driven values for the drracket:comment-delimiters info key, I'd be fine with:

    • A list of three values, (list comment-start-string comment-end-string comment-padding-string).

    • If a tool offers fancier choices about comments (like various comment "styles", or choices about line- vs. region-oriented comments, or whatever) the user will need to configure that for their tool if they don't like the defaults for a lang. Again, the module-language info key's symbol value can be the conditional for configuration.

?

greghendershott commented 10 months ago

As an initial experiment I added this for scribble langs (IIUC they use at-exp-lib):

modified   pkgs/at-exp-lib/at-exp/lang/reader.rkt
@@ -13,30 +13,33 @@
 ;; Settings that apply just to the surface syntax:
 (define (scribble-base-reader-info)
   (lambda (key defval default)
     (define (try-dynamic-require lib export)
       (with-handlers ([exn:missing-module?
                        (λ (x) (default key defval))])
         (dynamic-require lib export)))
     (case key
       [(color-lexer)
        (try-dynamic-require 'syntax-color/scribble-lexer 'scribble-inside-lexer)]
       [(drracket:indentation)
        (try-dynamic-require 'scribble/private/indentation 'determine-spaces)]
       [(drracket:keystrokes)
        (try-dynamic-require 'scribble/private/indentation 'keystrokes)]
       [(drracket:default-extension) "scrbl"]
+      [(drracket:comment-delimiters)
+       '("@;" "" " ")]
       [else (default key defval)])))

"It works".

But note this:

modified   pkgs/at-exp-lib/scribble/base/reader.rkt
@@ -9,41 +9,43 @@
   (define-values (at-read at-read-syntax at-get-info)
     (make-meta-reader
      'at-exp
      "language path"
      lang-reader-module-paths
      wrap-reader
      (lambda (orig-read-syntax)
        (define read-syntax (wrap-reader orig-read-syntax))
        (lambda args
          (define stx (apply read-syntax args))
          (define old-prop (syntax-property stx 'module-language))
          (define new-prop `#(at-exp/lang/language-info get-language-info ,old-prop))
          (syntax-property stx 'module-language new-prop)))
      (lambda (proc)
        (lambda (key defval)
          (define (fallback) (if proc (proc key defval) defval))
          (define (try-dynamic-require lib export)
            (with-handlers ([exn:missing-module?
                             (λ (x) (fallback))])
              (dynamic-require lib export)))
          (case key
            [(color-lexer)
             (try-dynamic-require 'syntax-color/scribble-lexer 'scribble-lexer)]
            [(drracket:indentation)
             (try-dynamic-require 'scribble/private/indentation 'determine-spaces)]
            [(drracket:keystrokes)
             (try-dynamic-require 'scribble/private/indentation 'keystrokes)]
+           ;; Note: Do /not/ supply drracket:comment-delimiters here;
+           ;; that would cause the at-exp meta-lang to overrule the
+           ;; main lang.
            [else (fallback)])))))

In other words, we want something like #lang at-exp racket/base not to cause @; to replace ;;. I can add a note about this when I write the docs (~= "Don't supply this info key for meta languages").

(I suppose there exists some fancy spec to allow mixed comment styles for mixed lang sources, but it's kind of giving me a headache and I'm not sure how/if all tools could suport this.)

rfindler commented 10 months ago

I agree that a fancy spec for mixed comment styles seems like too much for now.

What do you think about allowing a list of lists of length three (meaning that there are multiple comment style options that a language can supply)?

Also, I'm happy with something more generically named, eg ide-comment-delimiters? Or maybe another day we could do a renaming pass on all these things which would be fine with me.

greghendershott commented 10 months ago

I agree that a fancy spec for mixed comment styles seems like too much for now.

Yeah I think comments would be just one part of an entire conversation/spec about handling mixed/meta langs. It's almost as if the lang "get-info" function would need a position argument; many key values could vary by position. Definitely another, distinct, headache-y issue. :)

What do you think about allowing a list of lists of length three (meaning that there are multiple comment style options that a language can supply)?

So you could present a GUI selection dialog, and, store the comment preference keyed by module-language (or something like that)?

I think that's good.

Also, I'm happy with something more generically named, eg ide-comment-delimiters? Or maybe another day we could do a renaming pass on all these things which would be fine with me.

I'm OK to keep using drracket:xxx for now as an alias for ide (it doesn't bother me). At most we could just add a margin-note about the intended general use. (Defer the renaming until also renaming Dr Racket to Dr Rhackbus or Dr Lang or whatever it will become. :))


We seem to have converged, so I think I'll do a PR for at least the docs, in case reading that format suggests other details to tweak.

rfindler commented 9 months ago

I have a question: in the line comment mode, what role does the padding play? Specifically, if I have a line comment spec that is (list 'line s1 s2), how is that different than the line comment spec (list 'line (string-append s1 s2) "")?

My thinking is that, when uncommenting, we'd need a predicate saying "is this line commented out?" and that predicate would depend only on the first string (but that it would remove the second string if that string is present too). Is that right?

If that's right, is this worth documenting? Is there any other interpretation of these strings that would make sense that a tool could use? (If not, then I'd argue we should document it so that language authors will have a better sense of which strings to list for their languages.)


Also, I'm noticing that the default of (list 'line ";;" " ") will give different behavior than the current DrRacket, which effectively has the default of (list 'line ";" ""), IIUC. I'm comfortable with this change in DrRacket's behavior on the assumption that it makes it match Emacs's behavior but if it doesn't or if others are concerned, we might want to do something differently somewhere, maybe?

AlexKnauth commented 9 months ago

For a predicate saying "is this line commented out?", should it be (list 'line ";" "; ") to produce ;; line comments?

rfindler commented 9 months ago

Should the contracts disallow newlines (and returns?) in the strings?

greghendershott commented 9 months ago

I have a question: in the line comment mode, what role does the padding play? Specifically, if I have a line comment spec that is (list 'line s1 s2), how is that different than the line comment spec (list 'line (string-append s1 s2) "")?

My thinking is that, when uncommenting, we'd need a predicate saying "is this line commented out?" and that predicate would depend only on the first string (but that it would remove the second string if that string is present too). Is that right?

If that's right, is this worth documenting? Is there any other interpretation of these strings that would make sense that a tool could use? (If not, then I'd argue we should document it so that language authors will have a better sense of which strings to list for their languages.)

I think that's right.

But, frankly, I'm following the lead of Emacs here, where comment-padding is a distinct thing. Although I can't think of a real-world language example where this would be anything other than one space " ", I'm assuming comment-padding was added because it turned out to be useful for one or more languages, over the decades.

So I'm not sure if it's standing on shoulders or cargo culting. But it seemed harmless (?) to present it distinctly rather than lossy-glomming it into the same values as comment-start and comment-end, and having tools try to deduce it.

Also one use case maybe N/A for DrRacket is Emacs commands that re-fill comments (re-flow for a width) after or during editing.

Also, I'm noticing that the default of (list 'line ";;" " ") will give different behavior than the current DrRacket, which effectively has the default of (list 'line ";" ""), IIUC. I'm comfortable with this change in DrRacket's behavior on the assumption that it makes it match Emacs's behavior but if it doesn't or if others are concerned, we might want to do something differently somewhere, maybe?

Oh. Sorry! I hadn't realized that.

Yeah, in many lisps (including Scheme, from what I can tell from scheme-mode in Emacs), there are conventions that seem to be:

Which I think shows that comments can get quickly into style/preference territory.

I guess my assumption is that tools can offer ways to configure this kind of stuff, maybe, for users who really care. Trying to capture all that variety was a non-goal for this lang info proposal. Instead it's a way for lang to tell the tool (so the user doesn't have to) about basic commenting behavior that will work and be correct.


But on both points maybe I'm being too lazy or "MVP", and it should be more complicated?

greghendershott commented 9 months ago

Should the contracts disallow newlines (and returns?) in the strings?

Yes, that would be most correct. What would be the clearest way to write such a contract, do you think?

rfindler commented 9 months ago

Should the contracts disallow newlines (and returns?) in the strings?

Yes, that would be most correct. What would be the clearest way to write such a contract, do you think?

(and/c string? (not/c #rx"[\r\n]"))

for no newlines and no return characters, which would be my preference, given the way that text% works.

rfindler commented 9 months ago

I have a question: in the line comment mode, what role does the padding play? Specifically, if I have a line comment spec that is (list 'line s1 s2), how is that different than the line comment spec (list 'line (string-append s1 s2) "")? My thinking is that, when uncommenting, we'd need a predicate saying "is this line commented out?" and that predicate would depend only on the first string (but that it would remove the second string if that string is present too). Is that right? If that's right, is this worth documenting? Is there any other interpretation of these strings that would make sense that a tool could use? (If not, then I'd argue we should document it so that language authors will have a better sense of which strings to list for their languages.)

I think that's right.

But, frankly, I'm following the lead of Emacs here, where comment-padding is a distinct thing. Although I can't think of a real-world language example where this would be anything other than one space " ", I'm assuming comment-padding was added because it turned out to be useful for one or more languages, over the decades.

So I'm not sure if it's standing on shoulders or cargo culting. But it seemed harmless (?) to present it distinctly rather than lossy-glomming it into the same values as comment-start and comment-end, and having tools try to deduce it.

I think what we settled on makes sense and including padding is a good idea. My only thought is that we could clarify the documentation to explain what the padding is, so that people who have the confusion that I had earlier today would find clarification in the docs. Does that seem okay to you?

Also one use case maybe N/A for DrRacket is Emacs commands that re-fill comments (re-flow for a width) after or during editing.

This would be nice to have eventually so I'm glad that door isn't prematurely closed.

Also, I'm noticing that the default of (list 'line ";;" " ") will give different behavior than the current DrRacket, which effectively has the default of (list 'line ";" ""), IIUC. I'm comfortable with this change in DrRacket's behavior on the assumption that it makes it match Emacs's behavior but if it doesn't or if others are concerned, we might want to do something differently somewhere, maybe?

Oh. Sorry! I hadn't realized that.

Yeah, in many lisps (including Scheme, from what I can tell from scheme-mode in Emacs), there are conventions that seem to be:

* `;` for comments at the end of a line, following code

* `;;` for "block" comments, nothing but the comment one the one or more lines

* `;;;` for "section" comments

* I've even seen `;;;;` for comments pertaining to the entire file

Which I think shows that comments can get quickly into style/preference territory.

I guess my assumption is that tools can offer ways to configure this kind of stuff, maybe, for users who really care. Trying to capture all that variety was a non-goal for this lang info proposal. Instead it's a way for lang to tell the tool (so the user doesn't have to) about basic commenting behavior that will work and be correct.

But on both points maybe I'm being too lazy or "MVP", and it should be more complicated?

I'm happy to change DrRacket's behavior. I just wanted to make sure we all are and maybe discuss alternatives if someone is worried. I like the conventions you list above and am happy that DrRacket will be moving towards them.

rfindler commented 9 months ago

I've noticed an inconsistency in the region comment docs. At this spot and this spot, the order is start, continue, end, but in this spot and this spot the order is start, end, continue.


Also, I am wondering if the padding is really meant to be added to the first line. Judging from the examples, I think this line should just say @item{@racket[_start] opens a comment}.

rfindler commented 9 months ago

I've made some progress on the Revise DrRacket's (un)comment commands task. Forgot to mention this issue in the commits, tho (oops!). They are here, here.

greghendershott commented 9 months ago

I've noticed an inconsistency in the region comment docs. At this spot and this spot, the order is start, continue, end, but in this spot and this spot the order is start, end, continue.

Ouch. The prose is correct. I'll fix those examples that don't match.

greghendershott commented 9 months ago

Also, I am wondering if the padding is really meant to be added to the first line. Judging from the examples, I think this line should just say @item{@racket[_start] opens a comment}.

Much like I forgot that DrRacket does ; not ;;, I forgot that it doesn't include the padding.

Original:

lorem
ipsum

DrRacket:

;lorem
;ipsum

Emacs lisp modes (lisp, elisp, scheme, racket):

;; lorem
;; ipsum

I've lived with these so long editing {elisp racket scheme} in Emacs I took it for granted. But it's taste not law. Anyway that assumption went into the docs.

I don't know if it's something where, like ; vs. ;;, you'd like to adopt that convention, or not?

rfindler commented 9 months ago

Also, I am wondering if the padding is really meant to be added to the first line. Judging from the examples, I think this line should just say @item{@racket[_start] opens a comment}.

Much like I forgot that DrRacket does ; not ;;, I forgot that it doesn't include the padding.

Original:

lorem
ipsum

DrRacket:

;lorem
;ipsum

Emacs lisp modes (lisp, elisp, scheme, racket):

;; lorem
;; ipsum

I've lived with these so long editing {elisp racket scheme} in Emacs I took it for granted. But it's taste not law. Anyway that assumption went into the docs.

I don't know if it's something where, like ; vs. ;;, you'd like to adopt that convention, or not?

I'm sorry-- this is a comment about the block comment specifications, not the line comment specifications. If we're going to insert the padding right at the same place as the opener is inserted (following the C++ example in the region section of the docs), we'll get things like this:

/* * lorem
 * ipsem
 */

which doesn't look right.

Or am I misunderstanding something else?

greghendershott commented 9 months ago

I'm sorry-- this is a comment about the block comment specifications, not the line comment specifications. If we're going to insert the padding right at the same place as the opener is inserted (following the C++ example in the region section of the docs), we'll get things like this:

/* * lorem
 * ipsem
 */

I think you might be confusing continue and padding? The doc says:

 @item{@racket[(list 'region start continue end padding)], where:

   @itemlist[

    @item{@racket[_start] then @racket[_padding] opens a comment}

    @item{@racket[_continue] then @racket[padding] is added to the
beginning of each line except the first one when a comment spans
multiple lines}

    @item{@racket[_padding] then @racket[_end] closes a comment}]

   Racket example: @racket['(region "#|" "  " "|#" " ")].

   C++ example: @racket['(region "/*" " *" "*/" " ")].}

So using the C++ example:

The first line uses start + padding = "/*" + " " = "/* ". The last line uses padding + end = " " + "*/" = " */". Any other lines in between use continue + padding = " *" + " " = " * ".

greghendershott commented 9 months ago

If you're reading only via email and GitHub doesn't email updates: I edited the end of my previous post.

rfindler commented 9 months ago

I think you might be confusing continue and padding?

Oh yes! Thank you, that was exactly my problem.