Open greghendershott opened 1 year ago
In C, I expect the configuration to be ("/*" "*/" "")
. Now, If I make a selection that spans multiple lines and hit comment, it probably should wrap /*
and */
around the selection.
But in Python, if I make a selection that spans multiple lines and hit comment, it probably should add #
on every line.
How do we distinguish these two scenarios?
I like where this is heading and am happy to help out with DrRacket's comment/uncomment functionality (or whatever else seems useful!).
Some thoughts:
@sorawee
But in Rhombus, if I make a selection that spans multiple lines and hit comment, it probably should add
#
on every line.
I don't understand what you mean here? This claims that Rhombus has C-style comments?
@rfindler I think the Emacs attitude is that multi-line comments are handled as just a series of lines with single-line comment delimiters... but I'm not entirely sure.
However wrt the tool-agnostic proposal: I think it's fine to have a drracket:comments
value include both "line-oriented" and (if different) "region-oriented" comment delimiters. And then it's up to the tool, plus whatever user choice config that tool enables. So e.g. some tools might do what (I imagine) Emacs does, and treat a region as just as series of individually commented lines -- whereas other tools might try to do something fancier for region comments... and this might be user configurable to support any personal/project/company coding preference/standard.
TL;DR maybe the value should be ~=
(list comment-start comment-end comment-spacing)
;; or
(list (list line-comment-start line-comment-end line-comment-spacing)
(list region-comment-start region-comment-end region-comment-spacing))
Does that sound better?
Sorry, I edited to Python precisely because I thought you might have this response :)
You can substitute it with any language that doesn't have block comment.
Oh no worries, I just genuinely didn't understand your point. :)
@greghendershott What is the UI for Emacs when entering a comment? Does it get a region as argument and then break it up into lines and put these delimiters around each line?
@rfindler The Emacs UX varies depending on customization variables like comment-multi-line
as well as whether you manually vs. auto-fill. (Mostly, I just use a comment-dwim
command on the selected text to add/remove comments. Sometimes refill manually. But there are more options I don't use.)
I think the proposal's focus is: What info should a lang supply about comments?
I'm not a lang designer or historian, but, it seems like there are three possible kinds of comments:
#;
).Does that seem about right? Is that the minimal info a lang would need to convey?
p.s. When a lang supports both line and region comments, how to handle multi-line comments is user preference/style. (IIUC, orthogonal to this proposal.)
The last of those C++ examples could not be automatically generated from a preference and the spec you've described, and thus would have to be custom for a particular language, but that's probably ok.
@greghendershott I agree generally with what you write but with one concern, namely that we should have some idea of what we're going to support so we can make sure that we can actually implement the UI that we want on the DrRacket side. It sounds like the Emacs side already has some support for taking some specific pieces of information and turning them into various UIs and I'm more than happy to go along with that (i.e., to make DrRacket follow Emacs's lead) but it seems important to make sure that we actually know what we're going to be getting so we don't wish we'd done something slightly differently somewhere along the way. So (just to make sure we're on the same page), I think I'm more in the "make sure I know what we're getting into" mode than "I want something different" mode, if that makes sense.
Right now, DrRacket has three commenty operations in the "Racket" menu:
;
at the start of each line of the selection. I can see how the (list comment-start comment-end comment-padding)
at the start of the issue is a generalization of that.Is that the plan? I guess the big question with that plan is whether or not we want to support comment/uncommenting via region comments where we don't do things line by line but instead put an open comment in one place and close comment in another place, possibly far away from each other.
I'm not sure we need special IDE support for #;
-style comments but maybe I'm missing something there? Opinions welcome. :)
On the one hand: I'm inclined to say it's sufficient for the info key to supply all information only knowable by the lang. The end.
On the other hand: I agree it's worth thinking through some examples of how tools would actually use the information. If I sounded dismissive, sorry, it's only because...
On the third hand: A lot of the Emacs commenting code seems to be accreted over decades; I would love to avoid learning too much about it, if setting a few comment-xxx
vars seems to suffice. :smile:
Setting aside the DrRacket comment box:
It sounds to me like DrRacket's "Comment with semicolons" could become "Add line comments", generalized to use the line-comment-start string supplied by the lang. AFAICT nothing deep would need to change?
Uncomment seems similar to me.
[I suppose removing comments could operate using color-lexer tokens, not textually? i.e. It's only for the sake of adding comments that we even need to get into the business of langs telling us about comment delimiter strings. (Having said that, for racket-hash-lang-mode
my initial plan is to let the Emacs remove-comment command work status quo, textual; not broken, don't fix.)]
And... I think that's it?
I mean, if a lang has some notion of region comments, for the sake of completeness I think the info key should allow communicating that. But that doesn't mean all tools must use all that information to do all possible things.
So I don't think the info key per se would obligate DrRacket to add some new region commenting features. It would just make it possible to do so, someday/maybe... or never. If users haven't been clamoring for this, then it's probably not worth doing.
(Subjective anecdata opinion warning: Over the years, the line-comment style has grown more popular than region-comments. The latter seem used just for a piece of a line, in langs that lack expression comments?)
Having said all that, I wish we had a representative of e.g. vscode to speak up here, too.
I think most (modern) IDEs/editors do not have comment and uncomment operations separately. Instead, there's only one operation, and the IDEs/editors automatically decide which mode it should be, based on whether the region is already commented.
I see that Sublime Text does something like that, always looking at entire lines. So if you have a line that's partially commented out and hit the button, it becomes fully commented out. And if you select some region that includes multiple lines but only parts of the first and last line, then it comments/uncomments the entire lines. (Hope that this make sense -- it seems like the most sensible operations if you start from the premise that you're working with entire lines.) This was with the "toggle comment" menu item when in a python file. I see there is also a toggle-block-comment menu item, however.
The last of those C++ examples could not be automatically generated from a preference and the spec you've described, and thus would have to be custom for a particular language, but that's probably ok.
It looks like Emacs defines a fourth value, "comment-continue
", for this.
By the way, another example I forgot:
/* Line 1 */
/* Line 2 */
Indeed this seems to be the default for Emacs c-mode
(not c++-mode
, which uses //
).
This morning I started digging into Emacs' newcomment.el
, about 1500 lines of Emacs Lisp. Not as crufty as I imagined. But non-trivial; reflecting the union of many langs and user preferences.
I'm still digesting. Among other things, trying to sort through which aspects are clearly lang-specified vs. clearly per-user preference/customization. And maybe some aspects that straddle both?
One quick observation: There are a few comment-xxx-function
variables, too -- e.g. the comment-region
command will call comment-region-function
if supplied, else do its "normal" thing using comment-start
and other string vars.
As an ultimate "escape" for scenarios and langs where a simple data-driven specification is insufficient, you can call a function to do the thing.
This suggests we might want a lang info key to be able to supply callable functions, as well.
I feel pretty comfortable with the line-based approach and am happy to go ahead if everyone else is.
I'm a bit nervous about adding more callable functions into the lang-based configuration. From the Emacs perspective, one has to keep a connection alive so the state of the function is always there which may be a pain for GC reasons (or maybe it's fine?); from the DrRacket perspective, it is one more place where malicious code can cause trouble for the people (although there are a lot of these already, alas). Maybe I shouldn't worry about this that much....
I'd be fine with naming the info key something like drracket:comment-delimiters
, and keeping the value data-driven.
After all, if someday it turns out some lang actually needs a function-driven approach:
We could add a new info key for that, later.
There's always a fallback via tool configuration. When a lang supports the module-language
info key, and a tool reflects that up to users for use in configuration (as I'm doing): A user can config their tool so that when the mod lang is foo-lang
, they set special Emacs Lisp comment-xxx-function
values... or use some DrRacket script config... or whatever extension mechanism the tool provides.
So I'm not too worried about the data-driven approach being simplistic. Even if that turns out to be the case (which I'm not even sure is likely), users can address it until/unless a new info key addresses it.
As for the data-driven values for the drracket:comment-delimiters
info key, I'd be fine with:
A list of three values, (list comment-start-string comment-end-string comment-padding-string)
.
If a tool offers fancier choices about comments (like various comment "styles", or choices about line- vs. region-oriented comments, or whatever) the user will need to configure that for their tool if they don't like the defaults for a lang. Again, the module-language
info key's symbol value can be the conditional for configuration.
?
As an initial experiment I added this for scribble
langs (IIUC they use at-exp-lib
):
modified pkgs/at-exp-lib/at-exp/lang/reader.rkt
@@ -13,30 +13,33 @@
;; Settings that apply just to the surface syntax:
(define (scribble-base-reader-info)
(lambda (key defval default)
(define (try-dynamic-require lib export)
(with-handlers ([exn:missing-module?
(λ (x) (default key defval))])
(dynamic-require lib export)))
(case key
[(color-lexer)
(try-dynamic-require 'syntax-color/scribble-lexer 'scribble-inside-lexer)]
[(drracket:indentation)
(try-dynamic-require 'scribble/private/indentation 'determine-spaces)]
[(drracket:keystrokes)
(try-dynamic-require 'scribble/private/indentation 'keystrokes)]
[(drracket:default-extension) "scrbl"]
+ [(drracket:comment-delimiters)
+ '("@;" "" " ")]
[else (default key defval)])))
"It works".
But note this:
modified pkgs/at-exp-lib/scribble/base/reader.rkt
@@ -9,41 +9,43 @@
(define-values (at-read at-read-syntax at-get-info)
(make-meta-reader
'at-exp
"language path"
lang-reader-module-paths
wrap-reader
(lambda (orig-read-syntax)
(define read-syntax (wrap-reader orig-read-syntax))
(lambda args
(define stx (apply read-syntax args))
(define old-prop (syntax-property stx 'module-language))
(define new-prop `#(at-exp/lang/language-info get-language-info ,old-prop))
(syntax-property stx 'module-language new-prop)))
(lambda (proc)
(lambda (key defval)
(define (fallback) (if proc (proc key defval) defval))
(define (try-dynamic-require lib export)
(with-handlers ([exn:missing-module?
(λ (x) (fallback))])
(dynamic-require lib export)))
(case key
[(color-lexer)
(try-dynamic-require 'syntax-color/scribble-lexer 'scribble-lexer)]
[(drracket:indentation)
(try-dynamic-require 'scribble/private/indentation 'determine-spaces)]
[(drracket:keystrokes)
(try-dynamic-require 'scribble/private/indentation 'keystrokes)]
+ ;; Note: Do /not/ supply drracket:comment-delimiters here;
+ ;; that would cause the at-exp meta-lang to overrule the
+ ;; main lang.
[else (fallback)])))))
In other words, we want something like #lang at-exp racket/base
not to cause @;
to replace ;;
. I can add a note about this when I write the docs (~= "Don't supply this info key for meta languages").
(I suppose there exists some fancy spec to allow mixed comment styles for mixed lang sources, but it's kind of giving me a headache and I'm not sure how/if all tools could suport this.)
I agree that a fancy spec for mixed comment styles seems like too much for now.
What do you think about allowing a list of lists of length three (meaning that there are multiple comment style options that a language can supply)?
Also, I'm happy with something more generically named, eg ide-comment-delimiters
? Or maybe another day we could do a renaming pass on all these things which would be fine with me.
I agree that a fancy spec for mixed comment styles seems like too much for now.
Yeah I think comments would be just one part of an entire conversation/spec about handling mixed/meta langs. It's almost as if the lang "get-info" function would need a position argument; many key values could vary by position. Definitely another, distinct, headache-y issue. :)
What do you think about allowing a list of lists of length three (meaning that there are multiple comment style options that a language can supply)?
So you could present a GUI selection dialog, and, store the comment preference keyed by module-language
(or something like that)?
I think that's good.
Also, I'm happy with something more generically named, eg
ide-comment-delimiters
? Or maybe another day we could do a renaming pass on all these things which would be fine with me.
I'm OK to keep using drracket:xxx
for now as an alias for ide
(it doesn't bother me). At most we could just add a margin-note about the intended general use. (Defer the renaming until also renaming Dr Racket to Dr Rhackbus or Dr Lang or whatever it will become. :))
We seem to have converged, so I think I'll do a PR for at least the docs, in case reading that format suggests other details to tweak.
I have a question: in the line
comment mode, what role does the padding play? Specifically, if I have a line comment spec that is (list 'line s1 s2)
, how is that different than the line comment spec (list 'line (string-append s1 s2) "")
?
My thinking is that, when uncommenting, we'd need a predicate saying "is this line commented out?" and that predicate would depend only on the first string (but that it would remove the second string if that string is present too). Is that right?
If that's right, is this worth documenting? Is there any other interpretation of these strings that would make sense that a tool could use? (If not, then I'd argue we should document it so that language authors will have a better sense of which strings to list for their languages.)
Also, I'm noticing that the default of (list 'line ";;" " ")
will give different behavior than the current DrRacket, which effectively has the default of (list 'line ";" "")
, IIUC. I'm comfortable with this change in DrRacket's behavior on the assumption that it makes it match Emacs's behavior but if it doesn't or if others are concerned, we might want to do something differently somewhere, maybe?
For a predicate saying "is this line commented out?", should it be (list 'line ";" "; ")
to produce ;;
line comments?
Should the contracts disallow newlines (and returns?) in the strings?
I have a question: in the
line
comment mode, what role does the padding play? Specifically, if I have a line comment spec that is(list 'line s1 s2)
, how is that different than the line comment spec(list 'line (string-append s1 s2) "")
?My thinking is that, when uncommenting, we'd need a predicate saying "is this line commented out?" and that predicate would depend only on the first string (but that it would remove the second string if that string is present too). Is that right?
If that's right, is this worth documenting? Is there any other interpretation of these strings that would make sense that a tool could use? (If not, then I'd argue we should document it so that language authors will have a better sense of which strings to list for their languages.)
I think that's right.
But, frankly, I'm following the lead of Emacs here, where comment-padding
is a distinct thing. Although I can't think of a real-world language example where this would be anything other than one space " "
, I'm assuming comment-padding
was added because it turned out to be useful for one or more languages, over the decades.
So I'm not sure if it's standing on shoulders or cargo culting. But it seemed harmless (?) to present it distinctly rather than lossy-glomming it into the same values as comment-start
and comment-end
, and having tools try to deduce it.
Also one use case maybe N/A for DrRacket is Emacs commands that re-fill comments (re-flow for a width) after or during editing.
Also, I'm noticing that the default of
(list 'line ";;" " ")
will give different behavior than the current DrRacket, which effectively has the default of(list 'line ";" "")
, IIUC. I'm comfortable with this change in DrRacket's behavior on the assumption that it makes it match Emacs's behavior but if it doesn't or if others are concerned, we might want to do something differently somewhere, maybe?
Oh. Sorry! I hadn't realized that.
Yeah, in many lisps (including Scheme, from what I can tell from scheme-mode
in Emacs), there are conventions that seem to be:
;
for comments at the end of a line, following code;;
for "block" comments, nothing but the comment on the one or more lines;;;
for "section" comments;;;;
for comments pertaining to the entire fileWhich I think shows that comments can get quickly into style/preference territory.
I guess my assumption is that tools can offer ways to configure this kind of stuff, maybe, for users who really care. Trying to capture all that variety was a non-goal for this lang info proposal. Instead it's a way for lang to tell the tool (so the user doesn't have to) about basic commenting behavior that will work and be correct.
But on both points maybe I'm being too lazy or "MVP", and it should be more complicated?
Should the contracts disallow newlines (and returns?) in the strings?
Yes, that would be most correct. What would be the clearest way to write such a contract, do you think?
Should the contracts disallow newlines (and returns?) in the strings?
Yes, that would be most correct. What would be the clearest way to write such a contract, do you think?
(and/c string? (not/c #rx"[\r\n]"))
for no newlines and no return characters, which would be my preference, given the way that text%
works.
I have a question: in the
line
comment mode, what role does the padding play? Specifically, if I have a line comment spec that is(list 'line s1 s2)
, how is that different than the line comment spec(list 'line (string-append s1 s2) "")
? My thinking is that, when uncommenting, we'd need a predicate saying "is this line commented out?" and that predicate would depend only on the first string (but that it would remove the second string if that string is present too). Is that right? If that's right, is this worth documenting? Is there any other interpretation of these strings that would make sense that a tool could use? (If not, then I'd argue we should document it so that language authors will have a better sense of which strings to list for their languages.)I think that's right.
But, frankly, I'm following the lead of Emacs here, where
comment-padding
is a distinct thing. Although I can't think of a real-world language example where this would be anything other than one space" "
, I'm assumingcomment-padding
was added because it turned out to be useful for one or more languages, over the decades.So I'm not sure if it's standing on shoulders or cargo culting. But it seemed harmless (?) to present it distinctly rather than lossy-glomming it into the same values as
comment-start
andcomment-end
, and having tools try to deduce it.
I think what we settled on makes sense and including padding is a good idea. My only thought is that we could clarify the documentation to explain what the padding is, so that people who have the confusion that I had earlier today would find clarification in the docs. Does that seem okay to you?
Also one use case maybe N/A for DrRacket is Emacs commands that re-fill comments (re-flow for a width) after or during editing.
This would be nice to have eventually so I'm glad that door isn't prematurely closed.
Also, I'm noticing that the default of
(list 'line ";;" " ")
will give different behavior than the current DrRacket, which effectively has the default of(list 'line ";" "")
, IIUC. I'm comfortable with this change in DrRacket's behavior on the assumption that it makes it match Emacs's behavior but if it doesn't or if others are concerned, we might want to do something differently somewhere, maybe?Oh. Sorry! I hadn't realized that.
Yeah, in many lisps (including Scheme, from what I can tell from
scheme-mode
in Emacs), there are conventions that seem to be:* `;` for comments at the end of a line, following code * `;;` for "block" comments, nothing but the comment one the one or more lines * `;;;` for "section" comments * I've even seen `;;;;` for comments pertaining to the entire file
Which I think shows that comments can get quickly into style/preference territory.
I guess my assumption is that tools can offer ways to configure this kind of stuff, maybe, for users who really care. Trying to capture all that variety was a non-goal for this lang info proposal. Instead it's a way for lang to tell the tool (so the user doesn't have to) about basic commenting behavior that will work and be correct.
But on both points maybe I'm being too lazy or "MVP", and it should be more complicated?
I'm happy to change DrRacket's behavior. I just wanted to make sure we all are and maybe discuss alternatives if someone is worried. I like the conventions you list above and am happy that DrRacket will be moving towards them.
I've noticed an inconsistency in the region comment docs. At this spot and this spot, the order is start, continue, end, but in this spot and this spot the order is start, end, continue.
Also, I am wondering if the padding is really meant to be added to the first line. Judging from the examples, I think this line should just say @item{@racket[_start] opens a comment}
.
Also, I am wondering if the padding is really meant to be added to the first line. Judging from the examples, I think this line should just say
@item{@racket[_start] opens a comment}
.
Much like I forgot that DrRacket does ;
not ;;
, I forgot that it doesn't include the padding.
Original:
lorem
ipsum
DrRacket:
;lorem
;ipsum
Emacs lisp modes (lisp, elisp, scheme, racket):
;; lorem
;; ipsum
I've lived with these so long editing {elisp racket scheme} in Emacs I took it for granted. But it's taste not law. Anyway that assumption went into the docs.
I don't know if it's something where, like ;
vs. ;;
, you'd like to adopt that convention, or not?
Also, I am wondering if the padding is really meant to be added to the first line. Judging from the examples, I think this line should just say
@item{@racket[_start] opens a comment}
.Much like I forgot that DrRacket does
;
not;;
, I forgot that it doesn't include the padding.Original:
lorem ipsum
DrRacket:
;lorem ;ipsum
Emacs lisp modes (lisp, elisp, scheme, racket):
;; lorem ;; ipsum
I've lived with these so long editing {elisp racket scheme} in Emacs I took it for granted. But it's taste not law. Anyway that assumption went into the docs.
I don't know if it's something where, like
;
vs.;;
, you'd like to adopt that convention, or not?
I'm sorry-- this is a comment about the block comment specifications, not the line comment specifications. If we're going to insert the padding right at the same place as the opener is inserted (following the C++ example in the region section of the docs), we'll get things like this:
/* * lorem
* ipsem
*/
which doesn't look right.
Or am I misunderstanding something else?
I'm sorry-- this is a comment about the block comment specifications, not the line comment specifications. If we're going to insert the padding right at the same place as the opener is inserted (following the C++ example in the region section of the docs), we'll get things like this:
/* * lorem * ipsem */
I think you might be confusing continue
and padding
? The doc says:
@item{@racket[(list 'region start continue end padding)], where:
@itemlist[
@item{@racket[_start] then @racket[_padding] opens a comment}
@item{@racket[_continue] then @racket[padding] is added to the
beginning of each line except the first one when a comment spans
multiple lines}
@item{@racket[_padding] then @racket[_end] closes a comment}]
Racket example: @racket['(region "#|" " " "|#" " ")].
C++ example: @racket['(region "/*" " *" "*/" " ")].}
So using the C++ example:
The first line uses start
+ padding
= "/*"
+ " "
= "/* "
.
The last line uses padding
+ end
= " "
+ "*/"
= " */"
.
Any other lines in between use continue
+ padding
= " *"
+ " "
= " * "
.
If you're reading only via email and GitHub doesn't email updates: I edited the end of my previous post.
I think you might be confusing
continue
andpadding
?
Oh yes! Thank you, that was exactly my problem.
Things like DrRacket and Emacs offer commands to comment and un-comment a region.
Non-s-expression langs should be able to supply information for these commands, via a new info key named something like "
drracket:comments
".Open to improvements, but initially I'll suggest the value is a list:
where those three values as identical to three configuration variables in Emacs:
Examples:
(";;" "" " ")
("@;" "" " ")
("//" "" " ")
I'd suggest that when the key isn't supplied, it's OK to default to s-expression as the default. (Justification: There exist many more s-expression langs.)
Whatever the final form, I think the to-do list would be:
If agreed/confirmed, I would probably be able to make PRs easily for all but the last. (I'm not familiar with that code. I'm not sure about details like whether a "Racket" menu is where these should continue to live.)
Background: https://github.com/greghendershott/racket-mode/pull/661#issuecomment-1693769408