WISH : Inline and multi-line comments at the lexer level

Red currently supports only end-of-line comments, introduced by a semi-colon (;). They are filtered out at the lexer level : anything that follows a semi-colon, up to the end of the line, is ignored by the lexer. A semi-colon inside a string is, of course, not treated as a comment marker.

The proposal is to extend this mechanism to cover inline comments and multi-line comments as well.

To that end, the lexer should handle semi-colons, outside of a string, in the following way :

a single semi-colon (;) should be treated as it is at present, commenting out the rest of the line,
a double semi-colon (;;) should comment out everything up to the next double semi-colon within the same line, or up to the end of the line if none is found (this ensures that most existing code, which may already use double semi-colons, is not affected by the proposal),
a triple semi-colon (;;;) should comment out everything up to the next triple semi-colon found in the document, or up to the end of the document if none is found. Note that the closing triple semi-colon must not be on the same line as the opening one, but at least on the line that follows. This clearly distinguishes multi-line comments from single-line comments,
markers of higher level (4 semi-colons ;;;;, 5 semi-colons ;;;;;, etc.), in case they are needed, should behave like the triple marker : each opens a multi-line comment that runs down to the next marker of the same level (4 semi-colons only match another 4 semi-colons), or down to the end of the document if no such marker is found. They can be used to supersede any comments of lower level falling in between.
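The rules above can be sketched as a small comment stripper. This is only an illustration in Python, not Red's lexer : the function names are hypothetical, string handling is reduced to "..." and nested {...} literals with the ^ escape, and it assumes that a closing marker is a run of exactly the same number of semi-colons as the opening one.

```python
def _run_length(src, i):
    """Number of consecutive ';' starting at index i."""
    j = i
    while j < len(src) and src[j] == ';':
        j += 1
    return j - i


def _find_marker(src, level, start, end):
    """Index of the first run of exactly `level` semi-colons in [start, end), or -1."""
    j = start
    while j < end:
        if src[j] == ';':
            k = j
            while k < end and src[k] == ';':
                k += 1
            if k - j == level:
                return j
            j = k
        else:
            j += 1
    return -1


def _skip_string(src, i):
    """Index just past the string literal starting at i (simplified rules)."""
    opener, depth, j = src[i], 0, i
    while j < len(src):
        c = src[j]
        if c == '^':                      # escape: skip the next character
            j += 2
            continue
        if opener == '"':
            if c == '"' and j > i:
                return j + 1
        elif c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                return j + 1
        j += 1
    return len(src)                       # unterminated literal


def _skip_comment(src, i):
    """Index where lexing resumes after the comment marker starting at i."""
    level = _run_length(src, i)
    body = i + level
    eol = src.find('\n', body)
    if eol == -1:
        eol = len(src)
    if level == 1:                        # ;   rest of the line
        return eol
    if level == 2:                        # ;;  up to the next ;; on the same line
        close = _find_marker(src, 2, body, eol)
        return eol if close == -1 else close + 2
    # ;;; and above: the closing marker must be on a later line
    close = _find_marker(src, level, min(eol + 1, len(src)), len(src))
    return len(src) if close == -1 else close + level


def strip_comments(src):
    """Return `src` with comments removed according to the proposed rules."""
    out, i = [], 0
    while i < len(src):
        c = src[i]
        if c in '"{':
            j = _skip_string(src, i)
            out.append(src[i:j])
            i = j
        elif c == ';':
            i = _skip_comment(src, i)
        else:
            out.append(c)
            i += 1
    return ''.join(out)
```

For instance, under these assumptions, strip_comments('print ;; "b1 commented" ;; "b2"') keeps only print followed by "b2", while an unclosed ;; behaves like a regular end-of-line comment.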

Here follows a script with an illustration of a possible use of these extended comments :

; Single line comments unchanged, be it at the beginning of a line
print "a" ; or at its end
print ;; "b1 commented" ;; "b2"
;; double semi-colon behaves like end-of-line comments when left unclosed
;;; a multi-line comment
print "c" ; first line
print "d" ; second line
;;; ; ends there - note that an additional ; is needed here to pacify whatever follows ;;;
; inline or multi-line comments work as well within a data structure
a: #{ 
    first: [ val1 val2 ;; val3 ;; val4 val5 ] ; val3 removed
    ; commenting second, third, fourth but not fifth
    ;;; second: [ val1 val2 ] 
    third: [ val1 val2 ]
    fourth: [ val1 val2 ]
    ;;; fifth: [ val1 val2 ]
}
; or with any dialect
view [
    text "Hello" ;; bold ;; gray
    ;;;
    button "Btn1"
    button "Btn2"
    ;;;
    button "Btn3"
]
;;;; multi-line comments of higher order, here just remove whatever follows
print "Not this"
;;;
print "Not that"
;;;
print "Nor this"

Comment markers need be neither nested nor balanced :

a regular semi-colon dismisses whatever characters follow in the line, even if they form another comment marker : typically, if a double semi-colon or more were to follow a single semi-colon in the same line, it would be ignored - this is the current behaviour,
a double semi-colon, if not balanced, behaves like the regular semi-colon and, when balanced, dismisses whatever lies between the two markers. Likewise, any other comment marker falling in between is ignored,
a triple semi-colon, if not balanced, dismisses everything down to the end of the document and, if balanced, everything between the two markers.

The expected benefit is a means of inline or multi-line commenting that :

works straightaway, now as in the future, with any data structure and any dialect,
is as unintrusive as possible - you don't need to restructure the code to add a comment, just enclose whatever code needs to be commented out between the appropriate markers,
does not consume any new punctuation or syntactic marker that might be useful to others or in the future.

What this proposal is not :

a new implementation of comments : it is intended to cause as few regressions as possible (see below),
an annotation or documentation mechanism for Red code or Red data structures. Currently, comments introduced by the semi-colon are wiped out by the lexer and never reach the next (syntactic) level. That is the expected behaviour here as well. A more modern comment scheme, if one is needed, is outside the scope of this proposal.

Such a change, though highly conservative, might still impact existing code. This may happen in the following situations :

(1) ;;; is already used in an existing comment, for instance for presentation purposes,
(2) ;; is used twice in the same existing comment, in particular when commenting out another comment. Here is the pattern : ;; comment1 ;; comment2. In such a situation the second comment (comment2) would no longer be commented out and would be read as code,
(3) finally, a problem may arise when using the block comment feature of an IDE that is not well behaved : a badly behaved feature prepends a single semi-colon to the line to be commented. If the line happens to be commented already, the markers may merge into a higher-level one, with adverse effects. A well-behaved feature instead prepends a semi-colon followed by a space, which keeps the whole line commented out as before. This is how the feature is implemented in Visual Studio Code, for instance. This behaviour should be generalised.

The table below shows, for various repos, the number of lines and files that would be impacted by the proposal, i.e. those meeting condition (1) or (2). The same computation was made against the legacy Rebol script library, using a (.r) file filter instead of (.red -o .reds), which gives a feel for Rebol coding habits at large. This is the worst case : at most 0.12% of the code lines are impacted by the proposal, and those lines can easily be corrected by adding "; " in front of them.

Repo	Lines (a)	Case (1) - Lines with ;;; (b)	Case (2) - Lines with ;; (c)	Files (d)	Files affected (e)	% Lines affected (f)	% Files affected (g)
red	231,295	2	0	404	1	0.001%	0.25%
code	97,214	0	0	205	0	0%	0%
community	16,312	4	2	54	2	0.04%	3.7%
VScode-extension	4,867	0	0	9	0	0%	0%
Rebol script library	339,759	347	75	1,242	49	0.12%	3.9%

The following Unix commands compute the values in the table. Each command collects all .red and .reds files in the hierarchy, then applies a grep to them that retrieves the lines that might be troublesome. Further down the pipe, wc counts how many such lines were detected.

# (a) Total number of lines in all files
find . \( -name "*.red" -o -name "*.reds" \) -exec wc -l {} \; | cut -d " " -f 1 | paste -sd+ - | bc
# (b) Lines with triple (or more) semi-colon
find . \( -name "*.red" -o -name "*.reds" \) -exec grep -EH ";;;" {} \; | wc -l
# (c) Lines with double comment pattern ;; <comment1> ;; <comment2>
find . \( -name "*.red" -o -name "*.reds" \) -exec grep -EH ";;[^;]+;;" {} \; | grep -Ev ";;[^;]+;;[[:space:]]*$" | wc -l
# (d) Total number of files
find . \( -name "*.red" -o -name "*.reds" \) | wc -l
# (e) Number of files impacted by (b) or (c)
find . \( -name "*.red" -o -name "*.reds" \) -exec grep -EH ";;;|;;[^;]+;;" {} \; | grep -Ev ";;[^;]+;;[[:space:]]*$" | cut -d ":" -f 1 -s | uniq | wc -l
# (f) = ((b) + (c)) *100 / (a)
# (g) = (e) * 100 / (d)
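
As a cross-check of formulas (f) and (g), the percentages in the table can be recomputed from the raw counts. The short Python sketch below uses the counts (a) to (e) copied from the table; the function and variable names are hypothetical and only serve the illustration.

```python
# Counts copied from the table: (a) lines, (b) ;;; lines, (c) ;; lines,
# (d) files, (e) files affected.
COUNTS = {
    "red": (231295, 2, 0, 404, 1),
    "code": (97214, 0, 0, 205, 0),
    "community": (16312, 4, 2, 54, 2),
    "VScode-extension": (4867, 0, 0, 9, 0),
    "Rebol script library": (339759, 347, 75, 1242, 49),
}


def impact(lines, triple, double, files, files_hit):
    """Return (f, g): percentage of lines and percentage of files affected."""
    pct_lines = (triple + double) * 100 / lines   # (f) = ((b) + (c)) * 100 / (a)
    pct_files = files_hit * 100 / files           # (g) = (e) * 100 / (d)
    return pct_lines, pct_files


for repo, counts in COUNTS.items():
    pct_lines, pct_files = impact(*counts)
    print(f"{repo}: {pct_lines:.3f}% of lines, {pct_files:.2f}% of files")
```

Rounded, these values agree with columns (f) and (g) of the table.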

Related

the initial conversation on Gitter, started here https://gitter.im/red/help?at=610695377331d202b5e048c4, was pursued and ended there https://gitter.im/red/red?at=610884169e84ba381e536f85
an old, closed, related issue #724, with an extensive conversation on the matter, in which Pierre (@pchg) made a proposal similar to the one detailed here : https://github.com/red/red/issues/724#issuecomment-41608703
an entry in the wiki by @greggirwin, summing up various thoughts on the matter : https://github.com/red/red/wiki/%5BDOC%5D-Red-Should...-(Feature-Wars)#block-comments
