red / REP

Red Enhancement Process
BSD 3-Clause "New" or "Revised" License

WISH : Inline and multi-line comments at the lexer level #107

Open zwortex opened 3 years ago

zwortex commented 3 years ago

Comments are currently end-of-line comments, introduced by a semi-colon (;). They are filtered at the lexer level: anything that follows a semi-colon, up to the end of the line, is ignored by the lexer. Semi-colons inside a string are, of course, not treated as comment markers.

The proposal is to extend this mechanism to cover inline comments and multi-line comments as well.

To that end, the lexer should handle semi-colons, outside of a string, in the following way :

The following script illustrates a possible use of these extended comments :

; Single line comments unchanged, be it at the beginning of a line
print "a" ; or at its end
print ;; "b1 commented" ;; "b2"
;; double semi-colon behaves like end-of-line comments when left unclosed
;;; a multi-line comment
print "c" ; first line
print "d" ; second line
;;; ; ends there - note that an additional ; is needed here to pacify whatever follows ;;;
; inline or multi-line comments work as well within a data structure
a: #{ 
    first: [ val1 val2 ;; val3 ;; val4 val5 ] ; val3 removed
    ; commenting second, third, fourth but not fifth
    ;;; second: [ val1 val2 ] 
    third: [ val1 val2 ]
    fourth: [ val1 val2 ]
    ;;; fifth: [ val1 val2 ]
}
; or with any dialect
view [
    text "Hello" ;; bold ;; gray
    ;;;
    button "Btn1"
    button "Btn2"
    ;;;
    button "Btn3"
]
;;;; multi-line comments of higher order, here just remove whatever follows
print "Not this"
;;;
print "Not that"
;;;
print "Nor this"

Comment markers do not have to be nested, nor balanced :

What benefit is expected : a means of inline or multi-line commenting that :

What this proposition is not :

Such a change, though highly conservative, might still impact existing code. This may happen in the following situations :

  1. ;;; is already used in an existing comment - for instance for presentation purposes
  2. ;; is used twice in the same existing comment, in particular when one comment is used to comment out another. The pattern is : ;; comment1 ;; comment2. In such a situation, the second comment might no longer be treated as a comment.
  3. finally, a problem may arise when using the block comment feature of an IDE that is not well behaved : a badly behaved commenting feature prepends a single semi-colon to the line to be commented. If the line happens to be commented already, that may have adverse effects. A well-behaved feature instead adds a semi-colon followed by a space, which still behaves as an end-of-line comment. This is how the feature is implemented in Visual Studio Code, for instance. This behaviour should be generalised.
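
Case (3) can be illustrated with a small shell sketch. It is only an illustration: `comment_out` and `comment_out_bare` are hypothetical helper names, and `sed` stands in for an editor's block-comment command. The first helper prepends "; " to each line, the second a bare ";", which turns a line already starting with ;; into one starting with ;;;.

```shell
# comment_out: well-behaved variant, prepends "; " (semi-colon plus space).
# Hypothetical helper, standing in for an editor's block-comment command.
comment_out() {
    sed 's/^/; /'
}

# comment_out_bare: problematic variant, prepends a bare ";".
comment_out_bare() {
    sed 's/^/;/'
}

printf '%s\n' ';; note' | comment_out        # prints: ; ;; note
printf '%s\n' ';; note' | comment_out_bare   # prints: ;;; note
```

With the bare prefix, the line now begins with the triple marker that the proposal would treat as a multi-line comment delimiter, which is the adverse effect described in case (3).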

The table below shows, for various repos, the number of lines and files that would be impacted by the proposal, that is, those for which condition (1) or (2) is met. The same computation was run against the legacy Rebol scripts (here), using the (.r) file filter instead of (.red -o .reds), which gives a feel for existing Rebol coding habits. This is the worst case : at most 0.12% of the code lines are impacted by the proposal, and those lines can easily be corrected by adding "; " in front of them.

| Repo | Nb lines (a) | Case (1): lines with ;;; (b) | Case (2): lines with ;; (c) | Nb files (d) | Nb files affected (e) | % lines affected (f) | % files affected (g) |
|---|---|---|---|---|---|---|---|
| red | 231,295 | 2 | 0 | 404 | 1 | 0.001% | 0.25% |
| code | 97,214 | 0 | 0 | 205 | 0 | 0% | 0% |
| community | 16,312 | 4 | 2 | 54 | 2 | 0.04% | 3.7% |
| VScode-extension | 4,867 | 0 | 0 | 9 | 0 | 0% | 0% |
| Rebol script library | 339,759 | 347 | 75 | 1,242 | 49 | 0.12% | 3.9% |

The following Unix commands compute the values in the table. Each command collects all .red and .reds files in the hierarchy, then applies a grep to them that retrieves the lines that might be troublesome. Further down the pipe, wc counts how many such lines were detected.

# (a) Total number of lines in all files
find . \( -name "*.red" -o -name "*.reds" \) -exec wc -l {} \; | cut -d " " -f 1 | paste -sd+ - | bc
# (b) Lines with triple (or more) semi-colon
find . \( -name "*.red" -o -name "*.reds" \) -exec grep -EH ";;;" {} \; | wc -l
# (c) Lines with double comment pattern ;; <comment1> ;; <comment2>
find . \( -name "*.red" -o -name "*.reds" \) -exec grep -EH ";;[^;]+;;" {} \; | grep -Ev ";;[^;]+;;[[:space:]]*$" | wc -l
# (d) Total number of files
find . \( -name "*.red" -o -name "*.reds" \) | wc -l
# (e) Number of files impacted by (b) or (c)
find . \( -name "*.red" -o -name "*.reds" \) -exec grep -EH ";;;|;;[^;]+;;" {} \; | grep -Ev ";;[^;]+;;[[:space:]]*$" | cut -d ":" -f 1 -s | uniq | wc -l
# (f) = ((b) + (c)) *100 / (a)
# (g) = (e) * 100 / (d) 
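
As a worked check of formulas (f) and (g), the sketch below recomputes the Rebol script library row from the counts given in the table. `pct` is a hypothetical helper introduced here, not part of the original commands; it uses awk rather than bc simply to get formatted decimal output.

```shell
# pct NUM DEN: print NUM * 100 / DEN as a percentage with two decimals.
# Hypothetical helper; LC_ALL=C keeps "." as the decimal separator.
pct() {
    LC_ALL=C awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f%%\n", n * 100 / d }'
}

# Rebol script library row: (a)=339759, (b)=347, (c)=75, (d)=1242, (e)=49
pct $((347 + 75)) 339759   # (f) prints 0.12%
pct 49 1242                # (g) prints 3.95%
```

The value for (g) is about 3.945 before rounding, which the table reports as 3.9% at one decimal place.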


greggirwin commented 3 years ago

Thanks for the detailed entry @zwortex.