rems-project / sail

Sail architecture definition language
Other
590 stars 103 forks source link

Fix lexer rule to handle operator of format like `operator + comment` #628

Open trdthg opened 2 months ago

trdthg commented 2 months ago

try fix #627

Maybe fixed

the basic rule here is

 A (B A)* (for  odd-number chars)
(B A)*    (for even-number chars)

then use it on both maybe_slash_start operator and no_slash_start operator trying to cover all

but I don't have a good way of checking the rules

some format output like if eq2_comment == /**/ /**/ 1 then { is wrong, will be fixed in my next/other PR


Update

//////              (should be line_comment)

let a = 1 ///       (rust-fmt treat this as error)
    1;

let a = 1 / //      (!current)(slash and comment)
    1;

can't be handled correctly for now


Update

you can push to my branch directly if needed

github-actions[bot] commented 2 months ago

Test Results

    9 files  ± 0     20 suites  ±0   0s :stopwatch: ±0s   663 tests + 6    663 :white_check_mark: + 6  0 :zzz: ±0  0 :x: ±0  2 112 runs  +16  2 111 :white_check_mark: +16  1 :zzz: ±0  0 :x: ±0 

Results for commit be519cfa. ± Comparison against base commit 13d94588.

:recycle: This comment has been updated with latest results.

Alasdair commented 2 months ago

Test case looks good, I'll take a closer look at this next week and hopefully get it merged.

trdthg commented 2 months ago

but they looks very repetitive and boring

and even operator + /*! is not added yet

Alasdair commented 2 months ago

There should be a way to do this with just a slightly more sophisticated regex, and without adding extra lexing rules I think. Using s to represent a safe operator character it would be something like:

([s]*(([/][s])?|([*]+[s])?))*([s]*([/]|[*]+))|[s]

i.e. the logic is something like:

Alasdair commented 2 months ago

Probably the only way to gain confidence in such a regex would be to test it on all possible operator character sequences up to some bound.

trdthg commented 2 months ago

I've tested your regex with

let s = oper_char_no_slash_star
let non_comment_start = ['/'] oper_char_no_slash_star
let non_block_comment_end = ['*']+ oper_char_no_slash
let safe_start = non_comment_start | non_block_comment_end

let non_comment_end = (s* (['/'] | ['*']+)) | s

let operator = (s* safe_start?)* non_comment_end ('_' ident)?

should be same, the names are only for my to understand

also test on this, with no change:

let s = oper_char_no_slash_star
let operator = 
  (s* 
    (
      (['/'] s)?
      |
      (['*']+ s)?
    )
  )* 
  (s*
    (
      ['/']
      |
      ['*']+
    )
  )
  |
  s

but some test case failed:

The rules op "//" and op "/*" I added is only used for interpreting =// as an op = and a line_comment, not an error.

So if you prefer keep it as an error (rust do so), I can just delete the complex rules and only keep the brife let operator = xxx part

and delete some test case

Alasdair commented 2 months ago

Maybe we can just simplify the rules for operators slightly. We could:

  1. Just say that '/' is forbidden in multi-character operator sequences entirely (as '/' by itself is fine for division)
  2. Only allow a single '/' in operators that do not contain '*' (would permit some operators like </>, which I've seen used in Haskell code)
  3. Forbid operators of length > 3, as anything longer just seems like poor style. Then the space of 'bad' operators involving comment sequences is much easier to define.
Alasdair commented 2 months ago

I'm leaning towards option 2, as it's the most conservative and is still fairly easy to explain in the manual.

trdthg commented 2 months ago

I can think of the following cases right now:

So I think option 2 is ok, I guess it can cover (maybe) 99% cases