rems-project / sail

Sail architecture definition language
Other
563 stars 92 forks source link

handle doc_comment as block_comment #581

Open trdthg opened 2 weeks ago

trdthg commented 2 weeks ago

Hi, I'm going to test sail-fmt on riscv-sails like this PR did https://github.com/riscv/sail-riscv/pull/264, and then try to fix them one by one


This PR is for fix an error when formatting something like

/*! hahaha*/
Syntax error.
. /test/format/unbalanced_comment.sail:2.0-0: | current token: .
  | current token: . 

seems to be related to https://github.com/rems-project/sail/issues/237


I have a few questions, could you explain

github-actions[bot] commented 2 weeks ago

Test Results

    9 files  ±0     20 suites  ±0   0s :stopwatch: ±0s   645 tests ±0    645 :white_check_mark: ±0  0 :zzz: ±0  0 :x: ±0  2 057 runs  ±0  2 056 :white_check_mark: ±0  1 :zzz: ±0  0 :x: ±0 

Results for commit bc855b19. ± Comparison against base commit 1edb36b4.

Alasdair commented 2 weeks ago

Documentation comments are attached to the definitions they are associated with in the AST. Each top-level definition is annotated with a value of type def_annot, which optionally contains a documentation comment. This type is defined here: https://github.com/rems-project/sail/blob/1edb36b4ebc8d7b04ab9b1957cc202336f9bd2b6/language/sail.ott#L126

When pattern matching on a toplevel definition you do something like

match def with
| DEF_aux (def, def_annot) -> (* def here is the actual definition within the aux wrapper *)
  let comment = def_annot.doc_comment in

You therefore don't need to add documentation comments to the comments variable — they will always be where you want them attached to the AST element they document.

'Doc' in the lexer is the token type for a documentation comment, there are then rules in the parser that create documented definitions like:

  | doc = Doc; def = def
    { DEF_aux (DEF_doc (doc, def), loc $startpos(doc) $endpos(doc)) }

the initial_check desugaring pass removes the separate DEF_doc constructors and attaches them into the annotation type I mentioned above here: https://github.com/rems-project/sail/blob/1edb36b4ebc8d7b04ab9b1957cc202336f9bd2b6/src/lib/initial_check.ml#L1305

Finally, the reason why the rule looks like:

let startpos = Lexing.lexeme_start_p lexbuf in
let arg = doc_comment startpos (Buffer.create 10) 0 false lexbuf in
lexbuf.lex_start_p <- startpos;
Doc arg

is because in Menhir the lexbuf positions are used to get the span information in each parsing rule. Ocamllex will automatically advance the lexbuf positions as it scans the documentation comment using the doc_comment rule, but we want menhir to see Doc as a single token with the correct start and end positions so we save the starting position and restore it after running the doc_comment rule.

Alasdair commented 2 weeks ago

Actually in the context of the formatter you can ignore what I said about def_annot and DEF_aux, as it is run on the AST before initial_check.ml does the attaching, so you have the DEF_doc constructors the parser produces there.

Alasdair commented 1 week ago

I fixed the bug that caused documentation comments to fail here: https://github.com/rems-project/sail/pull/592

Should also mean they appear in the formatted output, although there may be indentation issues if whatever they are commenting is indented.

trdthg commented 1 week ago

Looks good. I tried to modify ast before and added a DEF_doc_only to distinguish DEF_doc, and it also worked. But your method does not modify ast, only affects the fmt part, I think your method is better. As for indentation, I will test it later.