rems-project / cerberus

Cerberus C semantics
https://www.cl.cam.ac.uk/~pes20/cerberus/
Other
42 stars 19 forks source link

Cerberus lexer crashes on "\e" #370

Closed dc-mak closed 2 weeks ago

dc-mak commented 2 weeks ago

I'm aware that \e is not an ISO C character, the issue is that Cerberus is not displaying "illegal character" or a proper error, instead crashing confusingly.

18:45 ➜  cerberus git:(master) ✗ cat tmp.c && cerberus tmp.c
void f() {
    "\e";
}
CPARSER_DRIVER (Failure)
cerberus: internal error, uncaught exception:
          Failure("lexing: empty token")
          Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
          Called from C_parser_driver.parse in file "parsers/c/c_parser_driver.ml", line 133, characters 24-58
          Called from Cerb_backend__Pipeline.c_frontend.parse in file "backend/common/pipeline.ml", line 172, characters 4-60
          Called from Cerb_backend__Pipeline.c_frontend_and_elaboration in file "backend/common/pipeline.ml", line 238, characters 2-67
          Called from Dune__exe__Main.frontend in file "backend/driver/main.ml", line 30, characters 4-60
          Called from Dune__exe__Main.cerberus.main.(fun) in file "backend/driver/main.ml", line 156, characters 6-47
          Called from Cerb_frontend__Exception.except_foldlM in file "ocaml_frontend/generated/exception.ml", line 69, characters 18-25
          Called from Dune__exe__Main.cerberus.(fun) in file "backend/driver/main.ml", line 268, characters 8-24
          Called from Dune__exe__Main.cerberus in file "backend/driver/main.ml", line 263, characters 8-1023
          Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
          Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 35, characters 37-44

This has surprisingly difficult to debug - I tried a few things with no luck

18:49 ➜  cerberus git:(master) ✗ git --no-pager diff
diff --git a/parsers/c/c_lexer.mll b/parsers/c/c_lexer.mll
index a96016473..ea66e1041 100644
--- a/parsers/c/c_lexer.mll
+++ b/parsers/c/c_lexer.mll
@@ -312,7 +312,7 @@ let octal_escape_sequence =

 let simple_escape_sequence =
     "\\'" | "\\\"" | "\\?" | "\\\\" | "\\a" | "\\b" | "\\f" | "\\n"
-  | "\\r" | "\\t" | "\\v"
+  | "\\r" | "\\t" | "\\v"  | "\\e"

 let escape_sequence =
     simple_escape_sequence
@@ -346,7 +346,8 @@ let cn_integer_width = ("8" | "16" | "32" | "64" | "128")

 rule s_char_sequence = parse
   | s_char as x
-      { let xs = s_char_sequence lexbuf in
+      { print_string ("debug_char: " ^ x ^ "\n");
+        let xs = s_char_sequence lexbuf in
         x :: xs }
   | '"'
       { [] }
@@ -467,6 +468,7 @@ and initial flags = parse
   | '"'
       { let saved_start_p = lexbuf.lex_start_p in
         let strs = s_char_sequence lexbuf in
+        print_string ("debug_str: "^String.concat "" strs ^ "\n");
         lexbuf.lex_start_p <- saved_start_p;
         STRING_LITERAL (None, strs) }
   | ("u8" | 'u' | 'U' | 'L') as pref '"'
18:49 ➜  cerberus git:(master) ✗ lldb-12 -- cerberus tmp.c
(lldb) target create "cerberus"
Current executable set to 'cerberus' (x86_64).
(lldb) settings set -- target.run-args  "tmp.c"
(lldb) breakpoint set -n caml_failwith
Breakpoint 1: where = cerberus`caml_failwith at fail_nat.c:131:1, address = 0x0000000001152670
(lldb) run
Process 18678 launched: '/home/dhruv/.opam/4.14.1/bin/cerberus' (x86_64)
Process 18678 stopped and restarted: thread 1 received signal: SIGCHLD
Process 18678 stopped
* thread #1, name = 'cerberus', stop reason = breakpoint 1.1
    frame #0: 0x00005555566a6670 cerberus`caml_failwith(msg="lexing: empty token") at fail_nat.c:131:1
(lldb) bt
* thread #1, name = 'cerberus', stop reason = breakpoint 1.1
  * frame #0: 0x00005555566a6670 cerberus`caml_failwith(msg="lexing: empty token") at fail_nat.c:131:1
    frame #1: 0x00005555566bbfa9 cerberus`caml_lex_engine(tbl=<unavailable>, start_state=<unavailable>, lexbuf=<unavailable>) at lexing.c:106:9
    frame #2: 0x000055555664424d cerberus`camlStdlib__Lexing__engine_300 at lexing.ml:65:15
    frame #3: 0x0000555555db1149 cerberus`camlC_lexer____ocaml_lex_s_char_sequence_rec_798 at c_lexer.ml:1800:8
    frame #4: 0x0000555555db22ab cerberus`camlC_lexer____ocaml_lex_initial_rec_808 at c_lexer.ml:1798:3
    frame #5: 0x0000555555db302e cerberus`camlC_lexer__lexer_1080 at c_lexer.mll:598:18
    frame #6: 0x0000555555e6fa06 cerberus`camlMenhirLib__fun_4777 at menhirLib.ml:2127:16
    frame #7: 0x0000555555de14b2 cerberus`camlC_parser___menhir_run_0671_8333 at c_parser.ml:23544:17
    frame #8: 0x0000555555e6bec4 cerberus`camlC_parser_driver__handle_494 at c_parser_driver.ml:21:30
    frame #9: 0x0000555555e6cd6d cerberus`camlC_parser_driver__parse_1289 at c_parser_driver.ml:133:24
    frame #10: 0x0000555555da5d27 cerberus`camlCerb_backend__Pipeline__fun_5208 at pipeline.ml:172:4
    frame #11: 0x0000555555da60e2 cerberus`camlCerb_backend__Pipeline__c_frontend_and_elaboration_2225 at pipeline.ml:238:2
    frame #12: 0x0000555555d9d828 cerberus`camlDune__exe__Main__frontend_678 at main.ml:30:4
    frame #13: 0x0000555555d9f427 cerberus`camlDune__exe__Main__fun_2618 at main.ml:156:6
    frame #14: 0x0000555555f178ac cerberus`camlCerb_frontend__Exception__except_foldlM_989 at exception.ml:69:18
    frame #15: 0x0000555555d9fb03 cerberus`camlDune__exe__Main__fun_2691 at main.ml:268:8
    frame #16: 0x0000555555d9ec45 cerberus`camlDune__exe__Main__cerberus_1366 at main.ml:263:8
    frame #17: 0x0000555556621ac8 cerberus`camlCmdliner_term__fun_652 at cmdliner_term.ml:24:19
    frame #18: 0x0000555556625107 cerberus`camlCmdliner_eval__run_parser_550 at cmdliner_eval.ml:35:37
    frame #19: 0x00005555566261b9 cerberus`camlCmdliner_eval__eval_value_inner_1726 at cmdliner_eval.ml:203:14
    frame #20: 0x0000555556626a64 cerberus`camlCmdliner_eval__eval$27_1489 at cmdliner_eval.ml:261:8
    frame #21: 0x0000555555da20bf cerberus`camlDune__exe__Main__entry at main.ml:519:17
    frame #22: 0x0000555555d97969 cerberus`caml_program + 9785
    frame #23: 0x00005555566c759d cerberus`caml_start_program + 73
    frame #24: 0x00005555566a62fc cerberus`caml_startup_common(argv=0x00007fffffffe598, pooling=<unavailable>) at startup_nat.c:160:9
    frame #25: 0x00005555566a634f cerberus`caml_startup [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:167:10
    frame #26: 0x00005555566a6348 cerberus`caml_startup(argv=<unavailable>) at startup_nat.c:172
    frame #27: 0x00005555566a6379 cerberus`caml_main(argv=<unavailable>) at startup_nat.c:179:3 [artificial]
    frame #28: 0x0000555555d95232 cerberus`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3
    frame #29: 0x00007ffff7be8083 libc.so.6`__libc_start_main(main=(cerberus`main at main.c:31:1), argc=2, argv=0x00007fffffffe598, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe588) at libc-start.c:308:16
    frame #30: 0x0000555555d9526e cerberus`_start + 46
dc-mak commented 2 weeks ago

This link won't work for most but placing here anyways: https://github.com/GaloisInc/VERSE-Toolchain/issues/106

peterohanley commented 2 weeks ago

Continuing the conversation here -- I carefully did not ask that \e work, just that the error message on a lexer error was better. It sounds like the error message on most lexer errors is better and there's just something about this specific one that avoids the error handling. In that case, well, it's not critical.

kmemarian commented 2 weeks ago

@dc-mak the issue comes from the lack of wildcard pattern in s_char_sequence. The following patch will give the invalid symbol error:

diff --git a/parsers/c/c_lexer.mll b/parsers/c/c_lexer.mll
index a96016473..5e866dd04 100644
--- a/parsers/c/c_lexer.mll
+++ b/parsers/c/c_lexer.mll
@@ -350,6 +350,8 @@ rule s_char_sequence = parse
         x :: xs }
   | '"'
       { [] }
+  | _
+      { raise (Error Errors.Cparser_invalid_symbol) }

 and magic flags start_of_comment = parse
   (* End of the magic comment *)

Though maybe we want a dedicated "invalid string litteral" error?

dc-mak commented 2 weeks ago

I swear I tried that. I will add a dedicated invalid string literal error.

dc-mak commented 2 weeks ago

I did try it. I just didn't do make install and make install_cn 🤦

dc-mak commented 2 weeks ago

Fixed by https://github.com/rems-project/cerberus/commit/ee950d4ddf9c2dcbc9587d3515e6ab49906f99a5