Considering supporting OCaml

smasher164 commented 1 year ago

OCaml is quite popular in language development, and has a dearth of lexer generators with good unicode support. It's a bit different from the imperative languages that re2c currently supports, i.e. it doesn't support labelled break, continue, goto, or the ability to early return. However, it does have good support for tail-call optimization, exceptions, algebraic effects, and of course if-else (for the nested-ifs code generation).

Is there interest in supporting such a backend? If one were interested in contributing support, where should one look? src/codegen?

Thanks

skvadrik commented 1 year ago

OCaml is quite popular in language development, and has a dearth of lexer generators with good unicode support.

Do you think re2c will be able to provide some advantage over these? The main strengths of re2c is generating fast code, providing a flexible user interface and supporting submatch extraction based on TDFA. Do you see any of these features lacking in existing OCaml generators? Porting re2c to a functional languages looks like an interesting challenge, but I'm trying to understand if it will be useful in practice.

If one were interested in contributing support, where should one look?

First thing would be to understand what you are going to generate and construct the desired output by hand. Take a look at the standard re2c examples --- they represent various use cases and different modes of using re2c. Examples will be among the first programs ported to the new language, so it's good to have an idea of how to express each of them in OCaml.

Once you have understanding how this can be done, src/codegen is indeed the place to look. However, note that instead of hardcoding new languages in re2c, a better way is to support syntax files that would allow us to add language support in a less intrusive way. Sometimes people contribute incomplete support for new languages (e.g. lacking some of the examples / documentation), such as in the case of D, which had to be reverted .

smasher164 commented 1 year ago

Do you see any of these features lacking in existing OCaml generators?

ocamllex lacks unicode support is not reentrant does not have a drop-in UI
sedlex lacks submatch extraction does not have a drop-in UI

I've found the flexible UI to re2c to be the killer feature. Being able to define:YY... its primitives inline with the rest of my code makes it very convenient to use. This is even without considering the performance argument.

so it's good to have an idea of how to express each of them in OCaml

My gut feeling is to take advantage of OCaml's [@tailcall] annotation, so that you basically have access to computed GOTO, like in the C backend. Additionally, the new effect stuff in the runtime is supposedly pretty efficient, so you'd have access to arbitrary control flow at that point.

a better way is to support https://github.com/skvadrik/re2c/issues/450 that would allow us to add language support in a less intrusive way

I can mention this in that issue, but my worry is that the syntax file approach can no longer special-case behavior per-target. And while I understand you may not want to commit to a stable AST, there could be a third option. re2c can have its own internal IR for language-agnostic optimizations, and lower to a separate, stable IR that users can codegen from.

pmetzger commented 12 months ago

FWIW, I think supporting OCaml would be extremely cool. sedlex is pretty clunky, and re2c is much nicer. maybe if #450 were implemented it would be fairly straightforward to do this.

Btw, I disagree that OCaml lacks goto and the like. You can do such things with tail recursion, and because it has proper tail recursion, implementing fast finite automata in OCaml is actually easier than in (say) Rust. You just do a call that never returns rather than having an explicit goto.

pmetzger commented 10 months ago

@smasher164 Any interest in working on this yourself (or with assistance)?

skvadrik commented 10 months ago

FYI I'm slowly working on #450, which (if my experiment works out) should allow adding new backends with one config file.

Aside from that, the first thing to start for any new backend is to manually construct examples of the code that re2c should generate (look at the examples to get the idea of what should be covered).

pmetzger commented 10 months ago

I could probably manually create OCaml code examples if that would be of help for you in making #450 a reality. It's a very different language than the ones re2c currently supports so it might be a good test.

skvadrik commented 8 months ago

This is approximately what re2c needs to generate for the simplest example 01_basic.re (the match part, not the whole program):

open Printf

let rec lex s yystate cursor =
    match yystate with
        | 0 ->
            let yych = s.[cursor] in
            let c = cursor + 1 in
            (match yych with
                | '1'..'9' -> lex s 2 c
                | _ -> lex s 1 c)
        | 1 -> false
        | 2 ->
            let yych = s.[cursor] in
            (match yych with
                | '0'..'9' ->
                    let c = cursor + 1 in
                    lex s 2 c
                | _ -> lex s 3 cursor)
        | 3 -> true
        | _ -> raise (Failure "internal lexer error")

let main () =
    if not (lex "1234\x00" 0 0) then raise (Failure "error")

let _ = main ()

It is based on the loop/switch mode with recursive function call instead of the loop. We could probably use goto/label mode as well with every state as a separate function and goto replaced with a function call, but I think it would require more changes to codegen. For the above, we'll likely need another generic API primitive for the recursive invocation, as the user should be free to decide the signature of the lex function. Things like let c = cursor + 1 in and let yych = s.[cursor] in map directly to YYSKIP and YYPEEK respectively, which are also defined by the user.

It should be possible to use this new codegen mode with other languages, as they all have recursive functions.

pmetzger commented 8 months ago

The most natural way to represent a state transition in OCaml is with a tail recursive function call; it will be much faster than most of the other options.

smasher164 commented 8 months ago

To echo @pmetzger, the way I would approach codegen here would be to just use tail recursive calls, and treat them like you treat goto in the generated C or Go code. For example, the direct translation would yield something like (you can omit type annotations if you want)

let lex(s: string): bool =
    let cursor : int ref = ref 0 in
    let yych : char ref = ref s.[!cursor] in
    let rec yy1(): bool =
        cursor := !cursor + 1;
        false
    and yy2(): bool =
        cursor := !cursor + 1;
        yych := s.[!cursor];
        (match !yych with
        | '0'..'9' -> (yy2[@tailcall])()
        | _ -> (yy3[@tailcall])())
    and yy3(): bool = true in
    match !yych with
    | '1'..'9' -> (yy2[@tailcall])()
    | _ -> (yy1[@tailcall])()

We have an outer lex function that sets up some mutable state with ref. We declare yy1, yy2, and yy3 as local closures, so they have access to this state. They are all tail-recursive, i.e. the last thing they do is either return a value or invoke another function. The invocations to (yy1[@tailcall])(), (yy2[@tailcall])(), and (yy3[@tailcall])() are tail calls and the [@tailcall] attribute will tell the ocaml compiler to optimize them into jumps, and warn if the functions don't meet the criteria (like if they're not tail recursive).

So basically,

Wherever you emit a label, emit a function.
Wherever you emit a goto, emit a [@tailcall].

skvadrik commented 8 months ago

Note that the lex function in my comment above is also tail-recursive (you can replace lex with (lex [@tailcall]), only there is one big tail-recursive function instead of many small ones. It is possible to go with many small functions (one for each state) if the new codegen mode is based on goto/label model, but it will probably require more work than adapting loop/switch model. Is there any clear advantage in one approach over the other?

smasher164 commented 8 months ago

If your approach will require less work during codegen, then I think it makes sense to go down that route. Tbh I just assumed that the goto/label model was more convenient for you.

pmetzger commented 7 months ago

The advantage of the "tail recursive function per state" approach is that it's going to have the highest possible performance. The code generator will turn it all into gotos (and then jump instructions in machine language.) This would also be the case for other functional languages like Haskell. Using the tailcall annotation in OCaml is not required, btw, but it will assure that the compiler will get angry if the call is not, in fact, tail recursive, thus avoiding mistakes.

skvadrik commented 7 months ago

Right, I also thought about performance. Trying to compare the generated code:

Program 1.ml uses recursive closures:

open Printf

type state = {
    str: string;
    mutable cur: int;
}

let lex st =
    let rec yy0() =
            let yych = st.str.[st.cur] in
            st.cur <- st.cur + 1;
            (match yych with
                | '1'..'9' -> (yy2 [@tailcall]) ()
                | _ -> (yy1 [@tailcall]) ())
    and yy1() = false
    and yy2() = 
            let yych = st.str.[st.cur] in
            (match yych with
                | '0'..'9' ->
                    st.cur <- st.cur + 1;
                    (yy2 [@tailcall]) ()
                | _ -> (yy3 [@tailcall]) ())
    and yy3() = true
    in yy0 ()

let main () =
    let st = { str = "1234\x00"; cur = 0; } in
    if not (lex st) then raise (Failure "error")

let _ = main ()

Program 2.ml uses one recursive function with a match on yystate:

open Printf

type state = {
    str: string;
    mutable cur: int;
    mutable state: int;
}

let rec lex st =
    let yystate = st.state in
    match yystate with
        | 0 ->
            let yych = st.str.[st.cur] in
            st.cur <- st.cur + 1;
            (match yych with
                | '1'..'9' -> st.state <- 2; (lex [@tailcall]) st
                | _ -> st.state <- 1; (lex [@tailcall]) st)
        | 1 -> false
        | 2 ->
            let yych = st.str.[st.cur] in
            (match yych with
                | '0'..'9' ->
                    st.cur <- st.cur + 1;
                    st.state <- 2; (lex [@tailcall]) st
                | _ -> st.state <- 3; (lex [@tailcall]) st)
        | 3 -> true
        | _ -> raise (Failure "internal lexer error")

let main () =
    let st = { str = "1234\x00"; cur = 0; state = 0; } in
    if not (lex st) then raise (Failure "error")

let _ = main ()

Let's see what instructions are generated by ocamlopt:

$ ocamlopt 1.ml && objdump -d 1.o && objdump -dr 1.o

...
0000000000000000 <caml1__lex_283>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   49 83 ef 68             sub    $0x68,%r15
   8:   4d 3b 3e                cmp    (%r14),%r15
   b:   0f 82 99 00 00 00       jb     aa <caml1__lex_283+0xaa>
  11:   49 8d 5f 08             lea    0x8(%r15),%rbx
  15:   48 c7 43 f8 f7 30 00    movq   $0x30f7,-0x8(%rbx)
  1c:   00 
  1d:   48 8b 3d 00 00 00 00    mov    0x0(%rip),%rdi        # 24 <caml1__lex_283+0x24>
                        20: R_X86_64_REX_GOTPCRELX      caml1__yy0_286-0x4
  24:   48 89 3b                mov    %rdi,(%rbx)
  27:   48 bf 17 00 00 00 00    movabs $0x100000000000017,%rdi
  2e:   00 00 01 
  31:   48 89 7b 08             mov    %rdi,0x8(%rbx)
  35:   48 c7 43 10 f9 0c 00    movq   $0xcf9,0x10(%rbx)
  3c:   00 
  3d:   48 8b 3d 00 00 00 00    mov    0x0(%rip),%rdi        # 44 <caml1__lex_283+0x44>
                        40: R_X86_64_REX_GOTPCRELX      caml1__yy1_287-0x4
  44:   48 89 7b 18             mov    %rdi,0x18(%rbx)
  48:   48 bf 11 00 00 00 00    movabs $0x100000000000011,%rdi
  4f:   00 00 01 
  52:   48 89 7b 20             mov    %rdi,0x20(%rbx)
  56:   48 c7 43 28 f9 18 00    movq   $0x18f9,0x28(%rbx)
  5d:   00 
  5e:   48 8b 3d 00 00 00 00    mov    0x0(%rip),%rdi        # 65 <caml1__lex_283+0x65>
                        61: R_X86_64_REX_GOTPCRELX      caml1__yy2_288-0x4
  65:   48 89 7b 30             mov    %rdi,0x30(%rbx)
  69:   48 bf 0b 00 00 00 00    movabs $0x10000000000000b,%rdi
  70:   00 00 01 
  73:   48 89 7b 38             mov    %rdi,0x38(%rbx)
  77:   48 c7 43 40 f9 24 00    movq   $0x24f9,0x40(%rbx)
  7e:   00 
  7f:   48 8b 3d 00 00 00 00    mov    0x0(%rip),%rdi        # 86 <caml1__lex_283+0x86>
                        82: R_X86_64_REX_GOTPCRELX      caml1__yy3_289-0x4
  86:   48 89 7b 48             mov    %rdi,0x48(%rbx)
  8a:   48 bf 05 00 00 00 00    movabs $0x100000000000005,%rdi
  91:   00 00 01 
  94:   48 89 7b 50             mov    %rdi,0x50(%rbx)
  98:   48 89 43 58             mov    %rax,0x58(%rbx)
  9c:   b8 01 00 00 00          mov    $0x1,%eax
  a1:   48 83 c4 08             add    $0x8,%rsp
  a5:   e9 00 00 00 00          jmp    aa <caml1__lex_283+0xaa>
                        a6: R_X86_64_PLT32      caml1__yy0_286-0x4
  aa:   e8 00 00 00 00          call   af <caml1__lex_283+0xaf>
                        ab: R_X86_64_PLT32      caml_call_gc-0x4
  af:   e9 5d ff ff ff          jmp    11 <caml1__lex_283+0x11>
  b4:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
  bb:   00 00 00 00 
  bf:   90                      nop

00000000000000c0 <caml1__yy0_286>:
  c0:   48 83 ec 08             sub    $0x8,%rsp
  c4:   4d 3b 3e                cmp    (%r14),%r15
  c7:   76 71                   jbe    13a <caml1__yy0_286+0x7a>
  c9:   48 8b 7b 58             mov    0x58(%rbx),%rdi
  cd:   48 8b 47 08             mov    0x8(%rdi),%rax
  d1:   48 89 c6                mov    %rax,%rsi
  d4:   48 d1 fe                sar    %rsi
  d7:   48 8b 17                mov    (%rdi),%rdx
  da:   48 8b 4a f8             mov    -0x8(%rdx),%rcx
  de:   48 c1 e9 0a             shr    $0xa,%rcx
  e2:   48 8d 0c cd ff ff ff    lea    -0x1(,%rcx,8),%rcx
  e9:   ff 
  ea:   4c 0f b6 04 0a          movzbq (%rdx,%rcx,1),%r8
  ef:   4c 29 c1                sub    %r8,%rcx
  f2:   48 39 f1                cmp    %rsi,%rcx
  f5:   76 4a                   jbe    141 <caml1__yy0_286+0x81>
  f7:   48 0f b6 34 32          movzbq (%rdx,%rsi,1),%rsi
  fc:   48 8d 74 36 01          lea    0x1(%rsi,%rsi,1),%rsi
 101:   48 83 c0 02             add    $0x2,%rax
 105:   48 89 47 08             mov    %rax,0x8(%rdi)
 109:   48 83 c6 9e             add    $0xffffffffffffff9e,%rsi
 10d:   48 83 fe 11             cmp    $0x11,%rsi
 111:   76 15                   jbe    128 <caml1__yy0_286+0x68>
 113:   48 83 c3 18             add    $0x18,%rbx
 117:   b8 01 00 00 00          mov    $0x1,%eax
 11c:   48 83 c4 08             add    $0x8,%rsp
 120:   e9 00 00 00 00          jmp    125 <caml1__yy0_286+0x65>
                        121: R_X86_64_PLT32     caml1__yy1_287-0x4
 125:   0f 1f 00                nopl   (%rax)
 128:   48 83 c3 30             add    $0x30,%rbx
 12c:   b8 01 00 00 00          mov    $0x1,%eax
 131:   48 83 c4 08             add    $0x8,%rsp
 135:   e9 00 00 00 00          jmp    13a <caml1__yy0_286+0x7a>
                        136: R_X86_64_PLT32     caml1__yy2_288-0x4
 13a:   e8 00 00 00 00          call   13f <caml1__yy0_286+0x7f>
                        13b: R_X86_64_PLT32     caml_call_gc-0x4
 13f:   eb 88                   jmp    c9 <caml1__yy0_286+0x9>
 141:   e8 00 00 00 00          call   146 <caml1__yy0_286+0x86>
                        142: R_X86_64_PLT32     caml_ml_array_bound_error-0x4
 146:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
 14d:   00 00 00 

0000000000000150 <caml1__yy1_287>:
 150:   b8 01 00 00 00          mov    $0x1,%eax
 155:   c3                      ret
 156:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
 15d:   00 00 00 

0000000000000160 <caml1__yy2_288>:
 160:   48 83 ec 08             sub    $0x8,%rsp
 164:   4d 3b 3e                cmp    (%r14),%r15
 167:   76 66                   jbe    1cf <caml1__yy2_288+0x6f>
 169:   48 8b 7b 28             mov    0x28(%rbx),%rdi
 16d:   48 8b 47 08             mov    0x8(%rdi),%rax
 171:   48 89 c6                mov    %rax,%rsi
 174:   48 d1 fe                sar    %rsi
 177:   48 8b 17                mov    (%rdi),%rdx
 17a:   48 8b 4a f8             mov    -0x8(%rdx),%rcx
 17e:   48 c1 e9 0a             shr    $0xa,%rcx
 182:   48 8d 0c cd ff ff ff    lea    -0x1(,%rcx,8),%rcx
 189:   ff 
 18a:   4c 0f b6 04 0a          movzbq (%rdx,%rcx,1),%r8
 18f:   4c 29 c1                sub    %r8,%rcx
 192:   48 39 f1                cmp    %rsi,%rcx
 195:   76 3f                   jbe    1d6 <caml1__yy2_288+0x76>
 197:   48 0f b6 34 32          movzbq (%rdx,%rsi,1),%rsi
 19c:   48 8d 74 36 01          lea    0x1(%rsi,%rsi,1),%rsi
 1a1:   48 83 c6 a0             add    $0xffffffffffffffa0,%rsi
 1a5:   48 83 fe 13             cmp    $0x13,%rsi
 1a9:   76 15                   jbe    1c0 <caml1__yy2_288+0x60>
 1ab:   48 83 c3 18             add    $0x18,%rbx
 1af:   b8 01 00 00 00          mov    $0x1,%eax
 1b4:   48 83 c4 08             add    $0x8,%rsp
 1b8:   e9 00 00 00 00          jmp    1bd <caml1__yy2_288+0x5d>
                        1b9: R_X86_64_PLT32     caml1__yy3_289-0x4
 1bd:   0f 1f 00                nopl   (%rax)
 1c0:   48 83 c0 02             add    $0x2,%rax
 1c4:   48 89 47 08             mov    %rax,0x8(%rdi)
 1c8:   b8 01 00 00 00          mov    $0x1,%eax
 1cd:   eb 95                   jmp    164 <caml1__yy2_288+0x4>
 1cf:   e8 00 00 00 00          call   1d4 <caml1__yy2_288+0x74>
                        1d0: R_X86_64_PLT32     caml_call_gc-0x4
 1d4:   eb 93                   jmp    169 <caml1__yy2_288+0x9>
 1d6:   e8 00 00 00 00          call   1db <caml1__yy2_288+0x7b>
                        1d7: R_X86_64_PLT32     caml_ml_array_bound_error-0x4
 1db:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

00000000000001e0 <caml1__yy3_289>:
 1e0:   b8 03 00 00 00          mov    $0x3,%eax
 1e5:   c3                      ret
 1e6:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
 1ed:   00 00 00

Here's the second one:

$ ocamlopt 2.ml && objdump -d 2.o && objdump -dr 2.o
...
0000000000000000 <caml2__lex_284>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   48 89 c3                mov    %rax,%rbx
   7:   4d 3b 3e                cmp    (%r14),%r15
   a:   0f 86 54 01 00 00       jbe    164 <caml2__lex_284+0x164>
  10:   48 8b 43 10             mov    0x10(%rbx),%rax
  14:   48 83 f8 07             cmp    $0x7,%rax
  18:   76 42                   jbe    5c <caml2__lex_284+0x5c>
  1a:   49 83 ef 18             sub    $0x18,%r15
  1e:   4d 3b 3e                cmp    (%r14),%r15
  21:   0f 82 33 01 00 00       jb     15a <caml2__lex_284+0x15a>
  27:   49 8d 47 08             lea    0x8(%r15),%rax
  2b:   48 c7 40 f8 00 08 00    movq   $0x800,-0x8(%rax)
  32:   00 
  33:   48 8b 1d 00 00 00 00    mov    0x0(%rip),%rbx        # 3a <caml2__lex_284+0x3a>
                        36: R_X86_64_REX_GOTPCRELX      camlStdlib-0x4
  3a:   48 8b 5b 30             mov    0x30(%rbx),%rbx
  3e:   48 89 18                mov    %rbx,(%rax)
  41:   48 8b 1d 00 00 00 00    mov    0x0(%rip),%rbx        # 48 <caml2__lex_284+0x48>
                        44: R_X86_64_REX_GOTPCRELX      caml2__1-0x4
  48:   48 89 58 08             mov    %rbx,0x8(%rax)
  4c:   49 8b 66 10             mov    0x10(%r14),%rsp
  50:   41 8f 46 10             pop    0x10(%r14)
  54:   41 5b                   pop    %r11
  56:   41 ff e3                jmp    *%r11
  59:   0f 1f 00                nopl   (%rax)
  5c:   48 d1 f8                sar    %rax
  5f:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # 66 <caml2__lex_284+0x66>
                        62: R_X86_64_PC32       .rodata-0x4
  66:   48 63 04 82             movslq (%rdx,%rax,4),%rax
  6a:   48 01 c2                add    %rax,%rdx
  6d:   ff e2                   jmp    *%rdx
  6f:   90                      nop
  70:   48 8b 43 08             mov    0x8(%rbx),%rax
  74:   48 89 c7                mov    %rax,%rdi
  77:   48 d1 ff                sar    %rdi
  7a:   48 8b 33                mov    (%rbx),%rsi
  7d:   48 8b 56 f8             mov    -0x8(%rsi),%rdx
  81:   48 c1 ea 0a             shr    $0xa,%rdx
  85:   48 8d 14 d5 ff ff ff    lea    -0x1(,%rdx,8),%rdx
  8c:   ff 
  8d:   48 0f b6 0c 16          movzbq (%rsi,%rdx,1),%rcx
  92:   48 29 ca                sub    %rcx,%rdx
  95:   48 39 fa                cmp    %rdi,%rdx
  98:   0f 86 d0 00 00 00       jbe    16e <caml2__lex_284+0x16e>
  9e:   48 0f b6 3c 3e          movzbq (%rsi,%rdi,1),%rdi
  a3:   48 8d 7c 3f 01          lea    0x1(%rdi,%rdi,1),%rdi
  a8:   48 83 c0 02             add    $0x2,%rax
  ac:   48 89 43 08             mov    %rax,0x8(%rbx)
  b0:   48 83 c7 9e             add    $0xffffffffffffff9e,%rdi
  b4:   48 83 ff 11             cmp    $0x11,%rdi
  b8:   76 12                   jbe    cc <caml2__lex_284+0xcc>
  ba:   48 c7 43 10 03 00 00    movq   $0x3,0x10(%rbx)
  c1:   00 
  c2:   48 89 d8                mov    %rbx,%rax
  c5:   e9 3a ff ff ff          jmp    4 <caml2__lex_284+0x4>
  ca:   66 90                   xchg   %ax,%ax
  cc:   48 c7 43 10 05 00 00    movq   $0x5,0x10(%rbx)
  d3:   00 
  d4:   48 89 d8                mov    %rbx,%rax
  d7:   e9 28 ff ff ff          jmp    4 <caml2__lex_284+0x4>
  dc:   b8 01 00 00 00          mov    $0x1,%eax
  e1:   48 83 c4 08             add    $0x8,%rsp
  e5:   c3                      ret
  e6:   66 90                   xchg   %ax,%ax
  e8:   48 8b 43 08             mov    0x8(%rbx),%rax
  ec:   48 89 c7                mov    %rax,%rdi
  ef:   48 d1 ff                sar    %rdi
  f2:   48 8b 33                mov    (%rbx),%rsi
  f5:   48 8b 56 f8             mov    -0x8(%rsi),%rdx
  f9:   48 c1 ea 0a             shr    $0xa,%rdx
  fd:   48 8d 14 d5 ff ff ff    lea    -0x1(,%rdx,8),%rdx
 104:   ff 
 105:   48 0f b6 0c 16          movzbq (%rsi,%rdx,1),%rcx
 10a:   48 29 ca                sub    %rcx,%rdx
 10d:   48 39 fa                cmp    %rdi,%rdx
 110:   76 5c                   jbe    16e <caml2__lex_284+0x16e>
 112:   48 0f b6 3c 3e          movzbq (%rsi,%rdi,1),%rdi
 117:   48 8d 7c 3f 01          lea    0x1(%rdi,%rdi,1),%rdi
 11c:   48 83 c7 a0             add    $0xffffffffffffffa0,%rdi
 120:   48 83 ff 13             cmp    $0x13,%rdi
 124:   76 12                   jbe    138 <caml2__lex_284+0x138>
 126:   48 c7 43 10 07 00 00    movq   $0x7,0x10(%rbx)
 12d:   00 
 12e:   48 89 d8                mov    %rbx,%rax
 131:   e9 ce fe ff ff          jmp    4 <caml2__lex_284+0x4>
 136:   66 90                   xchg   %ax,%ax
 138:   48 83 c0 02             add    $0x2,%rax
 13c:   48 89 43 08             mov    %rax,0x8(%rbx)
 140:   48 c7 43 10 05 00 00    movq   $0x5,0x10(%rbx)
 147:   00 
 148:   48 89 d8                mov    %rbx,%rax
 14b:   e9 b4 fe ff ff          jmp    4 <caml2__lex_284+0x4>
 150:   b8 03 00 00 00          mov    $0x3,%eax
 155:   48 83 c4 08             add    $0x8,%rsp
 159:   c3                      ret
 15a:   e8 00 00 00 00          call   15f <caml2__lex_284+0x15f>
                        15b: R_X86_64_PLT32     caml_call_gc-0x4
 15f:   e9 c3 fe ff ff          jmp    27 <caml2__lex_284+0x27>
 164:   e8 00 00 00 00          call   169 <caml2__lex_284+0x169>
                        165: R_X86_64_PLT32     caml_call_gc-0x4
 169:   e9 a2 fe ff ff          jmp    10 <caml2__lex_284+0x10>
 16e:   e8 00 00 00 00          call   173 <caml2__lex_284+0x173>
                        16f: R_X86_64_PLT32     caml_ml_array_bound_error-0x4
 173:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
 17a:   00 00 00 00 
 17e:   66 90                   xchg   %ax,%ax

re2c can generate both examples. Which one is faster?

I haven't measured it yet.

pmetzger commented 7 months ago

Will be curious about your measurements. Also, which compiler are you using? flambda will be more aggressive here.

skvadrik commented 7 months ago

Some measurements with ocamlopt:

a.ml:

type state = {
    str: string;
    mutable cur: int;
}

let lex st =
    let rec yy0() =
            let yych = st.str.[st.cur] in
            st.cur <- st.cur + 1;
            (match yych with
                | '1'..'9' -> (yy2 [@tailcall]) ()
                | _ -> (yy1 [@tailcall]) ())
    and yy1() = false
    and yy2() = 
            let yych = st.str.[st.cur] in
            (match yych with
                | '0'..'9' ->
                    st.cur <- st.cur + 1;
                    (yy2 [@tailcall]) ()
                | _ -> (yy3 [@tailcall]) ())
    and yy3() = true
    in yy0 ()

let main () =
    let s = "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890\x00" in
    for i = 1 to 100000000 do
        let st = { str = s; cur = 0; } in
        if not (lex st) then raise (Failure "error")
    done

let _ = main ()

b.ml:

type state = {
    str: string;
    mutable cur: int;
    mutable state: int;
}

let rec lex st =
    let yystate = st.state in
    match yystate with
        | 0 ->
            let yych = st.str.[st.cur] in
            st.cur <- st.cur + 1;
            (match yych with
                | '1'..'9' -> st.state <- 2; (lex [@tailcall]) st
                | _ -> st.state <- 1; (lex [@tailcall]) st)
        | 1 -> false
        | 2 ->
            let yych = st.str.[st.cur] in
            (match yych with
                | '0'..'9' ->
                    st.cur <- st.cur + 1;
                    st.state <- 2; (lex [@tailcall]) st
                | _ -> st.state <- 3; (lex [@tailcall]) st)
        | 3 -> true
        | _ -> raise (Failure "internal lexer error")

let main () =
    let s = "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890\x00" in
    for i = 1 to 100000000 do
        let st = { str = s; cur = 0; state = 0 } in
        if not (lex st) then raise (Failure "error")
    done

let _ = main ()

Compiled as:

$ ocamlopt a.ml -o a
$ ocamlopt b.ml -o b

Run time:

$ time ./a

real    0m16.163s
user    0m16.162s
sys     0m0.001s

$ time ./b

real    0m22.179s
user    0m22.178s
sys     0m0.000s

So, many recursive closures are considerably faster in this example.

skvadrik commented 7 months ago

perf stat on the same binaries shows more instructions and in particular more branches for b (which should be explained by the one extra conditional jump on yystate):

$ perf stat ./a

 Performance counter stats for './a':

         16,115.76 msec task-clock:u                     #    1.000 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
               609      page-faults:u                    #   37.789 /sec                      
    64,101,457,390      cycles:u                         #    3.978 GHz                       
   237,516,872,339      instructions:u                   #    3.71  insn per cycle            
    41,103,312,263      branches:u                       #    2.551 G/sec                     
       100,133,270      branch-misses:u                  #    0.24% of all branches           

      16.116135057 seconds time elapsed

      16.115046000 seconds user
       0.001000000 seconds sys

$ perf stat ./b

 Performance counter stats for './b':

         22,093.66 msec task-clock:u                     #    1.000 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
               605      page-faults:u                    #   27.383 /sec                      
    87,899,791,871      cycles:u                         #    3.979 GHz                       
   326,339,866,047      instructions:u                   #    3.71  insn per cycle            
    61,404,372,905      branches:u                       #    2.779 G/sec                     
       100,058,371      branch-misses:u                  #    0.16% of all branches           

      22.096008485 seconds time elapsed

      22.090730000 seconds user
       0.002999000 seconds sys

skvadrik commented 7 months ago

ocamlopt built with flambda support and -O3 gives approximately the same results, only both binaries are slightly faster (15.400s vs 21.624s).

pmetzger commented 7 months ago

I would have expected both to be a bit faster; somewhat surprised that it isn't more than that, but perhaps this isn't something flambda is particularly good at optimizing.

Anyway, I suppose this means that as suspected, the "goto" version is faster than the "big switch" version, and I suspect that the difference would be even bigger for very large state machines.

pmetzger commented 7 months ago

("goto" in this case being a reference to "Lambda: The Ultimate Goto". Tail recursion is by far my favorite goto.)

skvadrik commented 5 months ago

OCaml support has been added in experimental branch syntax-files: https://github.com/skvadrik/re2c/commit/c1ccefacaf35427dffe8c43f82a534adbee49225. This is not the final release yet, some things may still change. See examples.

E.g. for the basic example:

(* re2ocaml $INPUT -o $OUTPUT -i *)

type state = {
    str: string;
    mutable cur: int;
}

/*!re2c
    re2c:define:YYFN    = ["lex;bool", "st;state"];
    re2c:define:YYCTYPE = int;
    re2c:define:YYPEEK  = "Char.code st.str.[st.cur]";
    re2c:define:YYSKIP  = "st.cur <- st.cur + 1;";
    re2c:yyfill:enable  = 0;

    number = [1-9][0-9]*;

    number { true }
    *      { false }
*/

let main () =
    let st = {str = "1234\x00"; cur = 0}
    in if not (lex st) then raise (Failure "error")

let _ = main ()

re2ocaml generates the following code:

(* Generated by re2c *)
(* re2ocaml $INPUT -o $OUTPUT -i *)

type state = {
    str: string;
    mutable cur: int;
}

let rec yy0 (st : state) : bool =
    let yych = Char.code st.str.[st.cur] in
    st.cur <- st.cur + 1;
    match yych with
        | 0x31|0x32|0x33|0x34|0x35|0x36|0x37|0x38|0x39 -> (yy2 [@tailcall]) st
        | _ -> (yy1 [@tailcall]) st

and yy1 (st : state) : bool =
    false

and yy2 (st : state) : bool =
    let yych = Char.code st.str.[st.cur] in
    match yych with
        | 0x30|0x31|0x32|0x33|0x34|0x35|0x36|0x37|0x38|0x39 ->
            st.cur <- st.cur + 1;
            (yy2 [@tailcall]) st
        | _ -> (yy3 [@tailcall]) st

and yy3 (st : state) : bool =
    true

and lex (st : state) : bool =
    (yy0 [@tailcall]) st

let main () =
    let st = {str = "1234\x00"; cur = 0}
    in if not (lex st) then raise (Failure "error")

let _ = main ()

smasher164 commented 5 months ago

That's amazing @skvadrik, I'm excited to play with this and get back to you!

pmetzger commented 5 months ago

@skvadrik Should I publicize this among the OCaml community? I think some folks would find it interesting.

skvadrik commented 5 months ago

@skvadrik Should I publicize this among the OCaml community? I think some folks would find it interesting.

Sounds great, but let's wait until the API is finalized. I think I'll implement a few more language backends via syntax configs, and I also plan to tweak a few things in OCaml backend. I don't expect major changes though.

pmetzger commented 5 months ago

Python might be interesting, FWIW.

skvadrik / re2c

Considering supporting OCaml #449