wip: nested write by writing backward

c-cube commented 2 years ago

this is experimental, and comes from discussions with @vphantom and others about how to deal with nested messages efficiently. Writing backwards is a clean way of doing it, since the length prefix of a nested message is written after its payload, when the size is already known, but so far it seems quite subtle and not worth it.
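To make the idea concrete, here is a minimal sketch of a backward-writing encoder. It is not the code from this PR: the `enc` type and the `add_*` functions are hypothetical names chosen for illustration, and only varints and length-delimited fields are covered. The buffer is filled from its end towards its start, so a nested message writes its payload first, then its length, then its key.

```ocaml
(* Hypothetical backward encoder, for illustration only (not ocaml-protoc's
   API). Used data lives in buf[start .. Bytes.length buf - 1]. *)
type enc = {
  mutable buf : Bytes.t;
  mutable start : int; (* index of the first used byte *)
}

let create () = { buf = Bytes.create 16; start = 16 }
let length e = Bytes.length e.buf - e.start

(* Ensure n free bytes at the front; existing data stays at the end. *)
let reserve e n =
  if e.start < n then begin
    let used = length e in
    let cap = ref (2 * Bytes.length e.buf) in
    while !cap - used < n do cap := 2 * !cap done;
    let buf = Bytes.create !cap in
    Bytes.blit e.buf e.start buf (!cap - used) used;
    e.buf <- buf;
    e.start <- !cap - used
  end

let add_char e c =
  reserve e 1;
  e.start <- e.start - 1;
  Bytes.set e.buf e.start c

let add_string e s =
  let len = String.length s in
  reserve e len;
  e.start <- e.start - len;
  Bytes.blit_string s 0 e.buf e.start len

(* Size in bytes of the varint encoding of a non-negative int. *)
let rec varint_size n = if n < 0x80 then 1 else 1 + varint_size (n lsr 7)

(* The varint bytes must still read low-bits-first in the final output,
   so we reserve the space and then fill it in forward order. *)
let add_varint e n =
  let size = varint_size n in
  reserve e size;
  e.start <- e.start - size;
  let n = ref n in
  for i = 0 to size - 1 do
    let b = !n land 0x7f in
    n := !n lsr 7;
    let b = if i < size - 1 then b lor 0x80 else b in
    Bytes.set e.buf (e.start + i) (Char.chr b)
  done

(* key = (field number lsl 3) lor wire type *)
let add_key e ~field ~wire = add_varint e ((field lsl 3) lor wire)

(* Nested message: payload first, then its length, then its key.
   No temporary buffer is needed because the size is known by then. *)
let add_nested e ~field write_payload =
  let before = length e in
  write_payload e;
  add_varint e (length e - before);
  add_key e ~field ~wire:2

let contents e = Bytes.sub_string e.buf e.start (length e)

(* outer { 1: inner { 1: 150 } }, with each field written value-then-key. *)
let example () =
  let e = create () in
  add_nested e ~field:1 (fun e ->
    add_varint e 150;
    add_key e ~field:1 ~wire:0);
  contents e
```

The subtle part is visible even in this sketch: every field is emitted value-then-key, and fields must be written in reverse order to come out in declaration order, which is what generated code would have to get right.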

edit:

this depends on #167
I just tried to cache the sub-buffers used to encode nested messages, and this can yield some benefits on largish nested messages.
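As a rough illustration of the sub-buffer caching idea, here is a sketch of a forward encoder that keeps one reusable sub-encoder per nesting depth. This is an assumption about how such a cache could look, not the implementation in this PR or in #167; `enc`, `sub_encoder` and `add_nested` are hypothetical names, and the stdlib `Buffer` stands in for the real encoder.

```ocaml
(* Hypothetical forward encoder with cached sub-buffers, for illustration
   only (not ocaml-protoc's API). *)
type enc = {
  buf : Buffer.t;
  mutable sub : enc option; (* created lazily, reused for nested messages *)
}

let create () = { buf = Buffer.create 64; sub = None }

let rec add_varint e n =
  if n < 0x80 then Buffer.add_char e.buf (Char.chr n)
  else begin
    Buffer.add_char e.buf (Char.chr ((n land 0x7f) lor 0x80));
    add_varint e (n lsr 7)
  end

(* key = (field number lsl 3) lor wire type *)
let add_key e ~field ~wire = add_varint e ((field lsl 3) lor wire)

(* Return the cached sub-encoder, emptied but with its storage kept
   (Buffer.clear does not deallocate, unlike Buffer.reset). *)
let sub_encoder e =
  match e.sub with
  | Some s -> Buffer.clear s.buf; s
  | None ->
    let s = create () in
    e.sub <- Some s;
    s

(* Encode the payload into the cached sub-buffer, then copy it into the
   parent after the key and the length prefix. *)
let add_nested e ~field write_payload =
  let s = sub_encoder e in
  write_payload s;
  add_key e ~field ~wire:2;
  add_varint e (Buffer.length s.buf);
  Buffer.add_buffer e.buf s.buf

let contents e = Buffer.contents e.buf

(* Re-encode outer { 1: inner { 1: 150 } } with the same encoder. *)
let encode_once e =
  Buffer.clear e.buf;
  add_nested e ~field:1 (fun s ->
    add_key s ~field:1 ~wire:0;
    add_varint s 150);
  contents e
```

The payload is still copied once per nesting level, but after the first message the sub-buffers are already allocated and sized, so repeated calls to `encode_once` on the same encoder allocate little beyond the final string.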

c-cube commented 2 years ago

so far:

~/w/ocaml-protoc (wip-nested-write-by-writing-backward|✚1) $ ./benchs.sh -p nested.enc.10
*** Run benchmarks for path "nested.enc.10"

Throughputs for "nested-enc-basic-buffer", "nested-enc-nested-bufs", "nested-enc-write-backward" each running 4 times for at least 3 CPU seconds:
  nested-enc-basic-buffer:  3.83 WALL ( 3.79 usr +  0.01 sys =  3.80 CPU) @ 33225.06/s (n=126152)
                            4.15 WALL ( 4.04 usr +  0.08 sys =  4.11 CPU) @ 30675.27/s (n=126152)
                            4.68 WALL ( 4.47 usr +  0.15 sys =  4.62 CPU) @ 27299.52/s (n=126152)
                            4.32 WALL ( 4.32 usr +  0.00 sys =  4.32 CPU) @ 29227.15/s (n=126152)
   nested-enc-nested-bufs:  3.06 WALL ( 3.04 usr +  0.00 sys =  3.04 CPU) @ 31951.89/s (n=96990)
                            3.05 WALL ( 3.02 usr +  0.00 sys =  3.02 CPU) @ 33486.48/s (n=101086)
                            3.01 WALL ( 3.00 usr +  0.00 sys =  3.00 CPU) @ 35043.27/s (n=105177)
                            3.01 WALL ( 3.00 usr +  0.00 sys =  3.00 CPU) @ 35439.61/s (n=106462)
nested-enc-write-backward:  3.14 WALL ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 28495.00/s (n=89302)
                            3.14 WALL ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 27197.97/s (n=85344)
                            3.13 WALL ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 27603.14/s (n=86272)
                            3.13 WALL ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 27475.33/s (n=85938)
                             Rate       nested-enc-write-backward nested-enc-basic-buffer nested-enc-nested-bufs
nested-enc-write-backward 27693+- 461/s                        --                   [-8%]                   -19%
  nested-enc-basic-buffer 30107+-2054/s                      [9%]                      --                   -11%
   nested-enc-nested-bufs 33980+-1311/s                       23%                     13%                     --
~/w/ocaml-protoc (wip-nested-write-by-writing-backward|✚1) $ ./benchs.sh -p nested.enc.5
*** Run benchmarks for path "nested.enc.5"

Throughputs for "nested-enc-basic-buffer", "nested-enc-nested-bufs", "nested-enc-write-backward" each running 4 times for at least 3 CPU seconds:
  nested-enc-basic-buffer:  3.01 WALL ( 3.00 usr +  0.00 sys =  3.01 CPU) @ 60021.08/s (n=180387)
                            3.13 WALL ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 59669.38/s (n=186588)
                            3.13 WALL ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 53907.76/s (n=168603)
                            3.01 WALL ( 3.00 usr +  0.00 sys =  3.00 CPU) @ 56135.23/s (n=168603)
   nested-enc-nested-bufs:  3.21 WALL ( 3.20 usr +  0.00 sys =  3.20 CPU) @ 71289.96/s (n=227909)
                            3.17 WALL ( 3.15 usr +  0.00 sys =  3.15 CPU) @ 72363.71/s (n=227909)
                            3.35 WALL ( 3.32 usr +  0.00 sys =  3.32 CPU) @ 68573.32/s (n=227909)
                            3.52 WALL ( 3.48 usr +  0.00 sys =  3.48 CPU) @ 65399.66/s (n=227909)
nested-enc-write-backward:  3.03 WALL ( 3.01 usr +  0.00 sys =  3.01 CPU) @ 51838.81/s (n=156198)
                            3.13 WALL ( 3.10 usr +  0.00 sys =  3.10 CPU) @ 50321.08/s (n=156198)
                            3.02 WALL ( 3.02 usr +  0.00 sys =  3.02 CPU) @ 55913.69/s (n=168815)
                            3.13 WALL ( 3.12 usr +  0.00 sys =  3.12 CPU) @ 57412.65/s (n=179201)
                             Rate       nested-enc-write-backward nested-enc-basic-buffer nested-enc-nested-bufs
nested-enc-write-backward 53872+-2747/s                        --                   [-6%]                   -22%
  nested-enc-basic-buffer 57433+-2413/s                      [7%]                      --                   -17%
   nested-enc-nested-bufs 69407+-2560/s                       29%                     21%                     --

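there's definitely a nice little edge to the "current encoding, but re-using buffers for nested messages" approach (nested-enc-nested-bufs): it is about 13% faster than nested-enc-basic-buffer on nested.enc.10 and about 21% faster on nested.enc.5, while writing backward is the slowest of the three in both runs.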

vphantom commented 2 years ago

(Our earlier discussion: #161 )

c-cube commented 2 years ago

and https://github.com/mransan/ocaml-protoc/pull/157 for the nested-enc-nested-bufs bit, which seems to be quite nice actually. If we just keep the encoder around, it seems possible to serialize a ton of stuff with few allocations.

c-cube commented 2 years ago

writing backward doesn't seem to pay off, and it makes codegen harder.

mransan / ocaml-protoc

wip: nested write by writing backward #169