Closed hannesm closed 9 months ago
Note that this could be split into pieces: introducing the xor_into_bytes function, then moving poly1305 to string, then adding macl, then moving chacha to string.
On a separate note, in C we directly cast to uint32_t * to avoid shifting and reading byte-wise (this could also be done in a separate commit).
EDIT: ah, the big-endian target (s390x) fails due to that... fixed in the following commits
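For illustration, here is a minimal sketch of what an xor_into_bytes helper could look like in pure OCaml. The signature, the bounds check and the 8-byte stride are assumptions made for this example, not the code from this PR. It uses the explicit little-endian accessors from the standard Bytes module (available since OCaml 4.08), so the word-sized loads behave the same on little-endian and big-endian hosts, unlike a raw pointer cast in C.

```ocaml
(* Hypothetical sketch: XOR the first [n] bytes of [src] into [dst], in place.
   The name mirrors the discussion above; the signature is an assumption. *)
let xor_into_bytes (src : string) (dst : Bytes.t) (n : int) : unit =
  if n < 0 || n > String.length src || n > Bytes.length dst then
    invalid_arg "xor_into_bytes: length out of bounds";
  (* Viewing [src] as bytes is safe here because it is only read, never written. *)
  let s = Bytes.unsafe_of_string src in
  let words = n / 8 in
  (* Bulk of the work: one 64-bit load, XOR and store per 8 bytes. *)
  for i = 0 to words - 1 do
    let off = i * 8 in
    let x =
      Int64.logxor (Bytes.get_int64_le s off) (Bytes.get_int64_le dst off)
    in
    Bytes.set_int64_le dst off x
  done;
  (* Tail: the remaining 0 to 7 bytes, one at a time. *)
  for i = words * 8 to n - 1 do
    let x = Char.code (Bytes.get s i) lxor Char.code (Bytes.get dst i) in
    Bytes.set dst i (Char.chr x)
  done

let () =
  let dst = Bytes.of_string "\x00\xff\x0f" in
  xor_into_bytes "\xff\xff\xff" dst 3;
  assert (Bytes.to_string dst = "\xff\x00\xf0")
```

For a plain XOR the byte order of the word loads does not change the result, as long as the load and the store use the same order. It does matter wherever the words are interpreted as integers, such as loading ChaCha20 state words, which is where a host-order cast can break on s390x.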
Neat! Thanks for that! Here are my results (I confirm the x2 improvement on smallest sizes, and improvement everywhere!):
with @reynir suggested changes 1e1d179
I wondered about the CI failure: Bytes.unsafe_blit_string is only available from OCaml 4.09 onwards (I'm fine with bumping the lower OCaml bound).
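If bumping the lower bound were not an option, one possible fallback is the bounds-checked Bytes.blit_string, which predates 4.09. The wrapper name blit_checked below is hypothetical and only serves to show the call; whether the extra bounds check is acceptable for this PR's performance goals would need measuring.

```ocaml
(* Copy [len] bytes of [src], starting at [src_off], into [dst] at [dst_off].
   Bytes.blit_string checks the ranges and raises Invalid_argument if they are
   invalid; Bytes.unsafe_blit_string (OCaml >= 4.09) skips those checks. *)
let blit_checked src src_off dst dst_off len =
  Bytes.blit_string src src_off dst dst_off len

let () =
  let dst = Bytes.make 8 '.' in
  blit_checked "chacha20" 0 dst 2 4;
  assert (Bytes.to_string dst = "..chac..")
```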
Performance improvement around 2.5x
It's worth gathering some more statistics on this, and there is the question of how to move forward. I can see two ways:
WDYT?
main branch (28f8cde5ff3197e6383a935037725ab34ab32485):
16: 11.686102 MB/s (2318663 iters in 3.028 s)
64: 44.914143 MB/s (2248341 iters in 3.055 s)
256: 111.408313 MB/s (1464243 iters in 3.209 s)
1024: 190.362636 MB/s (530561 iters in 2.722 s)
8192: 235.326363 MB/s (81843 iters in 2.717 s)

this PR (using the ad-hoc API enc_auth_str):
16: 26.288283 MB/s (5247580 iters in 3.046 s)
64: 99.922308 MB/s (4995212 iters in 3.051 s)
256: 204.388046 MB/s (2405283 iters in 2.873 s)
1024: 278.060783 MB/s (863503 iters in 3.033 s)
8192: 294.746967 MB/s (113552 iters in 3.010 s)
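
To make the two runs easier to compare, the per-size speedup can be computed directly from the throughput figures quoted above. This is only a small helper sketch: the list names are made up for the example and the numbers are copied from the two runs.

```ocaml
(* Throughput in MB/s per input size in bytes, copied from the runs above. *)
let main_branch =
  [ 16, 11.686102; 64, 44.914143; 256, 111.408313;
    1024, 190.362636; 8192, 235.326363 ]

let this_pr =
  [ 16, 26.288283; 64, 99.922308; 256, 204.388046;
    1024, 278.060783; 8192, 294.746967 ]

(* Speedup = throughput with the PR / throughput on main, per input size. *)
let speedups =
  List.map2
    (fun (size, before) (_, after) -> (size, after /. before))
    main_branch this_pr

let () =
  List.iter
    (fun (size, ratio) -> Printf.printf "%5d bytes: %.2fx\n" size ratio)
    speedups
```

For these figures the ratio is a little above 2x at 16 and 64 bytes and shrinks as the input grows, which suggests the saving is mostly a fixed per-call cost.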