ocaml-community / sedlex

An OCaml lexer generator for Unicode
MIT License

Don't use gen to consume channels #124

Closed hhugo closed 1 year ago

hhugo commented 1 year ago

To solve #45, we would like to avoid making unnecessary blocking reads on channels. One key function for this is `input`, which returns the number of bytes read and doesn't block if some bytes are available in the buffer. In order to rely on `input`, we need to give up the current (short) implementation of `from_channel`, which relies on `from_gen`.

Fixes #45, replaces #77.
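For readers unfamiliar with `input`, here is a minimal sketch of the idea, not the code from this PR. It assumes a hypothetical helper `refill` that asks `Stdlib.input` for up to a buffer's worth of bytes and uses the returned count, instead of pulling one character at a time through a generator. The names `refill`, `count_bytes` and `total_read` are made up for illustration.

```ocaml
(* Hypothetical helper (not sedlex's API): read whatever is available,
   up to the size of [buf]. [input] returns the number of bytes actually
   read, which may be less than requested; 0 means end of file. *)
let refill (ic : in_channel) (buf : Bytes.t) : int =
  input ic buf 0 (Bytes.length buf)

(* Drain a channel chunk by chunk and return the total number of bytes. *)
let count_bytes (ic : in_channel) : int =
  let buf = Bytes.create 4096 in
  let rec loop total =
    match refill ic buf with
    | 0 -> total                 (* end of file *)
    | n -> loop (total + n)      (* consume the [n] bytes that were returned *)
  in
  loop 0

(* Usage: write 10_000 bytes to a temporary file and read them back. *)
let total_read : int =
  let path = Filename.temp_file "sedlex_sketch" ".txt" in
  let oc = open_out_bin path in
  output_string oc (String.make 10_000 'a');
  close_out oc;
  let ic = open_in_bin path in
  let n = count_bytes ic in
  close_in ic;
  Sys.remove path;
  n
```

The point of the sketch is that the caller works with however many bytes `input` returned rather than insisting on a full buffer, so a lexer built this way only needs to go back to the channel when it has run out of data.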

hhugo commented 1 year ago

Reading 100_000_000 zero bytes from /dev/zero: before 2.44, after 1.23.

hhugo commented 1 year ago

I've squashed the commits. @smuenzel, do you have time to review some code?

hhugo commented 1 year ago

I believe truncated UTF-8/UTF-16 sequences were not properly recognized as malformed. My last commit, https://github.com/ocaml-community/sedlex/pull/124/commits/5851233b8943d1c9ebe94192f974d452654eb8ee, tries to fix this.
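To illustrate what "truncated" means here, a rough sketch follows. It is not the code from the commit above, only a simplified standalone check, and the names `utf8_length`, `is_continuation`, `status` and `check` are hypothetical. It covers UTF-8 only, and it deliberately ignores overlong encodings and surrogate code points, which a full validator would also reject.

```ocaml
(* Expected length of a UTF-8 sequence given its lead byte, or [None] if
   the byte cannot start a sequence (for example a stray continuation). *)
let utf8_length (c : char) : int option =
  let b = Char.code c in
  if b < 0x80 then Some 1
  else if b land 0xE0 = 0xC0 then Some 2
  else if b land 0xF0 = 0xE0 then Some 3
  else if b land 0xF8 = 0xF0 then Some 4
  else None

(* Continuation bytes have the bit pattern 10xxxxxx. *)
let is_continuation (c : char) : bool = Char.code c land 0xC0 = 0x80

type status = Ok_utf8 | Malformed

(* Walk the string one sequence at a time. A sequence whose lead byte
   announces more bytes than remain in the input is reported as malformed. *)
let check (s : string) : status =
  let n = String.length s in
  let rec go i =
    if i >= n then Ok_utf8
    else
      match utf8_length s.[i] with
      | None -> Malformed
      | Some len ->
        if i + len > n then Malformed (* truncated at end of input *)
        else
          let rec conts k =
            k >= len || (is_continuation s.[i + k] && conts (k + 1))
          in
          if conts 1 then go (i + len) else Malformed
  in
  go 0
```

With this sketch, `check "\xC3\xA9"` (a complete two-byte sequence) is accepted, while `check "\xC3"` and `check "\xE2\x82"` are flagged because the input ends before the sequence does.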

hhugo commented 1 year ago

I've added some tests and pushed fixes.

hhugo commented 1 year ago

I've rebased, squashed the commits again, and moved the tests first so that we can see the effect of the change in the tests.