ocaml-community / sedlex

An OCaml lexer generator for Unicode
MIT License
235 stars 43 forks source link

Buffer channel input, return None at the end of partial reads. #77

Closed toots closed 1 year ago

toots commented 5 years ago

Finally came up with the proper fix for #45 !

This PR adds proper buffering on channel reads while also inserting a None entry at the end of each partial read. This allow the fill_buf_from_gen function to return without reaching blocking read on the buffer.

pmetzger commented 5 years ago

It is probably only necessary to be careful about blocking on interactive uses; for non-interactive uses it probably slows things down a bit (and that can be significant when compiling a large code base; often lexing is one of the most time-consuming steps). I wonder if we should have two separate functions for this. Several other lexical analyzer generators have specific options for interactive vs. non-interactive use. Perhaps @Drup has an opinion.

toots commented 5 years ago

A separate function could work too. I'm already doing that in the app itself.

Drup commented 5 years ago

@pmetzger I'm not so convinced that having two modes is important. We would need benchmarks to really make a decision.

pmetzger commented 5 years ago

@Drup A benchmark is probably a good idea.

pmetzger commented 5 years ago

@toots I noticed this one was still open. What should we do about it?

toots commented 5 years ago

Thanks for bringing this one up!

We have used this function in our application for a while and it's working pretty well. I say we could either merge it as it is or, if we can to be conservative with features, add those changes as a from_interactive_channel.

pmetzger commented 5 years ago

Do you think you could do a quick benchmark to see if we should bother with a distinct mode?

hhugo commented 1 year ago

Running a quick benchmark, it seems that this PR is actually faster than the implementation on master, probably due to the fact that we perform less c calls.

That said, I cannot convince myself that the implementation is correct. In particular, I don't understand how we can be sure that we don't return None at the very start of the refill call (which would look like an EOF)

hhugo commented 1 year ago

Also, I think we would have invalid results if a give-back-control None was returned in the middle of Utf8.Helper.from_gen.

hhugo commented 1 year ago

I have an alternative fix for #45, see #124

hhugo commented 1 year ago

This can be closed @toots

toots commented 1 year ago

Indeed! @hhugo do you have other pending changes? I'd say we should do a release soon now.

hhugo commented 1 year ago

I don't have other changes planned. I wanted to run some more tests with jsoo first maybe.

hhugo commented 1 year ago

@toots, I run tests with jsoo, all good. I've just open one last fix #126