int codepoints -> Uchar.t codepoints

ocaml-community / sedlex

An OCaml lexer generator for Unicode

MIT License

235 stars 43 forks source link

int codepoints -> Uchar.t codepoints #60

Closed pmetzger closed 6 years ago

pmetzger commented 6 years ago

These fixes are preliminary, and not yet ready to merge. There's a need for more tests, and the change made to the code generated by ppx_sedlex to handle the EOF case needs cleanup, as right now it generates duplicate code instead of creating an inner function.

Also still to do: possibly removing the locally maintained unicode handling code and replacing it with parts of dbuenzli's unicode libraries or some equivalent.

Drup commented 6 years ago

I like the general idea behind the change. If it's not too complicated, it would be nice to separate it into two parts: first get rid of eof = - 1, then switch to uchar.

About the duplicated code: Is it really a problem ? You will want the compiler to inline it anyway. At least now it's inlined.

pmetzger commented 6 years ago

I like the general idea behind the change. If it's not too complicated, it would be nice to separate it into two parts: first get rid of eof = - 1, then switch to uchar.

That's fairly easily done if you think it's needed, though if we want to do both it sort of seems like one commit means fewer breaking changes to the API? This is a breaking change after all. If we do it in two steps, it means breaking the API twice.

About the duplicated code: Is it really a problem ? You will want the compiler to inline it anyway. At least now it's inlined.

It might not be a problem, but I did it that way because I was having trouble getting the correct output using ppx and this was the easiest workaround. It felt like a gross hack. If you actually prefer it this way, though, I have no objection to it being integrated this way.

pmetzger commented 6 years ago

None of the examples need any modifications? If that's the case, could you add a new one that specifically exercise the new uchar handling?

Hadn't checked really. Will look at that.

pmetzger commented 6 years ago

I've fixed the issues you've mentioned so far.

I've confirmed that none of the examples need modifications. Thoughts on what a good one exercising this particular capability might be?

Drup commented 6 years ago

@pmetzger Something that have a normalization procedure with uunf before lexing ?

pmetzger commented 6 years ago

Let me sleep on that. I want to add a bunch of improvements to the library, so maybe in the course of providing those I can add examples with more interesting features. For now, maybe we just integrate these changes?

One thought, though: we need to bump the major version of the library when we integrate this.

Drup commented 6 years ago

Sure. And yes, we will have to bump the major number. That's fine, we can deal with that later. :)