skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org

Bad codepoint range #423

Closed: ccleve closed this issue 1 year ago

ccleve commented 1 year ago

I'm trying to compile the exact example here:

https://re2c.org/manual/manual_rust.html#encoding-support

I've copied unicode_categories.re into the test directory and attempted to compile the code using re2rust. I get this error:

tests/pipeline/tokenizers/re2c/unicode_categories.re:2:68: error: bad code point range: '0xF8 - 0x2C1'

What am I doing wrong?

Also: I see that unicode_categories.re is three years old at this point. Should it be regenerated with a more recent version of Unicode?

skvadrik commented 1 year ago

> What am I doing wrong?

Can you provide your command line? Did you forget the --utf8 argument?

> Also: I see that unicode_categories.re is three years old at this point. Should it be regenerated with a more recent version of Unicode?

Yes, it should. We even had a project for rewriting the generator (https://github.com/skvadrik/re2c/issues/235#issuecomment-605680011) but that somehow got stuck.

ccleve commented 1 year ago

Yes, I did forget the --utf8 argument. Thank you, it works now.

I'll take a look at the regeneration code. Maybe I can help.
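
For readers who hit the same error: the range in the message, 0xF8 - 0x2C1, has an upper bound above 0xFF, so it cannot be expressed when each input code unit is a single byte. That is presumably what the lexer generator assumes when no encoding flag such as --utf8 is given. The Rust sketch below only illustrates that arithmetic; the names `MAX_ONE_BYTE`, `range_fits` and `utf8_len` are hypothetical and are not part of re2c.

```rust
// Illustration only: hypothetical helpers, not re2c's actual implementation.

/// Largest code point representable when every code unit is one byte.
/// Assumption: this is the limit in effect when no encoding flag is passed.
const MAX_ONE_BYTE: u32 = 0xFF;

/// Returns true if the inclusive range [lo, hi] is well formed and
/// does not exceed `max`.
fn range_fits(lo: u32, hi: u32, max: u32) -> bool {
    lo <= hi && hi <= max
}

/// Number of bytes UTF-8 needs to encode `cp`, or `None` if `cp` is not
/// a valid Unicode scalar value (for example a surrogate).
fn utf8_len(cp: u32) -> Option<usize> {
    char::from_u32(cp).map(char::len_utf8)
}

fn main() {
    // The range reported in the error message.
    let (lo, hi) = (0xF8, 0x2C1);

    // Upper bound 0x2C1 is larger than 0xFF, so the range does not fit.
    println!("fits in one byte: {}", range_fits(lo, hi, MAX_ONE_BYTE));

    // The same range is fine against the full Unicode limit U+10FFFF.
    println!("fits in Unicode:  {}", range_fits(lo, hi, 0x10FFFF));

    // Both endpoints need two bytes once encoded as UTF-8.
    println!("UTF-8 bytes for U+{:04X}: {:?}", lo, utf8_len(lo));
    println!("UTF-8 bytes for U+{:04X}: {:?}", hi, utf8_len(hi));
}
```

Under these assumptions the range is rejected against the one-byte limit and accepted against the Unicode limit, which is consistent with the error disappearing once --utf8 is passed.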