ocaml-community / sedlex

An OCaml lexer generator for Unicode
MIT License

Make my lexer support foreign languages #104

Closed: chengtie closed this issue 3 years ago

chengtie commented 3 years ago

I have a lexer that supports only English. Here is how an identifier is defined:

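let decimal_digit = [%sedlex.regexp? '0'..'9'] (* assumed definition; not shown in the original post *)
let first_latin_identifier_character = [%sedlex.regexp? ('a'..'z') | ('A'..'Z') ]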
let subsequent_latin_identifier_character = [%sedlex.regexp? first_latin_identifier_character | '\x5F' (* underscore *) | decimal_digit] 
let latin_identifier = [%sedlex.regexp? first_latin_identifier_character, (Star subsequent_latin_identifier_character)]
let lex_identifier = [%sedlex.regexp? latin_identifier] 
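To show how such named regexps are typically consumed, here is a minimal sketch of a lexer built on the ASCII-only definition. It assumes the sedlex library and its ppx are available (for example via dune with (libraries sedlex) and (preprocess (pps sedlex.ppx))). It also assumes decimal_digit is '0'..'9', since the post does not show its definition. The token type and the tokens_of_string helper are hypothetical names introduced only for illustration.

```ocaml
(* Sketch only: build with sedlex.ppx. decimal_digit is an assumed definition. *)
let decimal_digit = [%sedlex.regexp? '0' .. '9']
let first_latin_identifier_character = [%sedlex.regexp? 'a' .. 'z' | 'A' .. 'Z']

let subsequent_latin_identifier_character =
  [%sedlex.regexp? first_latin_identifier_character | '_' | decimal_digit]

let latin_identifier =
  [%sedlex.regexp? first_latin_identifier_character, Star subsequent_latin_identifier_character]

(* Hypothetical token type for this example. *)
type token = IDENT of string | EOF

(* Skip whitespace, return identifiers, fail on anything else.
   sedlex requires the final branch to be the catch-all pattern _. *)
let rec token buf =
  match%sedlex buf with
  | Plus white_space -> token buf
  | latin_identifier -> IDENT (Sedlexing.Utf8.lexeme buf)
  | eof -> EOF
  | _ -> failwith "unexpected character"

(* Hypothetical helper: lex a whole UTF-8 string into a token list. *)
let tokens_of_string s =
  let buf = Sedlexing.Utf8.from_string s in
  let rec loop acc =
    match token buf with
    | EOF -> List.rev acc
    | t -> loop (t :: acc)
  in
  loop []
```

With this definition, non-ASCII letters such as é or Chinese characters fall through to the catch-all branch, which is the limitation the question is about.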

Now I would like to support other languages such as French, Spanish, German and Chinese. I tried adding cn to first_latin_identifier_character, expecting that it would make the lexer accept Chinese:

let first_latin_identifier_character = [%sedlex.regexp? ('a'..'z') | ('A'..'Z') | cn]
...

Compilation then failed with the following output:

Fatal error: exception Stack overflow
File "sedlex/gen/sedlexer_e.ml", line 1:
Error: Error while running external preprocessor
Command line: /Users/chengtie/.opam/system/lib/sedlex/ppx/./ppx.exe --as-ppx '/var/folders/zw/lbg6_yj5175_sbkjj0ms61r00000gn/T/camlppxf91cde' '/var/folders/zw/lbg6_yj5175_sbkjj0ms61r00000gn/T/camlppx663c85'

Does anyone know how to make it work?

chengtie commented 3 years ago

I just realized that cn is not for Chinese, sorry about that.
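For context: in sedlex, cn is the Unicode general category Cn (unassigned code points), not a class of Chinese characters. One possible approach, offered here as a sketch rather than an answer given in this thread, is to build identifiers from sedlex's predefined Unicode properties xid_start and xid_continue. These cover letters from many scripts (accented Latin letters, German ß, CJK ideographs, and others), and xid_continue also admits digits and the underscore after the first character. The token type and tokens_of_string helper are hypothetical, and the sedlex ppx is assumed to be set up as in a normal dune build.

```ocaml
(* Sketch only: build with sedlex.ppx. Identifiers follow the Unicode
   XID_Start / XID_Continue properties instead of ASCII ranges. *)
let unicode_identifier = [%sedlex.regexp? xid_start, Star xid_continue]

(* Hypothetical token type for this example. *)
type token = IDENT of string | EOF

let rec token buf =
  match%sedlex buf with
  | Plus white_space -> token buf
  (* Sedlexing.Utf8.lexeme returns the matched text re-encoded as UTF-8. *)
  | unicode_identifier -> IDENT (Sedlexing.Utf8.lexeme buf)
  | eof -> EOF
  | _ -> failwith "unexpected character"

(* Hypothetical helper: lex a whole UTF-8 string into a token list. *)
let tokens_of_string s =
  let buf = Sedlexing.Utf8.from_string s in
  let rec loop acc =
    match token buf with
    | EOF -> List.rev acc
    | t -> loop (t :: acc)
  in
  loop []
```

Relying on XID_Start/XID_Continue follows the Unicode identifier convention instead of enumerating scripts by hand. If only particular scripts should be accepted, a narrower union of ranges or general categories (for example lo, "Other letter") could be used instead.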