Closed jeromesimeon closed 3 months ago
I'm afraid no attention to Unicode was paid so far in the implementation of coq2html! Indeed, the regexp for identifiers https://github.com/xavierleroy/coq2html/blob/d68b757b5bcc51a17e57c8301e862640db521e7f/coq2html.mll#L274 is ASCII-only. I need to look at how Coq itself handles Unicode in identifiers.
I don't recall how ocamllex does with unicode. I had a very good experience with Alain Frish's sedlex https://github.com/alainfrisch/sedlex
ocamllex doesn't know anything about Unicode but it might be able to do something with the UTF-8 encoding of Unicode. I, too, was thinking of ulex and its successor sedlex.
coq2html gets confused when unicode characters are used in identifiers. Here is a small example
The
o
-prefixed identifiers are handled properly but not theω
-prefixed ones.