phorward / unicc

LALR parser generator targeting C, C++, Python, JavaScript, JSON and XML
MIT License

Fix UTF-8 support for customized `getchar()` #32

Open phorward opened 10 months ago

phorward commented 10 months ago

UniCC currently does not accept UTF-8 in its own input files! What README.md says about this is a lie... this is awkward and crazy, and I wasn't really aware that it is such a big problem.

This pull request adds full UTF-8 support to the C target, and fixes UniCC's grammar parser to accept and process UTF-8 correctly.
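
To illustrate what UTF-8 support on top of a customized `getchar()` involves, here is a minimal sketch of a decoder that pulls bytes from a user-supplied callback and returns one code point per call. This is not the code from this pull request: `byte_reader`, `utf8_getchar`, `string_source`, `string_next`, `UTF8_EOF` and `UTF8_INVALID` are hypothetical names chosen for the example, and the callback contract (one byte per call, `-1` at end of input) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a customized getchar():
 * returns the next input byte (0..255), or -1 at end of input. */
typedef int (*byte_reader)(void *ctx);

#define UTF8_EOF     (-1L)
#define UTF8_INVALID 0xFFFDL   /* U+FFFD REPLACEMENT CHARACTER */

/* Decode one code point from the byte stream behind `next`.
 * Simplification: a byte that breaks a sequence is consumed, not pushed back. */
long utf8_getchar(byte_reader next, void *ctx)
{
    int b, need;
    long cp, min;

    assert(next != NULL);

    b = next(ctx);
    if (b < 0)
        return UTF8_EOF;
    if (b < 0x80)
        return b;                       /* plain ASCII */

    /* The lead byte determines the sequence length and the smallest
     * code point that may legally use that length. */
    if ((b & 0xE0) == 0xC0)      { need = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
    else
        return UTF8_INVALID;            /* stray continuation or invalid lead byte */

    while (need-- > 0) {
        b = next(ctx);
        if (b < 0 || (b & 0xC0) != 0x80)
            return UTF8_INVALID;        /* truncated or malformed sequence */
        cp = (cp << 6) | (b & 0x3F);
    }

    /* Reject overlong encodings, surrogates and values beyond U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF8_INVALID;

    return cp;
}

/* Example byte source (hypothetical): reads from a NUL-terminated string. */
struct string_source {
    const unsigned char *s;
    size_t pos;
};

int string_next(void *ctx)
{
    struct string_source *src = ctx;
    unsigned char c = src->s[src->pos];

    if (c == 0)
        return -1;
    src->pos++;
    return c;
}
```

Calling `utf8_getchar(string_next, &src)` repeatedly yields one code point at a time until `UTF8_EOF`. Because the decoder only talks to the callback, the same function works whether the bytes come from a file, a string or a socket.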