token length incorrect using --latin1 option

haskell / alex

A lexical analyser generator for Haskell

https://hackage.haskell.org/package/alex

BSD 3-Clause "New" or "Revised" License


Closed chrismshelton closed 9 years ago

chrismshelton commented 9 years ago

The template files assume UTF-8 in the alex_scan_tkn function, even when the --latin1 option is given.

alex_scan_tkn user orig_input (if c < 0x80 || c >= 0xC0 then PLUS(len,ILIT(1)) else len)

This leads to an incorrect length being passed to the lexer actions for tokens containing bytes in the range 0x80 to 0xBF, which the condition above skips as if they were UTF-8 continuation bytes.
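To make the concern concrete, here is a minimal sketch of the counting rule in plain Haskell. It is not the Alex template code: `isUtf8Continuation`, `utf8StyleLength`, and `latin1Length` are hypothetical names used only for illustration, and the sketch models just the quoted condition (`c < 0x80 || c >= 0xC0`), not how Alex actually feeds bytes to it.

```haskell
import Data.Word (Word8)

-- Hypothetical helper: True for bytes in 0x80..0xBF, i.e. the bytes that
-- the quoted condition does NOT count towards the token length.
isUtf8Continuation :: Word8 -> Bool
isUtf8Continuation c = c >= 0x80 && c < 0xC0

-- Length under the quoted rule: add 1 for every byte that is
-- below 0x80 or at least 0xC0, and skip everything in between.
utf8StyleLength :: [Word8] -> Int
utf8StyleLength = length . filter (not . isUtf8Continuation)

-- Length one would expect for Latin-1 input: one character per byte.
latin1Length :: [Word8] -> Int
latin1Length = length

-- "5µm" encoded as Latin-1 (µ is the single byte 0xB5).
sampleLatin1 :: [Word8]
sampleLatin1 = [0x35, 0xB5, 0x6D]

-- The same text encoded as UTF-8 (µ is the two bytes 0xC2 0xB5).
sampleUtf8 :: [Word8]
sampleUtf8 = [0x35, 0xC2, 0xB5, 0x6D]
```

Under these assumptions, `utf8StyleLength sampleLatin1` gives 2 while `latin1Length sampleLatin1` gives 3, whereas `utf8StyleLength sampleUtf8` gives the expected 3. Whether the generated lexer ever sees raw Latin-1 bytes at this point depends on how the input is encoded before it reaches `alex_scan_tkn`, which this sketch does not model.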

hvr commented 9 years ago

Do you happen to have a small test case for this?

chrismshelton commented 9 years ago

Sorry, my bad. Now that I'm trying to come up with a test case, I can't reproduce the error.