haskell-suite / haskell-src-exts

Manipulating Haskell source: abstract syntax, lexer, parser, and pretty-printer

Unable to parse legal UTF-8 function names #425

Open fosskers opened 5 years ago

fosskers commented 5 years ago

I decided to venture into no-man's land and define a type whose fields had non-ASCII names:

data Entry = Entry { 漢字 :: Kanji
                   , 部首 :: Kanji
                   , 親 :: Kanji }
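
For anyone who wants to reproduce this, here is a self-contained version of the record. The Kanji newtype and the sample value are placeholders of mine (the real definition of Kanji isn't shown here), so treat it as a minimal sketch rather than the actual code:

```haskell
-- Placeholder type: the real definition of Kanji is not shown in this issue.
newtype Kanji = Kanji Char
  deriving (Eq, Show)

-- The record from above, unchanged apart from the derived instances.
data Entry = Entry { 漢字 :: Kanji
                   , 部首 :: Kanji
                   , 親 :: Kanji }
  deriving (Eq, Show)

-- Hypothetical sample value; the characters are arbitrary and only
-- exercise the non-ASCII field names in record construction.
sample :: Entry
sample = Entry { 漢字 = Kanji '漢', 部首 = Kanji '部', 親 = Kanji '親' }
```

Loading this in GHCi and evaluating 漢字 sample should be enough to confirm that GHC accepts the field names as ordinary selectors.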

This compiles fine, but unfortunately haskell-src-exts (and therefore stylish-haskell) is unable to process it:

ParseFailed (SrcLoc "<unknown>.hs" 31 22) "Illegal character ''\\28450''\n"

which is produced by this guard branch in InternalLexer.hs:

https://github.com/haskell-suite/haskell-src-exts/blob/e0f2aa86d68b993f0a81633af467dc550b3e7270/src/Language/Haskell/Exts/InternalLexer.hs#L808-L830
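
The escaped character in the message is a decimal code point, and decoding it gives the first character of the first field name. The sketch below uses only Data.Char from base; the names offending, caseFlags and offendingCategory are mine. It shows that the character is a letter but neither lower- nor upper-case, which would explain the failure if the lexer picks identifier starts by case (I'm assuming that part, I haven't traced the guards in detail):

```haskell
import Data.Char (GeneralCategory (..), chr, generalCategory, isAlpha, isLower, isUpper)

-- The character named in the error message, recovered from its decimal escape.
offending :: Char
offending = chr 28450

-- (isAlpha, isLower, isUpper) as reported by Data.Char.
caseFlags :: Char -> (Bool, Bool, Bool)
caseFlags c = (isAlpha c, isLower c, isUpper c)

-- Unicode general category of the offending character.
offendingCategory :: GeneralCategory
offendingCategory = generalCategory offending
```

Here offending is '漢', caseFlags offending is (True, False, False), and offendingCategory is OtherLetter.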

Yes, I'm evil for using non-ASCII field names, but we should probably still be able to parse them, since they're legal as far as GHC is concerned.

Thoughts? Thanks for your ongoing efforts.

mpickering commented 5 years ago

I'll merge a patch that fixes it, but these days I'm not going to spend any more of my own time fixing problems with the parser.

fosskers commented 5 years ago

Understood, thank you.

afcady commented 5 years ago

EDIT: got it wrong the first time.


Isn't the actual problem that the lexer falls through to the otherwise branch on L808 in the first place? The character should already match an earlier case, on L778:

https://github.com/haskell-suite/haskell-src-exts/blob/e0f2aa86d68b993f0a81633af467dc550b3e7270/src/Language/Haskell/Exts/InternalLexer.hs#L808

https://github.com/haskell-suite/haskell-src-exts/blob/e0f2aa86d68b993f0a81633af467dc550b3e7270/src/Language/Haskell/Exts/InternalLexer.hs#L778-L782
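
To illustrate the fall-through, here is a toy guard chain. It is not the code from InternalLexer.hs: Start, strictStart and lenientStart are hypothetical names, and I'm assuming the real guards classify identifier starts with something like isLower and isUpper. The lenient variant additionally treats caseless letters (general category OtherLetter) as variable starts, which I believe is roughly what GHC's own lexer does:

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isLower, isUpper)

-- What kind of token a character could begin (toy model only).
data Start = VarStart | ConStart | Illegal
  deriving (Eq, Show)

-- Case-based classification: a caseless letter reaches the otherwise branch.
strictStart :: Char -> Start
strictStart c
  | isLower c || c == '_' = VarStart
  | isUpper c             = ConStart
  | otherwise             = Illegal

-- Same chain, but letters without case are accepted as variable starts.
lenientStart :: Char -> Start
lenientStart c
  | isLower c || c == '_'              = VarStart
  | generalCategory c == OtherLetter   = VarStart
  | isUpper c                          = ConStart
  | otherwise                          = Illegal
```

With these definitions, strictStart '漢' is Illegal while lenientStart '漢' is VarStart, and ASCII letters are classified the same way by both.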

Also in need of fixing here:

https://github.com/haskell-suite/haskell-src-exts/blob/e0f2aa86d68b993f0a81633af467dc550b3e7270/src/Language/Haskell/Exts/InternalLexer.hs#L362