marcelog / ex_abnf

Parser for ABNF Grammars
Apache License 2.0

UTF-8 handling? #13

Open mikebaldry opened 7 years ago

mikebaldry commented 7 years ago

When I try to pass a UTF-8 charlist, characters such as ł, which is encoded in UTF-8 as the two bytes <<197, 130>>, actually go into the charlist as the single code point 322.

iex(1)> 'hełło'             
[104, 101, 322, 322, 111]

This causes things to break. Sometimes I see :erlang.iolist_size([322]) fail because 322 is greater than 255, for example, and sometimes the input just fails to match (depending on the current parsing context, I guess).
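To make the mismatch concrete, here is a minimal sketch in plain Elixir (no ex_abnf calls involved). It compares the code point list with the UTF-8 byte list for the same string, and shows that only the byte list is valid iodata. The variable names are mine, for illustration only.

```elixir
input = "hełło"

# A charlist holds Unicode code points, so ł becomes the single integer 322.
codepoints = String.to_charlist(input)

# The underlying binary holds UTF-8 bytes, so ł becomes the pair 197, 130.
bytes = :binary.bin_to_list(input)

IO.inspect(codepoints, charlists: :as_lists)
IO.inspect(bytes, charlists: :as_lists)

# iodata lists may only contain integers in 0..255, so the code point list is rejected.
result =
  try do
    :erlang.iolist_size(codepoints)
  rescue
    ArgumentError -> :badarg
  end

IO.inspect(result)

# The byte list is accepted: 3 ASCII bytes plus 2 bytes for each ł.
IO.inspect(:erlang.iolist_size(bytes))
```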

Am I doing something wrong? (I'm assuming I am!)

I've currently worked around this very crudely: I step through the bytes of the string and build a plain byte list (so I get [197, 130] instead of [322]). When the result comes back from apply, I turn anything in the state that is a string back into a binary by stepping through the list and appending each byte to a <<>>.
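For reference, the same round trip can probably be done without stepping through the bytes by hand. This is only a sketch of the idea, assuming the parser is happy with a list of bytes: `ByteCharlist` and its function names are hypothetical helpers of mine, not part of ex_abnf, and the parser call itself is left out.

```elixir
defmodule ByteCharlist do
  # Hypothetical helper module, not part of ex_abnf.

  # Binary -> list of UTF-8 bytes (every element is in 0..255).
  def from_string(string) when is_binary(string) do
    :binary.bin_to_list(string)
  end

  # Code point charlist -> list of UTF-8 bytes, going through a binary first.
  def from_codepoints(charlist) when is_list(charlist) do
    charlist
    |> List.to_string()
    |> :binary.bin_to_list()
  end

  # List of bytes -> binary, for values that come back from the parser.
  def to_binary(bytes) when is_list(bytes) do
    :erlang.list_to_binary(bytes)
  end
end

# The code point list from the example above, written out explicitly.
byte_input = ByteCharlist.from_codepoints([104, 101, 322, 322, 111])
IO.inspect(byte_input, charlists: :as_lists)

# Converting back yields the original UTF-8 string.
IO.inspect(ByteCharlist.to_binary(byte_input))
```

The byte list is what I would hand to the parser in place of the charlist, and `to_binary/1` is what I would run over any byte lists found in the returned state.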

Great work on this BTW!