Closed banacorn closed 9 years ago
Note that '\57304' is the second element of the surrogate pair of '𝟘', which suggests that this bug is caused by advancing by 16 bits irrespective of the width of any particular character. I can reproduce this bug (and similar bugs in scan, peekChar, takeText, and takeLazyText) using any character which requires 32 bits to represent (i.e. ord c >= 2^16).
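To double-check the surrogate-pair observation, here is a small sketch that uses only base. The name surrogatePair is my own helper for illustration and is not part of attoparsec or text. It applies the standard UTF-16 encoding arithmetic to a single code point.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)

-- | Split a code point above U+FFFF into its UTF-16 surrogate pair
-- (high surrogate first). Returns Nothing when the character fits
-- in a single 16-bit code unit.
-- NOTE: hypothetical helper for illustration, not a library function.
surrogatePair :: Char -> Maybe (Char, Char)
surrogatePair c
  | n < 0x10000 = Nothing
  | otherwise   = Just (chr hi, chr lo)
  where
    n  = ord c
    m  = n - 0x10000              -- 20-bit offset into the supplementary planes
    hi = 0xD800 + (m `shiftR` 10) -- top 10 bits
    lo = 0xDC00 + (m .&. 0x3FF)   -- bottom 10 bits
```

In GHCi, surrogatePair '\120792' evaluates to Just ('\55349','\57304'), so the stray '\57304' is indeed the low surrogate of '𝟘'.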
> ...this bug is caused by advancing by 16 bits irrespective of the width of any particular character.
It looks like you're correct.
Thanks for the helpful repro. I'll take a look at this as soon as I can.
Any chance of a release with this fix? I just ran into this with an even simpler reproduction: takeText "💋".
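For anyone who wants to check their installed version, a minimal sketch of that reproduction as a round-trip test follows. It assumes the attoparsec and text packages are available. roundTrips is a name I made up for this example; parseOnly and takeText are the real functions from Data.Attoparsec.Text.

```haskell
import Data.Attoparsec.Text (parseOnly, takeText)
import Data.Text (Text)
import qualified Data.Text as T

-- | True when takeText hands back the whole input unchanged.
-- NOTE: hypothetical helper for illustration only.
roundTrips :: Text -> Bool
roundTrips input = parseOnly takeText input == Right input

-- | The kiss mark from the report (U+1F48B), which needs two
-- UTF-16 code units, i.e. ord c >= 2^16.
kiss :: Text
kiss = T.pack "\128139"
```

With a release that includes the fix, roundTrips kiss should be True. I have not recorded what the affected versions return for it, so treat this only as a pass/fail check.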
ha, is that a pair of lips?
Released as 0.12.1.3.
Thanks!
The code should result in Done "a" "\120792", a clean cut. But I get Done "\57304a" "\120792".
With the predicate negated, takeWhile also presents the same issue.
The issue can be reproduced with this gist. I'm using attoparsec-0.12.1.2 with text-1.2.0.0.
Thanks!