Closed bfreuden closed 3 years ago
Yeah, it seems that I get the correct number, but:
c = (char) value;
breaks it.
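For illustration, here is a minimal sketch of why that cast loses data and one possible way around it. The class and method names (`CastDemo`, `appendBroken`, `appendFixed`) are hypothetical and not taken from the library; only `value` mirrors the snippet above, and it is assumed to hold the decoded code point as an `int`.

```java
public class CastDemo {
    // Hypothetical reproduction: narrowing an int code point to char
    // keeps only the low 16 bits, so anything above U+FFFF is mangled.
    static String appendBroken(int value) {
        StringBuilder sb = new StringBuilder();
        char c = (char) value;
        sb.append(c);
        return sb.toString();
    }

    // Possible fix: let StringBuilder emit one or two UTF-16 code units
    // depending on the code point.
    static String appendFixed(int value) {
        StringBuilder sb = new StringBuilder();
        sb.appendCodePoint(value);
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x1F600; // a code point outside the Basic Multilingual Plane
        System.out.println(Integer.toHexString(appendBroken(value).charAt(0)));
        System.out.println(appendFixed(value).length());
        System.out.println(appendFixed(value).codePointAt(0) == value);
    }
}
```

Note that `StringBuilder.appendCodePoint` throws `IllegalArgumentException` for values that are not valid Unicode code points, so out-of-range input probably needs a check first.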
btw, I will release the fixed version soon, if you don't have any more issues 👍
Thank you VERY MUCH for finding and reporting these nasty bugs!
Wow that was fast! Thank you so much for those lightning-fast fixes! This is all I have so far in terms of bugs :-). If you release a new version I will definitely give it a try on common-crawl data :+1:
I do have a remark concerning the use case of knowing the position of texts in the input (something similar to tagPosition and tagLength) though. If you're ok I might open a new ticket to share that with you.
@bfreuden Sure I will release the fixes this week(end) :)
I am open to all ideas, please do share them!
The following program:
Produces the following output:
It might be a matter of character references that are surrogate pairs. Maybe a "codepoint & 0xFFFF" somewhere in the code?
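To make that guess concrete, here is a small sketch of what such a mask would do to a code point that needs a surrogate pair. This is not the library's code: `SurrogateDemo`, `masked`, and `toUtf16` are hypothetical names, and the code point used is just an example.

```java
public class SurrogateDemo {
    // What a "codepoint & 0xFFFF" (or an equivalent cast to char) would keep.
    static int masked(int codePoint) {
        return codePoint & 0xFFFF;
    }

    // What UTF-16 actually needs: one char for BMP code points,
    // a high/low surrogate pair for anything above U+FFFF.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // what a reference such as &#128512; decodes to
        System.out.println(Character.isSupplementaryCodePoint(codePoint));
        System.out.println(Integer.toHexString(masked(codePoint)));
        for (char unit : toUtf16(codePoint)) {
            System.out.println(Integer.toHexString(unit));
        }
    }
}
```

If the output contains a single unexpected character where a two-unit pair should be, a mask or narrowing cast of this kind is a plausible culprit.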