houseabsolute / Markdent

An event-based Markdown parser toolkit
http://metacpan.org/release/Markdent/

Test for unicode #8

Closed: imago-storm closed this 9 years ago

imago-storm commented 9 years ago

Hello, some time ago I found an issue with wide characters in Markdent::Parser::BlockParser.

If we have a UTF-8 encoded string without the UTF-8 flag, it seems to be OK, at least in Markdent itself, but HTML::Stream converts it into something ugly. If we have a UTF-8 encoded string with the UTF-8 flag, we cannot process a document that contains HTML tags at all, because sha1_hex fails on wide characters.

The test here shows the case.
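
Separately from that test, the sha1_hex failure can be reproduced in isolation. The following is only a minimal sketch, not code from Markdent or from the test: it assumes the `sha1_hex` involved is the one exported by Digest::SHA, and the sample string and variable names are made up for illustration. It shows that hashing a character string with code points above 255 dies, while hashing the UTF-8 encoded bytes of the same string works.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);   # assumption: the sha1_hex in question comes from Digest::SHA
use Encode qw(encode_utf8);

# Hypothetical sample input: a character string (UTF-8 flag on) with
# code points above 255, wrapped in an HTML tag.
my $html = "<p>\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}</p>";

# Hashing the character string directly dies, because the digest
# functions operate on bytes and cannot downgrade wide characters.
my $direct = eval { sha1_hex($html) };
my $error  = $@;

# Encoding to UTF-8 octets first gives sha1_hex something it can digest.
my $digest = sha1_hex( encode_utf8($html) );

print "direct call failed: $error" unless defined $direct;
print "digest of UTF-8 bytes: $digest\n";
```

Encoding only at the point where the digest is computed leaves the rest of the pipeline working with character strings, which is one possible way to handle this; the actual fix may differ.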

Are strings with the UTF-8 flag allowed here? Sorry, but I could not find anything in the documentation about this case.

autarch commented 9 years ago

I have a branch that fixes the various Unicode bugs: https://github.com/autarch/Markdent/tree/autarch/fix-unicode-html. I plan to merge it soon and do a release if it passes Travis.