lexborisov / myhtml

Fast C/C++ HTML 5 Parser. Using threads.
GNU Lesser General Public License v2.1
1.66k stars 147 forks

Encoding, Russian characters. #153

Closed Fandanguero closed 6 years ago

Fandanguero commented 6 years ago

I took the "get_title_high_level" example project and used it to parse a saved copy of the "habr.com/all" main page, but the output I get on my console is completely unreadable. I use VS2017, if it makes any difference.

Could I have the few lines of code (or that little example project rewritten) that would handle non-ASCII characters properly? :) Thanks.

lexborisov commented 6 years ago

@Fandanguero hi!

Unfortunately, I do not know Windows. Does the console support a UTF-8 byte stream? If you need code points, you can convert the byte stream to code points. For example, see myencoding_ascii_utf_8_to_codepoint.

On Linux and Mac OS X:

./examples/myhtml/get_title_high_level /new/test.html 
Title: Все публикации подряд / Хабр

P.S.: Earlier I tested on Windows 7 and everything was displayed correctly.
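
For readers who want to see what converting a UTF-8 byte stream to code points involves, here is a minimal standalone sketch. It does not use myhtml; `utf8_decode_one` is a hypothetical helper written only for illustration, and the library's own myencoding_ascii_utf_8_to_codepoint may behave differently (check its header for the actual signature).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of myhtml): decode one UTF-8 sequence
 * starting at s, reading at most len bytes. On success the code point is
 * stored in *cp and the number of bytes consumed (1..4) is returned.
 * Returns 0 for truncated or malformed input. */
size_t utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need;        /* number of continuation bytes expected */
    uint32_t value;     /* code point being assembled */
    uint32_t min;       /* smallest value allowed for this length */
    size_t i;

    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2-byte sequence */
        need = 1; value = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3-byte sequence */
        need = 2; value = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4-byte sequence */
        need = 3; value = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                           /* stray continuation or invalid lead byte */
    }

    if (len < need + 1)
        return 0;                           /* sequence is cut off */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a 10xxxxxx continuation byte */
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
        return 0;

    *cp = value;
    return need + 1;
}
```

Calling it in a loop, advancing by the returned byte count each time, walks a whole title string one code point at a time. For instance, the first letter of the title above, "В", is the two bytes 0xD0 0x92 and decodes to U+0412.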

Fandanguero commented 6 years ago

Woohoo! Thanks for pointing me in the right direction. I had to switch the console code page to UTF-8 and then change the font to one of those that support it. So much easier than I thought :D
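
For anyone landing here with the same problem, the code page switch can also be done from the program itself. The sketch below is an assumption about one way to do it, not something taken from the example project: `enable_utf8_console` is a hypothetical helper name, while `SetConsoleOutputCP` and `CP_UTF8` come from the Windows API. The console font still has to be changed separately to one that contains Cyrillic glyphs.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: ask the Windows console to interpret the program's
 * output bytes as UTF-8. Returns 1 on success and 0 on failure.
 * On other platforms it does nothing and reports success, since the
 * terminals there were already shown to print the UTF-8 title correctly. */
int enable_utf8_console(void)
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) ? 1 : 0;
#else
    return 1;
#endif
}
```

Call it once at the start of `main`, before the first `printf` of the title. Running `chcp 65001` in the console before starting the program should have the same effect on the code page.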