xoofx / markdig

A fast, powerful, CommonMark compliant, extensible Markdown processor for .NET
BSD 2-Clause "Simplified" License

What is the right way to go about translating international characters using Markdig? #86

Open havremunken opened 7 years ago

havremunken commented 7 years ago

Hi,

Completely new to Markdig, so please forgive my ignorance.

I installed Markdig via NuGet, used the Advanced extensions in the pipeline, and hooked it up to an HTML browser control in WPF. It mostly works exactly as expected, but it falls down on international characters: the WebBrowser control displays gibberish because Markdig doesn't touch these characters.

In my case this relates to the extra Norwegian characters and their HTML equivalents:

æ - &amp;aelig;
ø - &amp;oslash;
å - &amp;aring;
Æ - &amp;AElig;
Ø - &amp;Oslash;
Å - &amp;Aring;

And of course there are many, many other characters with similar translations.

I realize this may not need to be part of Markdig as such, but I couldn't find any information on the right way to handle it in my own project. Should I write an extension?

Is there a tutorial for how to do something like this?

It is of course easy enough to search and replace the output of Markdig to do this, but it would be nice to not have to.

Thanks for a very cool project!

Kryptos-FR commented 7 years ago

I would suggest using the utilities provided by System.Net.WebUtility. The methods you are interested in are HtmlEncode(String) and HtmlDecode(String).

That said, where exactly to put it in the Markdig process is another question. I assume the literal renderers would be the best place, so as not to mess with HTML tags that might otherwise get encoded.
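For the post-processing route, note that HtmlEncode also escapes `<`, `>` and `&`, so running it over the whole rendered output would break the tags. A minimal sketch of an alternative, assuming you only want the entity "fallback" for non-ASCII characters: the helper below (`HtmlEntityFallback.EscapeNonAscii` is a made-up name, not part of Markdig or .NET) rewrites every non-ASCII character as a numeric character reference and leaves the markup untouched.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not part of Markdig or the .NET libraries.
public static class HtmlEntityFallback
{
    // Replaces each non-ASCII character with a numeric character reference,
    // e.g. "æ" (U+00E6) becomes "&#230;". ASCII characters, including the
    // '<', '>' and '&' of the surrounding markup, are copied unchanged.
    public static string EscapeNonAscii(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        var sb = new StringBuilder(html.Length);
        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }

            int codePoint;
            if (char.IsHighSurrogate(c) && i + 1 < html.Length && char.IsLowSurrogate(html[i + 1]))
            {
                // Combine a surrogate pair into a single code point and skip the low half.
                codePoint = char.ConvertToUtf32(c, html[i + 1]);
                i++;
            }
            else if (char.IsSurrogate(c))
            {
                // Unpaired surrogate: emit the Unicode replacement character instead.
                codePoint = 0xFFFD;
            }
            else
            {
                codePoint = c;
            }

            sb.Append("&#")
              .Append(codePoint.ToString(CultureInfo.InvariantCulture))
              .Append(';');
        }
        return sb.ToString();
    }
}
```

Calling `HtmlEntityFallback.EscapeNonAscii(html)` on the string returned by Markdig would be one way to apply it. The output uses numeric references such as `&#230;` rather than named ones such as `&aelig;`; browsers treat the two the same.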

xoofx commented 7 years ago

Hm, I'm not sure where the problem is. By default Markdig writes characters to a UTF-8 StreamWriter, so all characters should be supported by HTML and browsers; there is no need to use HTML escapes for the characters you mentioned above. Are you sure that your HTML head declares the proper UTF-8 encoding? Unless the WebBrowser control (perhaps using an old IE9) doesn't support this, verify that you have at least something like <meta charset="UTF-8">. It is unlikely that we need to modify anything in Markdig (unless there is a bug).

havremunken commented 7 years ago

Thanks for the suggestions, both.

This is definitely a browser issue. I don't know exactly which IE version WPF uses for the WebBrowser control, but it feels like IE6 sometimes.

I will try to wrap my output in proper HTML as you mention. I just wanted to consider the "fallback" option of having these special chars "rendered" as HTML entities for maximum compatibility. I'll look into what @Kryptos-FR suggested and see if I can get the order right as well.

Thanks for your input!

xoofx commented 7 years ago

Sorry, I'm quite busy this week... I won't be able to look at this before next week.

deakjahn commented 6 years ago

That's my question, too. How can I specify the meta tag to be used by Markdig in the resulting HTML (short of recompiling, of course)?

xoofx commented 6 years ago

> That's my question, too. How can I specify the meta tag to be used by Markdig in the resulting HTML (short of recompiling, of course)?

Markdig doesn't output a full HTML document, only an HTML fragment, so it is up to your integration to handle this.
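As a rough sketch of what that integration could look like, assuming the fragment is shown in a browser control that needs an explicit encoding declaration: the helper below (`HtmlPage.Wrap` and its `title` parameter are hypothetical names, not a Markdig API) wraps a rendered fragment in a minimal document whose head carries `<meta charset="UTF-8">`.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of Markdig.
public static class HtmlPage
{
    // Wraps an already-rendered HTML fragment in a minimal HTML document
    // that declares UTF-8 in its head. The fragment is inserted as-is;
    // only the title is HTML-encoded.
    public static string Wrap(string bodyFragment, string title = "Preview")
    {
        if (bodyFragment == null) throw new ArgumentNullException(nameof(bodyFragment));

        return "<!DOCTYPE html>\n" +
               "<html>\n" +
               "<head>\n" +
               "<meta charset=\"UTF-8\">\n" +
               "<title>" + WebUtility.HtmlEncode(title ?? string.Empty) + "</title>\n" +
               "</head>\n" +
               "<body>\n" +
               bodyFragment + "\n" +
               "</body>\n" +
               "</html>";
    }
}
```

In a WPF application the fragment would typically come from `Markdown.ToHtml(markdown, pipeline)`, and the wrapped string could then be passed to the WebBrowser control's `NavigateToString` method.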

deakjahn commented 6 years ago

Indeed. I never looked into it; I just passed it to the WebBrowser, no questions asked. I should have checked, sorry. It works perfectly now, thanks.