turquoiseowl / i18n

Smart internationalization for ASP.NET

Inconsistent HTML-encoding behavior #257

Closed ridderholt closed 8 years ago

ridderholt commented 8 years ago

We are running the latest version (2.1.6.0) with MVC 5 and .NET 4.5 and we are experiencing some inconsistent HTML-encoding behavior.

Our site is translated to Swedish and German with English as the fallback language.

We have a piece of code in a Razor view that looks like this:

var commentString = Model.NumComments == 1 ?
                    "[[[1<span class=\"comment-count\">comment</span>]]]" :
                    string.Format("[[[%0<span class=\"comment-count\">comments</span>|||{0}]]]", Model.NumComments);

...

<div class="comment-container">
  @commentString
</div>
...

When the i18n framework is translating these strings to either Swedish or German this works fine and the HTML is encoded and interpreted as HTML by the browser. But when we are displaying the site in English (fallback language) the HTML is not properly encoded and the browser interprets this string as any other text and displays the raw HTML.

The same HTML-encoding issue occurs when having HTML strings as parameters to a translatable string:


<div>@string.Format("[[[%0 some text %1|||{0}|||{1}]]]", "<span class=\"my-class\">", "</span>")</div>

For what it's worth we do know that it may not be a best practice to have HTML in Nuggets and these examples are pretty easily fixed, but we would like for the HTML-encoding to be consistent in one way or the other.

turquoiseowl commented 8 years ago

Does this help?:

<div class="comment-container">
  @Html.Raw(commentString)
</div>

Refer to #202 (end of) for more details.

I'm not sure there's a better fix for this, but am open to ideas.

turquoiseowl commented 8 years ago

Just to confirm what is happening here: the msgid is stored in the source file unencoded and so is also represented in PO files unencoded. Razor, on the other hand, automatically HTML-encodes the string (< to &lt;), which means that i18n receives it for post-processing in encoded form.

This initially caused i18n to miss the nugget, but a workaround was introduced (#202) to look up the HTML-decoded msgid (&lt; back to <) if/when the initial (raw) lookup fails.

Where a translation exists, the msgid is replaced with exactly what the translation contains, i.e. in unencoded form, so the output is as expected.

Where a translation does not exist, i18n falls back on outputting the msgid as is, hence the literal output of all the markup.

The reason why Html.Raw helps here is that: a) the second lookup workaround is not required; and b) when the lookup fails due to no translation, the entity correctly contains unencoded markup.
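The lookup sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not i18n's actual code: `NuggetLookupSketch`, `Translate`, and the Swedish dictionary entry are hypothetical names and data, and `WebUtility.HtmlEncode` stands in for whatever encoding Razor applies to `@commentString`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class NuggetLookupSketch
{
    // Hypothetical stand-in for a PO file: msgids and translations are stored unencoded.
    public static readonly Dictionary<string, string> Swedish = new Dictionary<string, string>
    {
        { "1<span class=\"comment-count\">comment</span>",
          "1<span class=\"comment-count\">kommentar</span>" }
    };

    // Hypothetical helper (not part of i18n's API): raw lookup first, then the
    // HTML-decoded lookup (the #202 workaround), then fall back to the msgid as received.
    public static string Translate(string msgid, IDictionary<string, string> translations)
    {
        string translated;
        if (translations.TryGetValue(msgid, out translated))
            return translated;
        if (translations.TryGetValue(WebUtility.HtmlDecode(msgid), out translated))
            return translated;
        return msgid;
    }

    static void Main()
    {
        string raw = "1<span class=\"comment-count\">comment</span>";
        string encoded = WebUtility.HtmlEncode(raw);        // approximates @commentString
        var noTranslations = new Dictionary<string, string>(); // approximates the fallback language

        Console.WriteLine(Translate(encoded, Swedish));        // translation found: unencoded markup
        Console.WriteLine(Translate(encoded, noTranslations)); // no translation: encoded msgid comes back
        Console.WriteLine(Translate(raw, noTranslations));     // Html.Raw case: unencoded msgid comes back
    }
}
```

The second call is the case reported in this issue: with no translation available, the encoded msgid is returned unchanged, so the browser shows the markup as literal text.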

ridderholt commented 8 years ago

Thanks for the information. I was unaware that Razor HTML-encodes the string automatically; I thought this was done in your library and therefore believed that the encoding was inconsistent. And you are correct that

<div class="comment-container">
  @Html.Raw(commentString)
</div>

solved our problem. Thanks for your time, you can close this issue.