xoofx / markdig

A fast, powerful, CommonMark compliant, extensible Markdown processor for .NET
BSD 2-Clause "Simplified" License
4.21k stars, 444 forks

Fails to substitute (**) with <strong> tags for Japanese and Chinese languages. #765

Closed: Artem-Beziazychnyi closed this issue 5 months ago

Artem-Beziazychnyi commented 5 months ago

The Markdown.ToHtml method doesn't convert ** to <strong> when the ** appears directly before or after Japanese or Chinese characters. For other languages, it works as expected.

Example of text that I need to convert to HTML: **如果您需要重新安排教练时间,**请确保在原定教练时间前至少提前 **48 小时**重新预约教练环节,以便您的教练知悉情况并为其他学员安排时间以填补空档。

The result from Markdig: **如果您需要重新安排教练时间,**请确保在原定教练时间前至少提前 <strong>48 小时</strong>重新预约教练环节,以便您的教练知悉情况并为其他学员安排时间以填补空档。

The result from Legacy Markdown package: <strong>如果您需要重新安排教练时间,</strong>请确保在原定教练时间前至少提前 <strong>48 小时</strong>重新预约教练环节,以便您的教练知悉情况并为其他学员安排时间以填补空档。

Could you please provide me with instructions on how to fix this issue correctly?

xoofx commented 5 months ago

From the emphasis-and-strong-emphasis section of the spec:

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
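The two definitions above can be sketched in a few lines of Python. This is only an illustration of the quoted rules, not Markdig's implementation: the helper names are hypothetical, and the punctuation test (ASCII punctuation plus the Unicode `P*` categories) is an assumption, since spec versions differ in detail on what counts as Unicode punctuation.

```python
import unicodedata

# Assumption: ASCII punctuation plus Unicode general categories starting with "P".
ASCII_PUNCT = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")


def is_whitespace(ch):
    # None models the beginning or end of the line, which counts as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    return ch is not None and (
        ch in ASCII_PUNCT or unicodedata.category(ch).startswith("P")
    )


def left_flanking(before, after):
    """before/after: the single characters around a delimiter run (or None)."""
    if is_whitespace(after):                                   # rule (1)
        return False
    if not is_punctuation(after):                              # rule (2a)
        return True
    return is_whitespace(before) or is_punctuation(before)     # rule (2b)


def right_flanking(before, after):
    if is_whitespace(before):                                  # rule (1)
        return False
    if not is_punctuation(before):                             # rule (2a)
        return True
    return is_whitespace(after) or is_punctuation(after)       # rule (2b)


# The closing ** from the issue: preceded by a comma, followed by 请.
print(right_flanking(",", "请"))        # False: cannot close strong emphasis
print(right_flanking("\uff0c", "请"))   # False: same for the fullwidth comma
# A closing ** preceded by a letter instead of punctuation.
print(right_flanking("间", ","))        # True
```

Under these assumptions, a `**` that directly follows a comma and directly precedes a CJK character is left-flanking but not right-flanking, so it can only be read as an opener.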

So for this to be correctly formatted, you need to change it to:

**如果您需要重新安排教练时间**,请确保在原定教练时间前至少提前 **48小时**重新预约教练环节,以便您的教练知悉情况并为其他学员安排时间以填补空档

如果您需要重新安排教练时间,请确保在原定教练时间前至少提前 48小时重新预约教练环节,以便您的教练知悉情况并为其他学员安排时间以填补空档

as shown on babelmark here

For example, it is not possible to get **如果您需要重新安排教练时间,** working, because the character before the closing ** seems to be a space, so the parser cannot detect it as a closing delimiter.

Artem-Beziazychnyi commented 5 months ago

It seems that I need to work on improving those templates. Thank you for bringing this to my attention!

Artem-Beziazychnyi commented 5 months ago

In Japanese and Chinese, ", " and ". " are each a single character :( https://www.compart.com/en/unicode/U+FF0C So I cannot remove the space there.
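A quick way to check this is to ask Python's standard `unicodedata` module about the characters involved. The sketch below is only illustrative; `describe` is a hypothetical helper, and U+3002 is included as an assumed example of the CJK full stop alongside the U+FF0C comma linked above.

```python
import unicodedata


def describe(ch):
    # Returns (code point, Unicode name, general category) for one character.
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in ("\uff0c", "\u3002"):
    # Each is one code point; the visual gap is part of the glyph, not a space.
    print(len(ch), describe(ch))
```

Both characters report the general category `Po` (other punctuation), so there is no separate space character to strip from the template.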