Closed henry0312 closed 1 year ago
In this case, "naturally" varies from person to person.
pandoc (that is already widely used) east_asian_line_breaks
handles this case as same as current goldmark.
It is more preferable for me this is an option for users.
Thank you for your feedback.
I completely understand your viewpoint. The perception of what appears "natural" can indeed vary among different users.
Regarding an appropriate option name:
CustomEastAsianLineBreaks
? This name signifies that it's tailored for specific East Asian line break behaviors and distinguishes it from the standard east_asian_line_breaks
option provided by tools like pandoc. I'm open to other suggestions as well.Regarding the changes in this pull request:
With that in mind, I fully support making this feature an optional behavior for users. I can proceed with implementing this as an option if everyone agrees.
Looking forward to further feedback and suggestions.
It might be better to say that options for EastAsianLineBreaks
to be functional options for WithEastAsianLineBreaks
.
For example:
extension.NewCJK(
extension.WithEastAsianLineBreaks(
extension.WithXXX(),
),
)
I've followed your suggestion and added a WorksEvenWithOneSide
sub-option to the EastAsianLineBreak
option.
I would greatly appreciate it if you could review the changes in this PR.
Thank you in advance for your time and feedback.
Thanks for updating PR. I feel your implementation focuses too much on a specific issue.
Segmentation break problems have been discussed for a long, especially in CSS specification. 'Natural' segmentation break is really hard, depends on many languages.
In the past, CSS specifications draft defined segmentation breaks as follows.
- If the character immediately before or immediately after the segment break is the zero-width space character (U+200B), then the break is removed, leaving behind the zero-width space.
- Otherwise, if the East Asian Width property [UAX11] of both the character before and after the segment break is F, W, or H (not A), and neither side is Hangul, then the segment break is removed.
- Otherwise, the segment break is converted to a space (U+0020).
I think option names like CSS word break property values are more flexible than concrete option names like WithWorksEvenWithOneSide
that you implemented. If Korean developer clams to change CJK extension behavior, they will make PR by following your implementation. It can end up giving the CJK extension a lot of detailed and confused options.
Like the following options, 'style' aggregates detailed options into one option value:
type EastAsianLineBreakStyle int
const (
EastAsianLineBreakStyleSimple = iota // Pandoc style line break
EastAsianLineBreakCSS3Draft //
// etc...
)
And WithEastAsianLineBreaks
takes a this option:
WithEastAsianLineBreaks(EastAsianLineBreakCSS3Draft)
Note that for backward compatibilities, WithEastAsianLineBreaks
should be WithEastAsianLineBreaks(...EastAsianLineBreakStyle)
not WithEastAsianLineBreaks(EastAsianLineBreakStyle)
.
Thank you for your detailed feedback. First and foremost, I apologize for the delay in responding; reviewing and researching the CSS draft took a considerable amount of time.
I've closely aligned the line break specifications with your intentions, which is why we opted for the current implementation. We wanted a solution that was both intuitive for users and consistent with the desired functionality.
Your reference to the ongoing discussions surrounding the CSS specifications has been invaluable. I have delved into the extensive debates and found them to be quite intricate. Honestly, fully comprehending the nuances of the CSS line break specifications has been a challenge. Given its complexity, achieving a perfect implementation of the CSS line break specifications would indeed be tough.
However, in keeping with your intentions and after considering your feedback, I've aimed for an approach that enhances code readability and extensibility. This ensures that future modifications or additions can be seamlessly integrated. Your suggestion of using EastAsianLineBreakStyle
definitely aligns with this goal, offering a more streamlined and organized solution.
Regarding backward compatibility, I've ensured that WithEastAsianLineBreaks
remains variadic with ...EastAsianLineBreakStyle
to not disrupt the experience for current users.
I genuinely value your insights and constructive feedback. I'll continue refining the implementation to make it as adaptable and efficient as possible. Please let me know if there are other concerns or recommendations you'd like to share.
I've finally completed the implementation of CSS3Draft
!
Please review it.
P.S. Several lint errors are occurring in Github Actions. These errors seem unrelated to the PR. https://github.com/yuin/goldmark/pull/411/files#diff-2fc06eaa27f973f12f6d355127a8e1e8d0c49e88dca4c6e9fc843aebea12d0a5
I've merged the PR and made some improvements codes in https://github.com/yuin/goldmark/commit/9c9003363f693a5c06f6867e617697afb1447405.
Thanks for your contribution.
Thank you for the merge. I also appreciate the extensive reviews and explanations of the direction.
In this pull request, we've addressed the rendering of line breaks, especially concerning the interaction between Western and East Asian wide characters. The aim is to ensure that the output is more intuitive and naturally rendered.
Changes:
Refactoring in
renderText
Method:
- Simplified the condition that determines whether a new line should be added between characters. We've used a more straightforward and readable condition using De Morgan's law.
- This change primarily impacts the way line breaks are handled between East Asian wide characters and Western characters.
Updated Test Cases:
- We've modified the expected results for the test cases in
cjk_test.go
to reflect the desired behavior.- Specifically, we've adjusted tests that involve soft line breaks between a Western character and an East Asian wide character. This ensures that our tests are in line with the new rendering logic.
Through these changes, we hope to enhance the readability of rendered content, especially when dealing with mixed character types. We kindly ask for your review and feedback on these modifications.
Example 1:
Input Markdown:
私はプログラマーです。 GoでWebアプリケーションを開発しています。
Current Output:
<p>私はプログラマーです。\nGoでWebアプリケーションを開発しています。</p>
Expected Output:
<p>私はプログラマーです。GoでWebアプリケーションを開発しています。</p>
Example 2:
Input Markdown:
I am a programmer. <!-- notice: there is a white space after the last period --> 私はプログラマーです。
Current Output:
<p>I am a programmer. \n私はプログラマーです。</p>
Expected Output:
<p>I am a programmer. 私はプログラマーです。</p>
In this pull request, we've addressed the rendering of line breaks, especially concerning the interaction between Western and East Asian wide characters. The aim is to ensure that the output is more intuitive and naturally rendered.
Changes:
Refactoring in
renderText
Method:Updated Test Cases:
cjk_test.go
to reflect the desired behavior.Through these changes, we hope to enhance the readability of rendered content, especially when dealing with mixed character types. We kindly ask for your review and feedback on these modifications.
Example 1:
Input Markdown:
Current Output:
Expected Output:
Example 2:
Input Markdown:
Current Output:
Expected Output: