mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License

Feature request: support for underlines #101

Closed Waqar144 closed 4 years ago

Waqar144 commented 4 years ago

Hello,

Thank you for this great library. We are already switching to MD4C in QOwnNotes.

We were using hoedown before, and it had optional support for underlines, i.e. the following text would be rendered as underlined:

_Underlined text_

Perhaps you could add this feature to MD4C, maybe via a flag such as MD_EMPHASIC_CHAR_UNDERSCORE that decides whether the text is rendered as underlined or italic.

mity commented 4 years ago

If we do that, we should likely be more or less compatible with hoedown when the flag is used.

So it would be good to know how this differs from the current parsing and whether it is "just" about interpretation (i.e. to see _ as underline instead of (strong) emphasis) or whether it also has to be parsed differently.

Could you please check how hoedown behaves in these cases?

  1. Two (or more) underscores: __foo__
  2. Non-matching count of underscores: __foo_
  3. Intra-word underscores: foo_bar_baz

Waqar144 commented 4 years ago

Could you please check how hoedown behaves in these cases?

1. Two (or more) underscores: `__foo__`

This gets rendered as foo (bold).

2. Non-matching count of underscores: `__foo_`

This gets rendered as a literal _ followed by foo (underlined).

3. Intra-word underscores: `foo_bar_baz`

This gets rendered as foo|bar(underlined)|baz.


This isn't supported by CommonMark. In my opinion, the underscore should just be interpreted the same way it already is (as specified by the CommonMark spec). Only when converting to HTML would the underscore (optionally) be rendered as underline.
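To make the "render-only" idea concrete, here is a minimal sketch in C. It assumes the parser stays as it is and reports, for each emphasis span, its kind and the delimiter character that produced it. The names `span_kind`, `emph_tag` and the `underline_opt` parameter are hypothetical and are not part of MD4C's API.

```c
/* Hypothetical span kinds for illustration; this is NOT MD4C's enum. */
typedef enum { SPAN_EM, SPAN_STRONG } span_kind;

/* Pick the HTML tag name for an emphasis span.
 * Parsing is unchanged (CommonMark rules); only the output tag differs.
 *   kind          - what the parser recognized (em or strong)
 *   delim         - the delimiter character used in the source ('*' or '_')
 *   underline_opt - non-zero if the (assumed) underline option is enabled
 */
static const char *emph_tag(span_kind kind, char delim, int underline_opt)
{
    if (underline_opt && delim == '_')
        return "u";                       /* underscore spans become <u> */
    return kind == SPAN_STRONG ? "strong" : "em";
}
```

With the option enabled, both `_foo_` and `__foo__` would map to `u` under this sketch, while `*foo*` and `**foo**` would keep mapping to `em` and `strong`.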

CC: @pbek

mity commented 4 years ago

Thank you. I will take a look at it.

mity commented 4 years ago

the underscore should just be interpreted the same way it already is (as specified by the CommonMark spec). Only when converting to HTML would the underscore (optionally) be rendered as underline.

Hmm. I'm wondering whether it is a good idea, given how and why CommonMark does the emphasis.

For example consider three underscores:

___foo___

It now renders (as requested by CommonMark) into:

<p><em><strong>foo</strong></em></p>

That makes sense, given the <em> and <strong> distinction.

But if we render

<p><u><u>foo</u></u></p>

instead, it imho just gets confusing. We should either render

<p><u><u><u>foo</u></u></u></p>

or maybe

<p><u>__foo__</u></p>

Or, maybe we can do it in the same way as MD_FLAG_STRIKETHROUGH (see strikethrough.txt and #102). We could then even reuse most of the code (plus disabling the standard emphasis, of course, for the underscore.)

(That would also allow intra-word underline, which may be a good or a bad thing, depending on whether people use the underscore as a normal character even outside code spans/blocks. I have no idea about that.)
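As a rough illustration of the strikethrough-like approach, here is a toy sketch in C. It assumes a simplified rule: a run of N underscores opens an underline that is closed by the next run of exactly N underscores, with no nesting and no word-boundary checks, and unmatched runs stay literal. This is neither MD4C's implementation nor hoedown's behavior, and the function names are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Length of the run of '_' characters starting at s. */
static size_t run_len(const char *s)
{
    size_t n = 0;
    while (s[n] == '_')
        n++;
    return n;
}

/* Append n bytes of s to out, keeping it NUL-terminated.
 * Returns 0 on success, -1 if the buffer (capacity cap) is too small. */
static int put(char *out, size_t cap, size_t *o, const char *s, size_t n)
{
    if (*o + n + 1 > cap)
        return -1;
    memcpy(out + *o, s, n);
    *o += n;
    out[*o] = '\0';
    return 0;
}

/* Toy renderer: wrap text between two equally long underscore runs in <u>.
 * Returns 0 on success, -1 if out is too small. */
static int render_underline(const char *in, char *out, size_t cap)
{
    size_t i = 0, o = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (in[i] != '\0') {
        size_t n, k, close = 0;
        int found = 0;

        if (in[i] != '_') {               /* ordinary character: copy it */
            if (put(out, cap, &o, in + i, 1) != 0)
                return -1;
            i++;
            continue;
        }

        n = run_len(in + i);              /* candidate opener of length n */
        k = i + n;
        while (in[k] != '\0') {           /* look for a closer of length n */
            if (in[k] == '_') {
                size_t m = run_len(in + k);
                if (m == n) { close = k; found = 1; break; }
                k += m;                   /* skip runs of a different length */
            } else {
                k++;
            }
        }

        if (found) {
            if (put(out, cap, &o, "<u>", 3) != 0 ||
                put(out, cap, &o, in + i + n, close - (i + n)) != 0 ||
                put(out, cap, &o, "</u>", 4) != 0)
                return -1;
            i = close + n;
        } else {                          /* no closer: keep the run literal */
            if (put(out, cap, &o, in + i, n) != 0)
                return -1;
            i += n;
        }
    }
    return 0;
}
```

Under this toy rule the intra-word case `foo_bar_baz` does get underlined, which is exactly the trade-off mentioned above, while the mismatched `__foo_` is left untouched.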

Waqar144 commented 4 years ago

Or, maybe we can do it in the same way as MD_FLAG_STRIKETHROUGH (see strikethrough.txt and #102). We could then even reuse most of the code (plus disabling the standard emphasis, of course, for the underscore.)

This sounds the best to me. Reapplying/reusing the logic is best imo.

(That would also allow intra-word underline, which may be a good or a bad thing, depending on whether people use the underscore as a normal character even outside code spans/blocks. I have no idea about that.)

I haven't really ever seen any intra-word underscores outside of the coding world. So this makes sense too.