markedjs / marked

A markdown parser and compiler. Built for speed.
https://marked.js.org

Allow <b> and <i> tags #434

Closed NetoBuenrostro closed 6 years ago

NetoBuenrostro commented 10 years ago

How about allowing <b> and <i> tags in markdown.

This can use the same notation defined in markdown, just outputting different tags.

**This is Strong** ---> <strong>This is Strong</strong>
*This has emphasis* ---> <em>This has emphasis</em>

and

__This is bold__ ---> <b>This is bold</b>
_This is italic_ ---> <i>This is italic</i>

Not everything has to be semantic; some stuff just has to be highlighted.

roydukkey commented 10 years ago

Why would this configuration be preferred?

notslang commented 9 years ago

The <b> and <i> tags denote styles, rather than how something should be understood. Markdown (and of course HTML) are not about styles. That's what CSS is for. Markdown/HTML describe ideas that can be represented in many different ways, including:

You could argue that <b> and <i> should be used to denote places where superfluous styles are meant to be added (as the result of a styleguide or formatting constraint), as this article does. However, if that is the case then <b> and <i> shouldn't be so syntactically similar to <strong> and <em> since they convey very different concepts. Also, I would argue that they shouldn't even be used frequently enough to be given a place in the Markdown syntax.

Anyway, :-1: from me.

Feder1co5oave commented 6 years ago

Agree with @slang800 up here. This is what CSS is for. @joshbruce

joshbruce commented 6 years ago

Disagree actually... @slang800 is correct in the assessment; however, I don't agree with the conclusion.

em and strong are about semantics. What words get stressed in the language.

b and i are about font-face or family changes.

HTML is different than written text in the sense that with the traditional written word you only have bold and italic - any change in inflection is inferred by the context. This was actually a big debate back in the beginning days of the semantic web conversation. A screen-reader for the blind can actually differentiate and change inflection based on the four different tags. So, something marked as italic may not receive a different inflection - but, something marked as emphasis would.

One of the PHP Markdown parsers actually has a setting to accomplish this:

  1. By default underscores and asterisks will both generate emphasis and strong tags.
  2. However, you can set a flag that underscores will do bold and italic while asterisks will do emphasis and strong.

From a style perspective (how the browser renders these elements by default), they are the same because they are the same in the written word. However, from a CSS perspective, I can make b and i look aesthetically different from strong and em. Having said that, I cannot alter their declared intent based on semantics.

Leaving open for further discussion and putting with #1036

joshbruce commented 6 years ago

See also headings from MLA (https://owl.english.purdue.edu/owl/resource/747/24/). We don't want to "stress" the sub-headings - but we do want to visually distinguish them to signal a reader that they are in a different section of the document.

joshbruce commented 6 years ago

ps. For non-HTML display - the Markdown remains unaffected as the base spec will cause bold and italic regardless of underscore or asterisk.

Feder1co5oave commented 6 years ago

I'm not really getting where we're going with this... I like thinking that markdown is about delivering semantic information about the text, hence the outputting of <em> and <strong>, which can be stylized with CSS rules. Whereas <b> and <i> have explicit formatting information (that I would feel a fool changing via CSS) and have no place in markdown documents. However, I feel that you'd enjoy the possibility of having 4 different types of emphasis/formatting/whatever and I'm not gonna argue against that.

However, you can set a flag that underscores will do bold and italic while asterisks will do emphasis and strong.

To do this, we would need a way to differentiate between the two. Somewhat similar to what was proposed in #1031.

joshbruce commented 6 years ago

Agreed with the reference to #1031 to a point. However, that solution seems to allow one to specify or limit the rule (Markdown allows all three by default). Not sure I'm seeing what you're seeing on that score.

Was thinking a flag might suffice: underscoreBoldAndItalic: false (or something - setting true gives us this ability - and I don't think there's a spec that differentiates; so, we're able to do whatever really).

Mainly because Markdown has us covered on this moreso than with hr and whatnot. There are two (four) different elements and Markdown provides a simple opening to that (two ways to do two of them). When I wrote a parser ~2005 I gave myself the ability to do emphasis and strong using asterisks, which is the one that has caught on in a lot of places (including Slack, for example). Meanwhile, the less used underscores allowed me to do b and i for works-cited sections of documents where, semantically, emphasis and strong don't make sense.

More examples:

A works cited page where we reference a book title like Romeo and Juliet <- should not use the emphasis element.

https://www.w3.org/TR/html51/textlevel-semantics.html#the-em-element

https://www.w3.org/TR/html51/textlevel-semantics.html#the-i-element

To the styling point, I was just using that as an example. CSS can style all four elements however we want it to. The point was that it doesn't change the semantic correctness of one or the other - the visual appearance doesn't matter to the semantic intent.

notslang commented 6 years ago

For citing works, you should use a <cite> tag and then style that however you want.

joshbruce commented 6 years ago

@slang800: From an HTML perspective, I concur. From a document-writing perspective, though, "cite" is not an option.

notslang commented 6 years ago

It works for me?

$ echo "this <cite>is</cite> *markdown*" | marked 
<p>this <cite>is</cite> <em>markdown</em></p>

I'll admit that having to put a <cite> tag isn't quite as elegant as having dedicated syntax for citations. However, I think it's far more semantically meaningful than using tags which only describe the style of the text.

See also: https://babelmark.github.io/?text=this+%3Ccite%3Eis%3C%2Fcite%3E+*markdown*

joshbruce commented 6 years ago

@slang800: Not sure we're speaking the same language on what we're both driving at here. :)

One of the things about Markdown is that it is plain text and human readable without conversion to HTML...none. When we use something like Pages for macOS or Word for Windows and macOS - "cite" is not an option for formatting text. See the text formatting options from Pages, for example.

[Screenshot: text formatting options in Pages]

So, yes, from an HTML perspective, when citing source material one would probably want to use the cite element. That was just one example of the larger concept of there being a difference between italicizing something and emphasizing something. Sometimes, you want the visual appearance of the italic face, while not inferring any inflection change in the text.

Again, going back to the header from APA. It is not a block level element - it's a sentence (inline). Further, it is italicized but not emphasized. In HTML...because documents don't differentiate between changes in inflection and changes in font face.

In a document created by word or something like it - it's just an italicized sentence. So, in Markdown:

*Some APA header.*

Is correct visually, because most writing tools will display it as italicized; however, the use of the emphasis element is semantically incorrect because we're not trying to mark a change in the inflection of the word...unlike the two times I used it here. :)

Hope that helps.

notslang commented 6 years ago

I don't think that Apple's Pages or MS Word is an ideal to aspire to. They don't seem to have any boundary between content and style. They store display settings like page margins and size along with the actual content of the document in a horrible binary blob. Also, I think that MS Word does have citation support now. I haven't used it in years, so I don't know how it works.

Sometimes, you want the visual appearance of the italic face, while not inferring any inflection change in the text.

Is that done frequently enough to justify being a part of Markdown syntax and adding another (potentially confusing) rule to how Markdown should be interpreted?

Also, I don't see any issue with using <cite> in the cases where you actually need to cite something. So long as you're not using GitHub flavored markdown, you can include HTML in places where you really need it.

In a document created by word or something like it - it's just an italicized sentence

In Word there's no distinction between semantic emphasis and the kind of italicization you use when you're actually trying to say that something is a citation but you don't have the syntax in Markdown to do it. Word doesn't care about semantics in the same way that Markdown does.

joshbruce commented 6 years ago

Again, I'm not sure we're speaking the same language at the same levels of consideration...and I'm not sure we can get there via text.

I'm not saying we should "aspire" to anything. I'm trying to differentiate between a markup language (HTML) interpreted by a computer for communication to a reader and the craft of typesetting (the look of a visual page). HTML, when interpreted by a screen reader for the blind or illiterate, can distinguish between emphasis and non-emphasis. A document, the visual page, can't - at least not using traditional typesetting and typefaces.

Yes it is done frequently enough to be under consideration, in my experience (that's why, back in the day, there was some pretty heated debate because, if memory serves, the w3c was considering deprecating b and i and only having strong and em...there was also pretty heated debate over abbr and acronym - acronym did get deprecated despite there being nuance between the two, linguistically). In printed works, italics are sometimes used to convey someone thinking to themselves. Again, works cited pages and headings and other concepts inside various style guides. Same with bold. They are a stylistic choice made by the typesetter rather than a communication of linguistic intent by the author.

Not asking for a change to the Markdown syntax - we are not a specifying body (if we stick strictly to the specs - we can just close this ticket and not consider it again). I do think it's an interesting problem space though, which is why I left it open for now - probably gonna be closing it soon though.

The interesting thing about this solution is that even if we pass the Markdown through a parser that doesn't differentiate, the result still renders as expected:

_hello_

*hello*

According to the base Markdown spec (Markdown.pl), both of these are valid Markdown.

If we pass that Markdown through a parser that differentiates:

<i>hello</i>

<em>hello</em>

Visually the same by default in a browser - semantically, however, they are different.

If we pass that Markdown through a parser that doesn't differentiate (complies with Markdown.pl):

<em>hello</em>

<em>hello</em>

Visually the same by default in a browser - semantically they are also the same.
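The comparison above can be sketched with a toy converter. This is only an illustration under stated assumptions: `renderInline` is a hypothetical helper, not part of Marked, and the `underscoreBoldAndItalic` option is the flag name floated earlier in this thread, not an existing Marked option. The regular expressions ignore escaping, nesting, and the intraword rules that real parsers apply.

```javascript
// Toy inline converter: NOT Marked's tokenizer and NOT spec-compliant.
// `underscoreBoldAndItalic` is a hypothetical option taken from this thread.
function renderInline(src, options = {}) {
  const differentiate = options.underscoreBoldAndItalic === true;
  // Underscore forms map to <b>/<i> only when the flag is set.
  const strongUnderscore = differentiate ? "b" : "strong";
  const emUnderscore = differentiate ? "i" : "em";
  return src
    // Handle double delimiters first so they are not consumed as single ones.
    .replace(/\*\*([^*]+)\*\*/g, "<strong>$1</strong>")
    .replace(/__([^_]+)__/g, `<${strongUnderscore}>$1</${strongUnderscore}>`)
    .replace(/\*([^*]+)\*/g, "<em>$1</em>")
    .replace(/_([^_]+)_/g, `<${emUnderscore}>$1</${emUnderscore}>`);
}

// Non-differentiating behavior (flag unset):
console.log(renderInline("_hello_ *hello*"));
// prints: <em>hello</em> <em>hello</em>

// Differentiating behavior (flag set):
console.log(renderInline("_hello_ *hello*", { underscoreBoldAndItalic: true }));
// prints: <i>hello</i> <em>hello</em>
```

The same source text stays valid either way; only the underscore forms change their output tag when the flag is on.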

I agree - you can use HTML within Markdown and should be able to - having said that, HTML is not the only way Markdown is used these days; so, if we can decouple or differentiate between the world of typesetting and the world of the web - it might be beneficial (this is why the video conversation #675 is what it is). (Of course, the argument could be made that Marked doesn't do non-web parsing...so, yeah...why have the discussion?)

Word doesn't care about semantics in the same way that Markdown does.

I would say, if a user (typesetter) can't differentiate between inflection (em) and changes in font face (i), then Markdown doesn't actually care about semantics either - at least in that regard; gonna do the dictionary thing:

Semantic: relating to meaning in language or logic.

Italic: of the sloping kind of typeface used especially for emphasis or distinction and in foreign words.

(Italics/emphasis added - pun intended.)

Emphasis (2): stress given to a word or words when speaking to indicate particular importance.

They are also different according to the HTML5 specification linked to in previous comments.

So, no matter what angle we look at it from, in my opinion and practices, Markdown is deficient in its specification because it doesn't differentiate for the mediums through which it is used between the intent of the author versus the practice of typesetting. However, its original specification offers a simple way to overcome that deficiency, if one chooses to do so.

Having said all that, at the end of the day, the specifications will win for the Marked library - pretty much every single time. When we start looking at making Marked easier to extend - someone should be able to add this functionality easily. Of course, it would be interesting if more people started using that library instead of Marked directly. :)

Strong and emphasis are something of a sore spot for other reasons as well though - see #1036.

joshbruce commented 6 years ago

Not sure if @NetoBuenrostro is being notified about this conversation being "revived" for now - want to make sure.

joshbruce commented 6 years ago

This conversation is helping me at least crystalize some higher-level thoughts, so thanks.

joshbruce commented 6 years ago

Closing but flagging for possible consideration in releases 1.x or 2.x

dwhieb commented 2 years ago

Adding a use case to this:

In linguistics it's very common to indicate the relevant part of an example using bolding, like so (the final "b" is bolded).

The letter "b" in the word thumb is silent.

This is the perfect use case for the <b> element, which the HTML spec now describes as the Bring Attention To element. So here it's being used not for styling, but for the semantic function of bringing attention to something. Someone could style that however they'd like—bold, italic, red color, underline, etc.

In this situation, the <strong> element is not semantically accurate:

The <strong> HTML element indicates that its contents have strong importance, seriousness, or urgency. (MDN)

Similarly, in linguistics italics is used as a technical device to show that the word is being mentioned and not actually used (see the use-mention distinction):

The word house has five letters.

This perfectly fits the description for the <i> Idiomatic Text element:

The <i> HTML element represents a range of text that is set off from the normal text for some reason, such as idiomatic text, technical terms, taxonomical designations, among others.

The <em> element would not be appropriate for this, since nothing's being emphasized. The italics is instead a technical convention to distinguish use vs. mention of a word.

Modern HTML gives <b> and <i> proper semantic interpretations now, which are different from <strong> and <em>. So I'd find it super helpful to have this issue reopened and implemented.

UziTech commented 2 years ago

@dwhieb You have a few options if you want to output <b> and <i> tags:

iksent commented 4 months ago

Imagine this situation: you are moved to an old or legacy project where <b> has been used instead of <strong> for many years, and it can't be changed due to a lot of business problems.

So you just can't use strong; you need to use b anyway. And you need to add Markdown support (and markedjs, as a result).

I looked into markedjs's code for the strong tag, as it already covers a lot of cases, but it doesn't look like a simple extension. You can't just copy and paste it, because of the different params:

[Screenshot: Marked's tokenizer code for strong]

And this is the extension tokenizer types: (this: TokenizerThis, src: string, tokens: Token[] | TokensList) => Tokens.Generic | undefined;

So what is the best and simplest way to add support for the b tag now with `**` tokens?

calculuschild commented 4 months ago

@iksent The tokenizer you have in the screenshot is a bit complicated, but that's just for recognizing the Markdown code and turning it into a token. The actual HTML output is instead determined by the renderer. You can easily override the renderer functions for Em and Strong. Look at this part of our docs:

https://marked.js.org/using_pro#renderer

The Em and Strong renderers are very simple:

  strong(text: string): string {
    return `<strong>${text}</strong>`;
  }

  em(text: string): string {
    return `<em>${text}</em>`;
  }

So overriding would be as simple as defining new renderer functions (something) like this:

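// Assumes the `marked` npm package and a release whose renderer methods
// receive the rendered inner text as a string, as in the signatures above.
import { marked } from 'marked';

const renderer = {
  // Emit <i> wherever Marked would normally emit <em>.
  em(text) {
    return `<i>${text}</i>`;
  },
  // Emit <b> wherever Marked would normally emit <strong>.
  strong(text) {
    return `<b>${text}</b>`;
  }
};

marked.use({ renderer });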
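
The renderer override above changes the output for both the asterisk and underscore forms. If only the underscore forms should produce <b> and <i>, as discussed earlier in this thread, one possible route is a pair of custom inline extensions. The following is a rough sketch, not tested against any particular Marked release: the helper `underscoreExtension` and the extension names are made up for illustration, the regular expressions are far simpler than Marked's own emphasis rules (for example, they also match inside words), and the inner text is neither escaped nor parsed for nested Markdown.

```javascript
// Hypothetical helper that builds an inline extension object in the shape
// Marked's extension API expects (name, level, start, tokenizer, renderer).
function underscoreExtension(name, pattern, tag) {
  return {
    name,
    level: "inline",
    // Hint for the lexer: index where a candidate match might begin.
    start(src) {
      const index = src.indexOf("_");
      return index === -1 ? undefined : index;
    },
    // Return a token when `src` begins with the pattern, otherwise undefined
    // so that Marked falls through to its built-in tokenizers.
    tokenizer(src) {
      const match = pattern.exec(src);
      if (match) {
        return { type: name, raw: match[0], text: match[1] };
      }
      return undefined;
    },
    // Output the inner text as-is (no escaping, no nested inline parsing).
    renderer(token) {
      return `<${tag}>${token.text}</${tag}>`;
    },
  };
}

// __text__ becomes <b>text</b>; _text_ becomes <i>text</i>.
const underscoreBold = underscoreExtension("underscoreBold", /^__([^_]+)__(?!_)/, "b");
const underscoreItalic = underscoreExtension("underscoreItalic", /^_([^_]+)_(?!_)/, "i");

// Registration, assuming `marked` has been imported:
// marked.use({ extensions: [underscoreBold, underscoreItalic] });
```

Because the tokenizers return undefined for anything that does not start with an underscore, the asterisk forms would still be handled by Marked's built-in emphasis rules and keep producing <em> and <strong>.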