jgm / djot

A light markup language
https://djot.net
MIT License

em, i, cite #13

Open snan opened 2 years ago

snan commented 2 years ago

A lot of the time when we use italics it's for emphasized text (<em>); other times it's a book title (<cite>), or some weirdo other-language quote or a Linnaean flower name (in which case we have to use <i>). The CommonMark way to do that is to use raw HTML, but that's more cumbersome in djot, and raw HTML isn't something we wanna leave on for world-readable forums and wikis anyway.

That's why I suggest that djot produce <b> and <i> instead of <strong> and <em>. Since the former are hypernyms or supersets of the latter, they're never wrong; it's just that a lot of the time the latter are more precise (at the expense of sometimes being completely wrong).

(The other thing I've always wanted to change about Markdown is supporting • for list bullets.)

waldyrious commented 1 year ago

I agree with the need to express different semantics (I would particularly like to use <q> for inline quotations), but I disagree with defaulting to <i> and <b> instead of <em> and <strong>.

I think Textile has an interesting solution where a single * is used for strong and ** for bold; likewise, a single _ is used for emphasis and __ for italics.

They don't currently extend that logic to other inline elements (<del> vs <s>, <ins> vs <u>, <code> vs <tt>), but I've suggested that in https://github.com/textile/textile-spec/issues/5.

I think if Djot were to adopt this system, it would be great to apply it across the board for all presentational elements that have semantic counterparts.

jgm commented 1 year ago

I don't want to use doubled delimiters; see the Beyond Markdown essay linked from the README for an explanation.

waldyrious commented 1 year ago

My bad, I had just re-read it minutes before writing my comment above, and I totally agree. Somehow it slipped my mind when writing my comment. Apologies for the noise in that regard.

That said, I still believe that if non-semantic tags are to be included in Djot, implementing them with syntax that approximates the corresponding semantic variant (for example, using the same delimiter with an additional modifier, such as *!foobar!*, and maybe only allowing this within a {...} wrapper) might be preferable to coming up with a separate (yet somehow mnemonic) set of symbols for the presentational tags.

snan commented 1 year ago

In Markdown it's easy to remember: <i> for i, <cite> for cite, and then the shortcut * or _ for the most common case, which is em. No need for mnemonics 💁🏻‍♀️

dpk commented 1 year ago

A while ago in the CommonMark forum someone suggested "" … "" for <cite>, which I still like. I know there’s a policy against double delimiters, but the problem seems to me to specifically arise with repeated-character delimiters when the corresponding single character also has special syntactic meaning. As long as " alone doesn’t acquire a special meaning, "" should be fine.

As for <i>, the old ASCII convention of / … / around a word seems fine to me in the context of Djot, since intraword slashes wouldn’t be mistaken for intraword italics.

bpj commented 1 year ago

Unfortunately /.../ for italics would be a disaster for any linguist (like me) using djot because /.../ has a very specific special meaning in linguistics. Nobody will want to type \/...\/ all the time! I guess {/.../} might work though.

ashemedai commented 1 year ago

I would argue against using <b> or <i> to replace <strong> or <em>. They might seem similar to you, but in non-Latin scripts the entire premise of either bold or italic can fall horribly flat. Take hanzi/hanja/kanji: you will never see italics there the way we use them in Western-style texts, and cursive script has a totally different function.

bpj commented 1 year ago

For Western text I have my opinionated thoughts about the concept of abstract emphasis as decoupled from italic/bold/small-caps, because in various fields it is the actual font styles which are imbued with various semantics: e.g. linguists use italics for object language and Romanicists use small-caps for proto-Romance words, and swapping the styles just isn't an option, because the particular font styles are standardized markup in the field. That's not really emphasis but another use of font styles. For that reason I think there should be markup for bold, italics, small-caps, underline and strikeout, but it should be separate from the markup for abstract emphasis, insertion and deletion as discussed in #10.

I have been reluctant to bring up the question of separate markup for bold and italics because I have no good suggestion for a syntax for bold, and I'm far from sold on {/italics/}, and absolutely not /italics/ for the reason stated earlier in this thread. I know @jgm doesn't like doubled delimiters, and I very much agree WRT "bare" delimiters, but perhaps they might work when combined with curly brackets? If so, {**bold**} and {__italics__} might work, since someone who wants to put abstract emphasis inside abstract emphasis of the same kind can use the curly brackets, like {_{_em in em_}_}. This might work with {||double underline||} as per #10 too, or perhaps {++underline++} vs. {+insertion+} and {--strikeout--} vs. {-deletion-} might work as well!

dpk commented 1 year ago

<i> is not italics and <b> is not bold.

The <i> HTML element represents a range of text that is set off from the normal text for some reason, such as idiomatic text, technical terms, taxonomical designations, among others. Historically, these have been presented using italicized type, which is the original source of the <i> naming of this element. (source: MDN Web Docs)

The <b> HTML element is used to draw the reader's attention to the element's contents, which are not otherwise granted special importance. This was formerly known as the Boldface element, and most browsers still draw the text in boldface. (source: MDN Web Docs)

The correct way to handle Proto-Romance terms in HTML semantics is <i lang=roa>amīcu</i> and a CSS rule such as

i:lang(roa) {
    font-style: inherit;
    font-variant: small-caps;
}

(although the convention is bad and confusing and should, where possible, be replaced with italics plus a * prefix, as in the rest of historical linguistics, imo)

bpj commented 1 year ago

@dpk Read my comment again: I didn't mention <i> or <b>, did I? Just "italics", "bold" etc. Presumably an HTML renderer might render those as <i> and <b>, but I don't really care, because my primary concern is that my non-emphasis italics and bold are marked up differently, e.g. \textit{...} rather than \emph{...}.

And we can discuss lang tags the day they provide the granularity historical linguists really need without lang="x-very-long-tags-every-where", and text-to-speech can really render Old French properly. I can and do use spans with classes at an appropriate granularity, e.g. .obj for object language and .graph for graphemic, and then I jump through hoops with Pandoc filters to have each rendered correctly in LaTeX, email and what have you, e.g. [foo]{.graph} becoming "⟨foo⟩" or "‹foo›" depending on which characters I can expect available fonts to support in a given medium.
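The class-per-medium workaround described above boils down to a lookup keyed by (span class, output format). The following is a minimal Python sketch of that idea only; it is not a Pandoc filter and not part of djot. The function name `render_span`, the format names `latex`/`html`/`email`, and which bracket pair goes to which medium are hypothetical assumptions; only the `.obj`/`.graph` classes, `\textit{...}`, and the ⟨foo⟩/‹foo› outputs come from the comments.

```python
# Hypothetical sketch: render a class-tagged span such as [foo]{.graph}
# differently per output medium. All names and medium assignments here are
# illustrative assumptions, not an existing Pandoc or djot API.

RENDERERS = {
    # Object language: non-emphasis italics.
    ("obj", "latex"): lambda t: f"\\textit{{{t}}}",
    ("obj", "html"): lambda t: f"<i>{t}</i>",
    # Graphemic forms: angle brackets where fonts are assumed to have them,
    # single guillemets as a more widely supported fallback.
    ("graph", "latex"): lambda t: f"\u27e8{t}\u27e9",  # ⟨ ⟩
    ("graph", "html"): lambda t: f"\u27e8{t}\u27e9",   # ⟨ ⟩
    ("graph", "email"): lambda t: f"\u2039{t}\u203a",  # ‹ ›
}


def render_span(text: str, cls: str, medium: str) -> str:
    """Render span text for a (class, medium) pair; unknown pairs pass through."""
    renderer = RENDERERS.get((cls, medium))
    return renderer(text) if renderer else text


if __name__ == "__main__":
    print(render_span("foo", "graph", "latex"))  # ⟨foo⟩
    print(render_span("foo", "graph", "email"))  # ‹foo›
    print(render_span("foo", "obj", "latex"))    # \textit{foo}
```

Keeping the mapping in one table makes the point of the comment concrete: the markup records *what* a span is (object language, graphemic form), and the choice of italics or brackets is deferred to each output format.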

Off topic: As for the Romanicist small-caps convention, the point of it is that the boundaries between proto-Romance, Vulgar Latin and Classical Latin are fuzzy and fussy at best, and not always relevant. I simplified for the non-experts, but the fact is that UĬDĒRE, *vẹdẹ̄re, uĭdēre and ‹uidere› are four levels of decreasing abstraction, each with its proper uses and each meaning something different.

stoicon commented 4 months ago

Many rich-text editors have buttons for Bold, Italic, Underline and Strikethrough, not for Strong, Emphasis, Inserted and Deleted. (The GitHub editor where I typed this comment also shows buttons for Bold and Italic.) Therefore, many people expect djot to have Bold, Italic, Underline and Strikethrough, because they are so common.

Also, djot is not supposed to be tied to HTML. The rendering of Strong, Emphasis, Inserted and Deleted depends on the output format.

So, should we include syntax for a few common non-semantic items?

A counterargument is that not all formats support Bold, Italic, Underline and Strikethrough. Some formats may choose to display semantic items in a different way from HTML. For example, Inserted and Deleted may be shown in a different way. So, semantic items are needed.

Other examples:

So, the question is: should we include syntax for a few common non-semantic items along with the current semantic items?