jgm / djot.js

JavaScript implementation of djot
MIT License

"I don't" gets parsed as smart_punctuation right_single_quote #52

Closed shi-yan closed 1 year ago

shi-yan commented 1 year ago

I have the following content:

I don't xxxxxx

The single quote gets parsed into

tag: "smart_punctuation" type: "right_single_quote"

Is this correct? I expected this to be parsed as a single text block.

jgm commented 1 year ago

Yes, this is intentional. The idea is that the renderer can decide whether to render this as a straight quote or a curly quote.
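To illustrate that renderer-side choice, here is a minimal sketch of a renderer that emits either a straight or a curly quote for the same node. It does not call djot.js. The node list is hand-written and only modeled on the `tag`/`type` pair quoted above. The `text` fields, the `"str"` tag, and the names `renderInlines` and `smart` are assumptions made for the example.

```javascript
// Hypothetical inline nodes, hand-written for illustration. Only the
// tag/type pair on the middle node comes from this issue; the `text`
// fields and the "str" tag are assumptions.
const inlines = [
  { tag: "str", text: "I don" },
  { tag: "smart_punctuation", type: "right_single_quote", text: "'" },
  { tag: "str", text: "t xxxxxx" },
];

// Curly replacement for the one punctuation type discussed here.
const CURLY = { right_single_quote: "\u2019" };

// Concatenate node text, letting the caller pick straight or curly output.
function renderInlines(nodes, { smart = true } = {}) {
  return nodes
    .map((node) => {
      if (node.tag === "smart_punctuation" && smart && CURLY[node.type]) {
        return CURLY[node.type];
      }
      // Plain text nodes, and smart punctuation when `smart` is off,
      // fall back to the source text.
      return node.text;
    })
    .join("");
}

console.log(renderInlines(inlines, { smart: false })); // I don't xxxxxx
console.log(renderInlines(inlines)); // I don’t xxxxxx
```

With `smart: false` the output matches the input as a single run of text, which is the result the original question expected.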

andersk commented 1 year ago

U+2019 RIGHT SINGLE QUOTATION MARK is the typographically correct character to use for an apostrophe. See the subsection “Apostrophes” in the Unicode standard, §6.2 General Punctuation:

Apostrophes

U+0027 APOSTROPHE is the most commonly used character for apostrophe. For historical reasons, U+0027 is a particularly overloaded character. In ASCII, it is used to represent a punctuation mark (such as right single quotation mark, left single quotation mark, apostrophe punctuation, vertical line, or prime) or a modifier letter (such as apostrophe modifier or acute accent). Punctuation marks generally break words; modifier letters generally are considered part of a word.

When text is set, U+2019 RIGHT SINGLE QUOTATION MARK is preferred as apostrophe, but only U+0027 is present on most keyboards. Software commonly offers a facility for automatically converting the U+0027 APOSTROPHE to a contextually selected curly quotation glyph. In these systems, a U+0027 in the data stream is always represented as a straight vertical line and can never represent a curly apostrophe or a right quotation mark.

Punctuation Apostrophe. U+2019 RIGHT SINGLE QUOTATION MARK is preferred where the character is to represent a punctuation mark, as for contractions: “We’ve been here before.” In this latter case, U+2019 is also referred to as a punctuation apostrophe.

See also my Quora answer about this with more context.
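As a rough sketch of the automatic conversion the quoted passage mentions, the snippet below replaces a U+0027 with U+2019 only when it has a letter on both sides, as in a contraction. This rule is a deliberate simplification chosen for the example, not how djot.js or any particular software decides. The name `curlApostrophes` is hypothetical.

```javascript
// Simplifying assumption: a U+0027 with a letter immediately before and
// after it is a punctuation apostrophe (as in "We've").
const APOSTROPHE_IN_WORD = /(?<=\p{L})'(?=\p{L})/gu;

// Replace in-word straight apostrophes with U+2019; leave all others alone.
function curlApostrophes(text) {
  return text.replace(APOSTROPHE_IN_WORD, "\u2019");
}

console.log(curlApostrophes("We've been here before."));
```

Quotation marks at word boundaries are left untouched by this rule. Choosing between left and right quotes needs more context than a single regular expression provides.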