Rich text representation in JSON

dspreadbury commented 3 months ago

Issue #280 considers whether we could use HTML/XHTML to encode rich text in MNX documents, but it dates from times of yore when MNX was an XML-based format. We don't think it would be a great idea to use HTML/XHTML in our JSON-based MNX format, so we instead need to consider how to approach encoding rich text in a more JSON-friendly way.

Typically scores use a single text font family and a single music font family (though of course there are exceptions), so we'll want to define the default font families to be used for the document as a whole in the global presentation data for the document. It shouldn't be necessary to specify the font to be used for every text item, unless it needs to be overridden. Similarly, when specifying the use of a SMuFL symbol, it shouldn't be necessary to specify the music font to be used, unless it needs to be overridden.

Text in scores is typically either roman, italic, bold, or bold italic. Rarely it may be underlined, even more rarely it may be overlined, and almost never is it struck through. It is more common for text to be enclosed in a box than to be underlined. Our text representation should make the common kinds of text used in scores easy to represent, while providing some flexibility for more unusual use cases.

JSON documents use UTF-8, so it makes sense that runs of text should be encoded in UTF-8. All characters within the Basic Multilingual Plane (U+0000–U+FFFF) can be encoded using UTF-8; characters outside the BMP must be encoded as UTF-8 surrogate pairs (or so says Wikipedia, anyway). New line characters can be encoded using \n.

In order to allow rich text formatting within a run of text, one approach would be to have a text object that contains one or more textChunk objects. Each textChunk defines either a string or a SMuFL glyph, and optionally can define overrides for font family, font style, font size, decoration (enumeration? for underline, overline, strikethrough), and enclosure (enumeration? for border). To change any of the formatting properties for text, a new textChunk object is required, and everything contained within the same text object is intended to be rendered as a single run of text. If the run of text is multi-line, a textChunk can be terminated with a new line (\n).
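As a rough illustration of the textChunk idea above, here is a minimal Python sketch of one shape such an object could take once serialized as JSON. Every property name in it (`chunks`, `string`, `glyph`, `fontStyle`) is a placeholder assumed for illustration only; nothing in this issue has settled on a schema. `dynamicForte` is used as an example SMuFL glyph name.

```python
import json

# Hypothetical shape: one text object holding a list of textChunk objects.
# All keys are illustrative assumptions, not part of any agreed MNX schema.
text = {
    "chunks": [
        {"string": "poco "},                            # inherits the document defaults
        {"string": "cresc.\n", "fontStyle": "italic"},  # overrides style, ends the line
        {"glyph": "dynamicForte"},                      # SMuFL glyph instead of a string
    ]
}


def plain_text(text_obj):
    """Concatenate the string chunks, skipping glyph chunks."""
    return "".join(chunk.get("string", "") for chunk in text_obj["chunks"])


encoded = json.dumps(text)     # the new line is written as the escape \n
decoded = json.loads(encoded)  # round-trips back to the same structure
```

A chunk only carries the properties it overrides, so the common case of unstyled text stays short.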

There are still lots of questions to resolve here:

Are there any existing rich text representations for JSON documents that we could use, instead of defining our own? (They would need to be both sufficiently lightweight not to overburden MNX documents with unnecessary guff, and sufficiently flexible to accommodate SMuFL glyphs as first-class citizens.)
What considerations need to be made for complex scripts, such as RTL scripts? (In theory anything within the BMP can be represented simply using UTF-8, but do we need to provide rendering hints to specify that this textChunk needs to be rendered right to left?)
Can we get away with defining a simple enumeration for font style (roman, bold, italic, bold italic), or do we need to accommodate the effectively unlimited number of styles that some operating systems expose for fonts? (For example, a font like "Minion Pro Condensed" will be considered to be of font family "Minion Pro" with style "Condensed" on macOS, but on Windows it will be considered to be of font family "Minion Pro Condensed" with style "Regular".)

akulisch commented 3 months ago

Regarding UTF-8 encoding: surrogate pairs are a UTF-16 concept, so you mixed up UTF-8 and UTF-16. UTF-8 can encode all of Unicode: U+0000–U+007F (ASCII) as one byte each, and everything from U+0080 upwards as multibyte sequences of two to four bytes.
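A quick way to check the byte lengths is to encode one code point from each UTF-8 length class. This is only a small sketch; the sample characters are arbitrary picks.

```python
# One sample code point per UTF-8 length class (1 to 4 bytes).
samples = {
    "U+0041": "\u0041",       # LATIN CAPITAL LETTER A (ASCII)
    "U+00E9": "\u00e9",       # e with acute accent, first multibyte range
    "U+20AC": "\u20ac",       # euro sign, still inside the BMP
    "U+1F610": "\U0001F610",  # neutral face, outside the BMP
}


def utf8_length(ch):
    """Number of bytes needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))


for name, ch in samples.items():
    print(name, utf8_length(ch), "byte(s)")
```

The character outside the BMP simply becomes a four-byte sequence; no surrogates are involved at the UTF-8 level.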

lemzwerg commented 3 months ago

Surrogate pairs are only needed for escaped characters. Honestly, I think this is a very ugly limitation of JSON since it is next to impossible to deduce visually that the representation \uD83D\uDE10 is actually U+1F610.
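To make the distinction concrete, here is a small sketch using Python's standard `json` module. It only demonstrates that module's behaviour and says nothing about how an MNX writer should serialize text.

```python
import json

neutral_face = "\U0001F610"  # U+1F610, outside the BMP

# With the default ensure_ascii=True, non-ASCII characters are written as
# \uXXXX escapes, so this code point comes out as a UTF-16 surrogate pair.
escaped = json.dumps(neutral_face)

# With ensure_ascii=False the character is written directly, and the
# surrogate pair never appears in the document.
literal = json.dumps(neutral_face, ensure_ascii=False)

# Both forms decode to the same single code point.
assert json.loads(escaped) == json.loads(literal) == neutral_face
```

So a writer that emits UTF-8 directly can avoid the hard-to-read escapes, while a reader still has to accept them.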

If it were possible to extend JSON for MNX, I would suggest introducing either \u{...} or \U... with more than four hex digits, so that the whole Unicode range could be represented with a single escape instead of surrogate pairs. However, I guess this is a pipe dream, since all the JSON parsers out there would choke on that...

lemzwerg commented 3 months ago

Regarding your questions on font styles (wearing my FreeType maintainer hat):

IMHO we cannot get away with the classical four text style attributes. BTW, I think that your observations on the font naming details on macOS and Windows depend on applications and/or UI features that do not implement the current OpenType standard, mostly for backward-compatibility reasons.

Please check the 'name' table documentation and look at how 'Name IDs' are constructed. The examples there explicitly mention Minion Pro, BTW.

It might be helpful to examine how the Pango font rendering library (which is widely used in the Unix world) implements both text attributes and font descriptions. There are certainly other libraries that provide similar features.

samuelbradshaw commented 3 weeks ago

There are three patterns I've seen for styling text in code:

1. Block-level styles, i.e. styling a full object (such as a syllable or a title). This might include breaking a single text block into a list of separate blocks for styling, as described above.
2. Special tags or syntax that surround the character(s) being styled.
3. Styling instructions stored separately from the character(s) being styled (example: make characters 5–7 bold, and make characters 6–10 italic).

(1) isn't very flexible if text objects are defined as strings. If all text objects are defined as lists, it becomes more flexible, but is also very verbose. (3) is very flexible (especially when it comes to overlapping styles, which neither (1) nor (2) handles cleanly), but a pain to maintain (if you add a character to the text, you have to update all of the instructions to account for the change).
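To illustrate the trade-off, here is a minimal Python sketch of pattern (3) and of how it maps onto a list of uniformly styled blocks as in pattern (1). The function names, the `string`/`styles` keys, and the 0-based half-open offsets are all assumptions made for this example.

```python
def ranges_to_chunks(text, ranges):
    """Split text into runs of uniform styling.

    ranges is a list of (start, end, style) tuples using 0-based,
    half-open offsets (an indexing convention assumed for this sketch).
    """
    # Every range boundary is a point where the active style set may change.
    cuts = {0, len(text)}
    for start, end, _ in ranges:
        cuts.update((max(0, start), min(len(text), end)))
    bounds = sorted(cuts)

    chunks = []
    for lo, hi in zip(bounds, bounds[1:]):
        styles = sorted(s for start, end, s in ranges if start <= lo and hi <= end)
        chunks.append({"string": text[lo:hi], "styles": styles})
    return chunks


def shift_ranges(ranges, at, count):
    """Adjust ranges after inserting count characters at offset at.

    An insertion exactly at the end of a range extends that range.
    """
    return [
        (start + count if start >= at else start,
         end + count if end >= at else end,
         style)
        for start, end, style in ranges
    ]


# "Characters 5-7 bold, characters 6-10 italic", written as half-open offsets.
ranges = [(4, 7, "bold"), (5, 10, "italic")]
chunks = ranges_to_chunks("abcdefghijkl", ranges)
```

The overlap is resolved into five runs, with the middle run carrying both styles. `shift_ranges` shows the maintenance cost: every edit to the text has to be followed by a pass over all stored offsets.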

(2) balances brevity and maintainability with flexibility. I think these are the most common open-source flavors:

XML or HTML tags, of three types:
- Tag itself defines the style (<b>text</b>, <i>text</i>)
- Tag has attributes that define the style (<span style="color: red">text</span> or <format color="red">text</format>)
- Tag has class(es) or an ID that allow the style to be defined elsewhere (<span class="...">text</span>)
BBCode tags (https://www.bbcode.org/how-to-use-bbcode-a-complete-guide.php)
Markdown syntax (https://www.markdownguide.org/basic-syntax/)
Wikitext syntax (https://en.wikipedia.org/wiki/Help:Wikitext)
LaTeX syntax (https://www.learnlatex.org/tr/lesson-11)

As of now, I'm still in favor of (2) for inline styling, and I lean towards something with tags like XML/HTML or BBCode.
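For a sense of what pattern (2) costs a consumer, here is a minimal Python sketch that parses a tiny BBCode-like subset into styled runs. The `[b]` and `[i]` tags and the `string`/`styles` output keys are assumptions for illustration; a real syntax would also need escaping rules and error handling for unbalanced tags.

```python
import re

TAG = re.compile(r"\[(/?)(b|i)\]")
STYLE = {"b": "bold", "i": "italic"}


def parse_inline(markup):
    """Turn markup such as 'a [b]b[/b]' into a list of styled chunks."""
    chunks, active, pos = [], [], 0

    def emit(s):
        # Skip empty runs, e.g. between two adjacent tags.
        if s:
            chunks.append({"string": s, "styles": sorted(active)})

    for m in TAG.finditer(markup):
        emit(markup[pos:m.start()])
        closing, name = m.group(1), STYLE[m.group(2)]
        if closing:
            if name in active:
                active.remove(name)  # unmatched closing tags are ignored
        else:
            active.append(name)
        pos = m.end()
    emit(markup[pos:])
    return chunks


runs = parse_inline("poco [i]cresc. [b]molto[/b][/i]")
```

The author writes one compact string, and the reader pays with a small parser that produces the same kind of run list a chunk-based format would store directly.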
