readium / annotations

Model and JSON serialization of annotations associated with EPUB and Web Publications
BSD 3-Clause "New" or "Revised" License

Why only plain text body and not rich text body? #11

Closed: lrosenthol closed this issue 1 month ago.

lrosenthol commented 1 month ago

The proposal only supports plain-text comment bodies, but in common commenting/annotation scenarios many users apply basic styling (bold/italic, hyperlinks, etc.).

I would recommend that you also allow rich-text bodies.

llemeurfr commented 1 month ago

The W3C Web Annotation Model opens the door to body values that are not plain text (https://www.w3.org/TR/annotation-model/#embedded-textual-body),

but rich text is a can of worms:

Notes in Adobe Acrobat are not rich text. Even MS Word, a tool dedicated to text editing, offers only a limited feature set in comments (e.g. no hyperlinks).

This is why the current spec constrains the body to a String Body (https://www.w3.org/TR/annotation-model/#string-body). By the way, I should make this clearer in the spec.
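For reference, the two W3C body shapes linked above can be compared side by side. The sketch below is written under stated assumptions, using only the Python standard library: a String Body carries the comment directly in `bodyValue`, and the W3C model reads that as an Embedded Textual Body with `format` `text/plain`. The ids and targets are placeholders and `to_textual_body` is a hypothetical helper. This does not show how the Readium profile itself serializes annotations.

```python
import json

# String Body, per the W3C model: the comment text sits directly in "bodyValue".
string_body_annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",          # placeholder id
    "type": "Annotation",
    "bodyValue": "Nice turn of phrase.",
    "target": "http://example.org/target1",    # placeholder target
}


def to_textual_body(annotation: dict) -> dict:
    """Hypothetical helper: return a copy in which a String Body ("bodyValue")
    is rewritten as an Embedded Textual Body with format text/plain."""
    if "bodyValue" not in annotation:
        return dict(annotation)  # nothing to convert; shallow copy
    converted = {k: v for k, v in annotation.items() if k != "bodyValue"}
    converted["body"] = {
        "type": "TextualBody",
        "value": annotation["bodyValue"],
        "format": "text/plain",
    }
    return converted


if __name__ == "__main__":
    print(json.dumps(to_textual_body(string_body_annotation), indent=2))
```

The converted form is the one that has room for a `format` other than `text/plain`. That room is what a later rich-text extension would use.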

As with any spec, it is better IMO to start simple; the extensible structure allows rich text (HTML?) to be supported later by adding a "format" property.

lrosenthol commented 1 month ago

Comments/annotations in PDF have supported rich text for over 20 years, and you can certainly author them using Adobe Acrobat and other tools. Yes, there are limits (for example, lists are supported but tables are not), but at least for PDF they have held up fairly well. The only thing we had to add over the years was hyperlinks (where the link value != the link text).

Again, I don't know if this needs to be a v1 requirement, but it is certainly something to track.

BigBlueHat commented 1 month ago

@lrosenthol I'm curious what format Acrobat's rich-text comments use.

Picking a format is truly a can of worms. I still feel we need a "safe subset" of HTML (like what Markdown provides, but less limited). Picking one seems to hinge as much on the expected consumption use cases as on authoring.

lrosenthol commented 1 month ago

"curious what format the Acrobat rich text comments use."

A subset of XHTML, as defined by the XFA specification. I wouldn't recommend it, but it was created decades ago and we're stuck with it (though it does serve its purpose).

"Picking a format is truly a can of worms"

No argument there! A safe subset of HTML or CommonMark/Markdown would be the logical choices.

llemeurfr commented 1 month ago

Let's move this issue to a discussion for later study.

In the short term: W3C annotations support a format property, i.e. the media type of the annotation value. Let's adopt it in our profile as an optional property with the default value 'text/plain'. Other values may be adopted later to support rich-text bodies.
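To make the proposal concrete, here is one possible way a reading system might resolve the optional property. This is a minimal Python sketch, not part of the profile. It assumes a textual body is a JSON object with an optional `format`, that `text/plain` applies when `format` is absent, and that a v1 reader only renders plain text. The names `DEFAULT_FORMAT`, `SUPPORTED_FORMATS`, `body_format` and `can_render` are hypothetical.

```python
# Hypothetical names: DEFAULT_FORMAT, SUPPORTED_FORMATS, body_format, can_render.
DEFAULT_FORMAT = "text/plain"        # default proposed for the profile
SUPPORTED_FORMATS = {"text/plain"}   # assumption: v1 renders plain text only


def body_format(body: dict) -> str:
    """Return the body's media type, falling back to text/plain if "format" is absent.

    Media-type parameters (e.g. "; charset=utf-8") are dropped and the
    type/subtype is lower-cased so comparisons are case-insensitive.
    """
    media_type = body.get("format", DEFAULT_FORMAT)
    return media_type.split(";", 1)[0].strip().lower()


def can_render(body: dict) -> bool:
    """True if this reader (under the assumption above) can display the body."""
    return body_format(body) in SUPPORTED_FORMATS


if __name__ == "__main__":
    plain = {"type": "TextualBody", "value": "Check this date."}
    rich = {"type": "TextualBody", "value": "<p>Check <b>this</b> date.</p>", "format": "text/html"}
    print(body_format(plain))   # text/plain
    print(can_render(rich))     # False
```

Because `format` is optional and defaults to `text/plain`, existing plain-text annotations stay valid unchanged. A reader that meets a value it does not support (e.g. `text/html`) can detect this before displaying raw markup.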