openactive / modelling-opportunity-data

OpenActive Modelling Opportunity Data specification
https://www.openactive.io/modelling-opportunity-data/

Formally allow formatted description #136

Open nickevansuk opened 6 years ago

nickevansuk commented 6 years ago

Proposer

ODI

Use Case

Many providers and data consumers support formatted description fields for their activities.

Why is this not covered by existing properties?

The existing modelling specification properties are restricted (and validated) strictly to plain text (e.g. no HTML tags).

Please provide a link to example data

GoodGym, Better, British Cycling, Bookwhen, Our Parks, Decathlon

Proposal

Suggest either allowing markdown format, or following Google Reserve's lead and allowing a subset of HTML tags to use.

Markdown:

Restricted HTML tags:

In either case, the formatted description should be provided as a new property or type to allow for compatibility with plain text data consumers, perhaps within a structured object such as:

"description": {
   "type": "RichText",
   "format": "https://openactive.io/HTML",
   "content": "<b>Carefully designed activities and services.</b>"
}

attendeeInstructions and accessibilityInformation could also benefit from this capability.
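
To illustrate the compatibility concern, here is a minimal sketch of how a consumer might read a description that is either a plain string or the structured object proposed above. The `RichText` shape and the format URI come only from this proposal; the function name `read_description` and the `"text/plain"` label are hypothetical, chosen for illustration.

```python
from typing import Tuple, Union

# Format URI as written in the proposal above (not a published identifier).
HTML_FORMAT = "https://openactive.io/HTML"


def read_description(value: Union[str, dict]) -> Tuple[str, str]:
    """Return (format, content) for a plain-string or RichText-style value."""
    if isinstance(value, str):
        # Existing feeds: the description is plain text.
        return ("text/plain", value)  # "text/plain" is a hypothetical label
    if isinstance(value, dict) and value.get("type") == "RichText":
        # Proposed structured object with an explicit format.
        return (value.get("format", "text/plain"), value.get("content", ""))
    raise ValueError("unsupported description value")
```

A consumer that only understands plain text would still have to handle the object case, which is why the shape of this property matters for backwards compatibility.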

Beta property

The property beta:formattedDescription may be used for consistent experimentation.

Google Reserve Restricted HTML

The following is taken from Google Reserve's documentation regarding the description field:

This field now supports both plain-text and HTML-like formatting rules to
display structural contents to end-users. Unlike plain text sections,
customized layouts can be created here using headings, paragraphs, lists
and some phrase tags. Please read the following instructions and notes
carefully to ensure you create the best user-experience.

Supported HTML-like formatting tags:

Heading tags: <h1>, <h2>, <h3>, <h4>, <h5>, <h6>
  Heading tags can be used to display titles and sub-titles. For example,
  <h1>Itinerary</h1> will display the inline text as the most important
  heading of the section. Note that any inner HTML tags, styles or
  attributes will be ignored. For example, <h1 style=".."> will be treated
  the same as <h1>. Only pure text will be preserved.

Paragraph tag: <p>:
  The paragraph tag can be used to highlight a detailed introduction or
  contents. Any inner tags, styles or attributes will be ignored, with a
  few exceptions: <br>, <strong> and <em>. Please see the phrase tag
  section below for more details.

List tags: <ul>, <ol>, <li>
  The <ul> tag can be used with the <li> tag to display unordered lists,
  and the <ol> tag can be used with <li> to display ordered lists. This is
  a good way to display checklists, highlights, or any other lists that fit
  your use-cases.
Example: To show a list of features of a cruise trip:
  <ol>
    <li>Wonderful ocean view and chances to play with wildlife.</li>
    <li>Carefully designed travel arrangements and services.</li>
    <li>Guaranteed lowest price.</li>
  </ol>
Note that only <li> children under <ul> or <ol> tags will be converted. All
other children will be dropped. Also, any inner tags, attributes and styles
will be ignored; we only preserve pure text contents.

Division tag: <div>
  All supported inner tags of the <div> tag will be parsed with the rules
  stated above; the <div> tag itself does not imply any grouping or
  indenting here. Also, any inner attributes and styles will be ignored.

Phrase tags: <br>, <strong>, <em>:
  Only the three tags mentioned above are supported. <br> can be used to
  break lines in paragraphs, and <strong>/<em> can be used to highlight
  important text. Any other phrase tags will be ignored.

Unsupported tags:
  * <html>, <header>, and <body> tags are not allowed.
  * Any other tags not mentioned above are not supported (for example
    <table>, <td> ...).
Any URLs, anchors, and links will be stripped, and will never be displayed
to end-users. If you want to use photos to create a rich user experience,
please use the "related_media" field below to send your photo URLs.

Important notes:
  * Try not to use other tags except for the supported ones mentioned
    above, because the contents within unsupported tags will be stripped,
    and may lead to an undesirable user experience.
  * Try to avoid deeply nested structures, such as more than 3 different
    heading levels or nested lists. Keeping the structure flat, simple, and
    straightforward helps to create a better user experience.
  * If the currently supported layouts are not sufficient for your use
    cases, please reach out to the Reserve with Google team.
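
As a rough illustration of the tag subset listed above, the following sketch filters a description down to the supported tags using Python's standard `html.parser`. It is a simplification made under stated assumptions, not Google's actual behaviour: text inside unsupported tags (including link text) is kept rather than stripped, only `<script>` and `<style>` contents are dropped, and the per-tag nesting rules are not enforced. The names `RestrictedHtmlFilter` and `filter_restricted_html` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Tags named as supported in the documentation quoted above.
ALLOWED_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p",
                "ul", "ol", "li", "div", "br", "strong", "em"}
# Assumption: text inside these elements is never wanted in the output.
DROP_CONTENT = {"script", "style"}


class RestrictedHtmlFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes and styles are dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so it cannot reintroduce markup.
            self.out.append(escape(data, quote=False))


def filter_restricted_html(source: str) -> str:
    parser = RestrictedHtmlFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

For example, `filter_restricted_html('<h1 style="color:red">Itinerary</h1>')` yields `<h1>Itinerary</h1>`, matching the rule that `<h1 style="..">` is treated the same as `<h1>`.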

nickevansuk commented 5 years ago

Proposed way forward

Use "Restricted HTML tags", as it is widely supported by publishers, and copy Google's subset of tags into our spec.

Define type RichText as above.

ldodds commented 5 years ago

My comments remain much the same as: https://github.com/openactive/ns-beta/issues/2#issuecomment-309817581

We have multiple text properties. Are we proposing to move all of them to a structured format that clarifies the format, or to duplicate all of them to allow rich text? The former is a breaking change.

How does the range of markup used by the beta users tally against Google's suggestions? Do the tags cover the intended uses, or are people using a broader set (or a smaller subset)?

What are the conformance rules if we make this change? For widest possible use, I'd suggest a plain text version MUST always be required. A consumer MAY choose to display a formatted version. Publishers SHOULD NOT expect consumers to display the formatted version.

What are the implications for data entry? We can recommend a set of tags and validate against it, but that will require publishers to ensure they're sanitising input upstream in whatever interface they provide.

Data consumers will need to sanitise too, to avoid any potential problems. It wouldn't be safe to simply display formatted text without passing it through a filter, so there is an implementation cost.
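
A minimal sketch of the conformance rules suggested above, assuming the plain text lives in `description` and the formatted version in the beta property `beta:formattedDescription` mentioned earlier in this issue. The function name and the `supports_html` flag are hypothetical.

```python
def description_for_display(item: dict, supports_html: bool) -> str:
    """Pick the text a consumer displays, following the suggested rules."""
    plain = item.get("description")
    if not plain:
        # Suggested rule: a plain text version MUST always be present.
        raise ValueError("a plain-text description is required")
    formatted = item.get("beta:formattedDescription")
    if supports_html and formatted:
        # A consumer MAY display the formatted version; it must still be
        # passed through a sanitising filter before rendering.
        return formatted
    # Publishers SHOULD NOT expect the formatted version to be shown.
    return plain
```

Under these rules a consumer that ignores the formatted property entirely remains conformant, which keeps the implementation cost optional on the consuming side.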

abs0 commented 4 years ago

Following up on the OpenActive community call today.

There was general consensus on the following:

a. Adding formatted versions for all fields where markup could be of use would be ugly.
b. Extending existing fields to include a format would be an unwanted breaking change.
c. Repurposing existing plain text fields to allow markup would only be viable if existing consumers can continue to display them as plain text without significant impact.

There are two distinct cases for markup usage:

  1. The main description - there is a single entry per item, and a formatted version could be expected to showcase the item, so could have multiple sections, sub headings and a reasonably rich set of markup
  2. Any other text field which could extend beyond a single paragraph. Many of these could be presented on a single page under page defined labels, so including headings and other richer markup could impair the overall layout, but there are cases where some limited structural markup, such as breaking on paragraphs and including bullet lists would already help existing data

For the main description, following the existing pattern of an HTML formattedDescription seems to work well.

For other plain text fields which could contain more than a single line of data, we could look at moving towards a subset of markdown.

The simplest change would be to recognise a blank line as a paragraph marker. This would allow producers to format data into paragraphs - updated consumers can benefit from the better layout, while the fallback for unmodified consumers will display the data exactly as now.

The potential next change would be to recognise markdown bullet point formatting. The likelihood of existing data including '*' or '-' at the start of lines without having intended the data to be shown as a list is extremely low, and again the fallback for unmodified consumers would be to show the items as a long block of text with a ' * ' or ' - ' between each item, which is similar to how some producers are formatting data now (using ';' or ',').

Even if both changes are implemented, consumers can continue to treat the fields as plain text with minimal impact, apart from having long fields run all the sentences together in one block of text, which is already happening with data now.
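
The two changes described above (blank lines as paragraph markers, and lines starting with '*' or '-' as bullets) could be recognised by an updated consumer roughly as follows. This is a sketch under assumptions: the function name, the `("paragraph", ...)` / `("list", ...)` output shape, and the requirement of whitespace after the bullet marker are illustrative choices, not part of any agreed specification.

```python
import re
from typing import List, Tuple, Union

Block = Tuple[str, Union[str, List[str]]]

# A bullet is '*' or '-' at the start of a line, followed by whitespace.
BULLET = re.compile(r"^\s*[*-]\s+(.*)$")


def parse_light_markup(text: str) -> List[Block]:
    """Split text into paragraph and bullet-list blocks."""
    blocks: List[Block] = []
    # One or more blank lines separate paragraphs.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        prose: List[str] = []   # plain lines collected so far
        items: List[str] = []   # bullet items collected so far
        for line in chunk.splitlines():
            if not line.strip():
                continue
            match = BULLET.match(line)
            if match:
                if prose:
                    blocks.append(("paragraph", " ".join(prose)))
                    prose = []
                items.append(match.group(1).strip())
            else:
                if items:
                    blocks.append(("list", items))
                    items = []
                prose.append(line.strip())
        if prose:
            blocks.append(("paragraph", " ".join(prose)))
        if items:
            blocks.append(("list", items))
    return blocks
```

For the input `"Bring:\n- water\n- trainers\n\nMeet at the gate."` this produces a paragraph, a two-item list, and a second paragraph, while a consumer that does nothing still shows the same characters as plain text.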

Screen readers and other tools could also use the limited structural markup to make it easy to skip forward/back a paragraph or indicate the bulleted lists.

Potential further option:

Additionally a recommendation could be made that producers avoid including other markdown tags in the text fields, and that consumers implement a full markdown parser and strip out unwanted tags (which could allow for potential future extension).

thill-odi commented 4 years ago

My only hesitation with the proposal to define a strictly limited subset of markdown (paragraphs and lists) for use in some fields regards accessibility. My understanding (rather dated now) is that screen-readers often work best with HTML; I'm not sure how/if they cope with markdown at all.

thill-odi commented 4 years ago

Looking at https://johnmacfarlane.net/babelmark2/, it appears that all flavours of markdown render paragraph breaks identically - the only difference is the amount of whitespace between HTML elements, which disappears in rendering.

Bulleted lists likewise render near-identically, although one or two flavours insert whitespace between each item.

Numbered lists are more problematic. First, markdown allows these to be indicated by lines starting with any digit, so that lists of the form

1.
1.
1.

are frequent in markdown, but would render sub-optimally in a plaintext parser.

Second, there may be questions of expected behaviour regarding list continuation in situations such as

1. Item one
1. Item two
1. Item three
Here is an interruption
1. Here the list resumes
1. Or does it?

I'd thus propose supporting markdown only for paragraph breaks and bulleted lists, to avoid ambiguities of this kind.
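
To make the difference concrete, here is a small sketch of what an unmodified plain-text consumer might show, assuming it simply collapses runs of whitespace (as HTML rendering of plain text typically does). The helper name is hypothetical.

```python
def plain_text_fallback(text: str) -> str:
    """Collapse all whitespace, as an unmodified consumer might."""
    return " ".join(text.split())


bulleted = "- Item one\n- Item two\n- Item three"
numbered = "1. Item one\n1. Item two\n1. Item three"

print(plain_text_fallback(bulleted))  # - Item one - Item two - Item three
print(plain_text_fallback(numbered))  # 1. Item one 1. Item two 1. Item three
```

The bulleted fallback reads as a separated run of items, whereas the repeated "1." markers in the numbered case are misleading once the list structure is lost.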

abs0 commented 4 years ago

> My only hesitation with the proposal to define a strictly limited subset of markdown (paragraphs and lists) for use in some fields regards accessibility. My understanding (rather dated now) is that screen-readers often work best with HTML; I'm not sure how/if they cope with markdown at all.

My expectation would be that the relevant fields would probably be presented in aggregate rather than as a single page of "here is the {foo} field for {bar}". As such the system processing the data would either need to interpret the markdown and format appropriately, or (as a fallback) pass it through as plain text.

If my assumptions are valid :) a screen reader would not be expected to parse the markdown itself, as it would not be consuming the feed directly.

abs0 commented 4 years ago

[issue with numbered list formatting]

> I'd thus propose supporting markdown only for paragraph breaks and bulleted lists, to avoid ambiguities of this kind.

I would definitely concur with supporting markdown only for paragraph breaks and bulleted lists initially. My only question is whether there should be a note that extending markdown might be reviewed in future, so it would be recommended that consumers consider handling numeric lists and certain other features (cue bikeshed to define that feature set).

thill-odi commented 4 years ago

> If my assumptions are valid :) a screen reader would not be expected to parse the markdown itself, as it would not be consuming the feed directly.

Good point - indeed, screen-readers will presumably be reading the rendered rather than 'raw' form.

> My only question is whether there should be a note that extending markdown might be reviewed in future, so it would be recommended that consumers consider handling numeric lists and certain other features (cue bikeshed to define that feature set).

Yes, that's a happy path. And I think we can leave the bikeshedding for the future :-).