nickevansuk opened this issue 6 years ago
Use "Restricted HTML tags", as it is widely supported by publishers, and copy Google's subset of tags into our spec.
Define type RichText as above.
My comments remain much the same as: https://github.com/openactive/ns-beta/issues/2#issuecomment-309817581
We have multiple text properties. Are we proposing to move all of them to a structured format that specifies the text format, or to duplicate all of them to allow Rich Text? The former is a breaking change.
How does the range of markup used by the beta users tally against Google's suggestions? Do the tags cover the intended uses, or are people using a broader set (or a smaller subset)?
What are the conformance rules if we make this change? For the widest possible use, I'd suggest a plain text version MUST always be required. A consumer MAY choose to display a formatted version. Publishers SHOULD NOT expect consumers to display the formatted version.
What are the implications for data entry? We can recommend a set of tags and validate against it, but that will require publishers to ensure they're sanitising input upstream in whatever interface they provide.
Data consumers will need to sanitise too, to avoid any potential problems. It won't be safe to simply display formatted text without passing it through a filter, so there is an implementation cost.
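As a rough illustration of the consumer-side filtering mentioned above, here is a minimal sketch using only Python's standard `html.parser`. The allowlist (`ALLOWED_TAGS`) is hypothetical; the real list would be whatever subset the spec adopts. Allowed tags are re-emitted with all attributes dropped, other tags are removed while their text is kept, `script`/`style` content is discarded, and all text is re-escaped. A maintained sanitisation library is usually preferable in production.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration only; not the spec's or Google's list.
ALLOWED_TAGS = {"p", "br", "ul", "ol", "li", "b", "strong", "i", "em"}
VOID_TAGS = {"br"}          # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}


class AllowlistSanitiser(HTMLParser):
    """Re-emits only allowlisted tags (attributes dropped) and escapes text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.out: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.out.append(f"<{tag}>")  # attrs such as onclick are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data))  # re-escape decoded text


def sanitise(html_text: str) -> str:
    parser = AllowlistSanitiser()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

For example, `<a href="…">link</a>` is reduced to its text `link`, while `<b>` survives unchanged.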
Following up on OpenActive community call today.
There was general consensus on the following:
a. Adding formatted versions for all fields where markup could be of use would be ugly.
b. Extending existing fields to include a format would be an unwanted breaking change.
c. Repurposing existing plain text fields to allow markup would only be viable if existing consumers can continue to display them as plain text without significant impact.
There are two distinct cases for markup usage:
For the main description, following the existing pattern of an HTML formattedDescription seems to work well.
For other plain text fields which could contain more than a single line of data we could look to moving towards a subset of markdown.
The simplest change would be to recognise a blank line as a paragraph marker. This would allow producers to format data into paragraphs - updated consumers can benefit from the better layout, while the fallback for unmodified consumers will display the data exactly as now.
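The blank-line rule is simple enough to sketch in a few lines of Python. This is an illustration only; it assumes a line containing nothing but spaces or tabs counts as blank and that `\r\n` line endings may appear. The function names are hypothetical.

```python
import re
from html import escape

# One or more "blank" lines (whitespace-only) separate paragraphs.
_PARAGRAPH_BREAK = re.compile(r"\n(?:[ \t]*\n)+")


def split_paragraphs(text: str) -> list[str]:
    """Split plain text into paragraphs at one or more blank lines."""
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    parts = _PARAGRAPH_BREAK.split(normalised)
    return [p.strip() for p in parts if p.strip()]


def paragraphs_to_html(text: str) -> str:
    """Wrap each paragraph in <p>, escaping its text."""
    return "".join(f"<p>{escape(p)}</p>" for p in split_paragraphs(text))
```

Single line breaks inside a paragraph are kept as-is; an HTML renderer collapses them to spaces, so an updated consumer only gains the paragraph breaks.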
The potential next change would be to recognise markdown bullet point formatting. The likelihood of existing data including '*' or '-' at the start of lines without the data being intended as a list is extremely low, and again the fallback for unmodified consumers would be to show the items as a long block of text with a ' * ' or ' - ' between each item, which is similar to how some producers are formatting data now (using ';' or ',').
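Bullet detection can be sketched in the same spirit. The hypothetical `render_block` below works on one blank-line-separated block at a time and assumes a deliberately strict rule: the block becomes a list only if every line starts with `*` or `-` followed by whitespace. Full markdown is more permissive (a bullet list may, for instance, directly follow an introductory line), so this is a sketch rather than a conformant parser.

```python
import re
from html import escape

# Optional indentation, then '*' or '-', then at least one space or tab, then
# the item text. Requiring whitespace after the marker keeps text such as
# "-5 degrees" or "*note*" from being read as a list item.
_BULLET = re.compile(r"^[ \t]*[*-][ \t]+(.*\S)[ \t]*$")


def render_block(block: str) -> str:
    """Render one block as <ul> if every non-empty line is a bullet item,
    otherwise as a single escaped <p>."""
    lines = [line for line in block.split("\n") if line.strip()]
    matches = [_BULLET.match(line) for line in lines]
    if lines and all(matches):
        items = "".join(f"<li>{escape(m.group(1))}</li>" for m in matches)
        return f"<ul>{items}</ul>"
    return f"<p>{escape(block.strip())}</p>"
```

A block that fails the check falls back to a paragraph, which mirrors the plain-text fallback described above.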
Even if both changes are implemented, consumers can continue to treat the fields as plain text with minimal impact, apart from having long fields run all the sentences together in one block of text, which is already happening with data now.
Screen readers and other tools could also use the limited structural markup to make it easy to skip forward or back a paragraph, or to indicate bulleted lists.
Potential further option:
Additionally a recommendation could be made that producers avoid including other markdown tags in the text fields, and that consumers implement a full markdown parser and strip out unwanted tags (which could allow for potential future extension).
My only hesitation with the proposal to define a strictly limited subset of markdown (paragraphs and lists) for use in some fields regards accessibility. My understanding (rather dated now) is that screen-readers often work best with HTML; I'm not sure how/if they cope with markdown at all.
Looking at https://johnmacfarlane.net/babelmark2/, it appears that all flavours of markdown render paragraph breaks identically - the only difference is the amount of whitespace between HTML elements, which disappears in rendering.
Bulleted lists likewise render near-identically, although one or two flavours insert whitespace between each item.
Numbered lists are more problematic. First, markdown allows these to be indicated by lines starting with any number (only the first sets the starting value), so lists of the form
1.
1.
1.
are frequent in markdown, but would display sub-optimally to a consumer treating the field as plain text.
Second, there may be questions about the expected behaviour regarding list continuation in situations such as:
1. Item one
1. Item two
1. Item three
Here is an interruption
1. Here the list resumes
1. Or does it?
I'd thus propose supporting markdown only for paragraph breaks and bulleted lists, to avoid ambiguities of this kind.
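If only paragraph breaks and bulleted lists are supported, a producer could warn when text contains lines that a markdown renderer might treat as numbered-list items. The sketch below uses the CommonMark ordered-list marker (up to three spaces of indentation, one to nine digits, then `.` or `)`, then whitespace or end of line); the function name is hypothetical. It is deliberately conservative: CommonMark only lets a numbered list interrupt a paragraph when it starts at 1, so some flagged lines would in fact render as plain text.

```python
import re

# CommonMark-style ordered-list marker at the start of a line.
_ORDERED_MARKER = re.compile(r"^ {0,3}[0-9]{1,9}[.)](?:[ \t]|$)")


def find_numbered_list_lines(text: str) -> list[int]:
    """Return 1-based line numbers that a markdown renderer could treat as
    numbered-list items, so a producer can warn about them."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if _ORDERED_MARKER.match(line)
    ]
```

Lines such as `10.5km route` are not flagged, because the marker must be followed by whitespace or the end of the line.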
> My only hesitation with the proposal to define a strictly limited subset of markdown (paragraphs and lists) for use in some fields regards accessibility. My understanding (rather dated now) is that screen-readers often work best with HTML; I'm not sure how/if they cope with markdown at all.
My expectation would be that the relevant fields would probably be presented in aggregate rather than as a single page of "here is the {foo} field for {bar}". As such the system processing the data would either need to interpret the markdown and format appropriately, or (as a fallback) pass it through as plain text.
If my assumptions are valid :) a screen reader would not be expected to parse the markdown itself, as it would not be consuming the feed directly.
> [issue with numbered list formatting]
> I'd thus propose supporting markdown only for paragraph breaks and bulleted lists, to avoid ambiguities of this kind.
I would definitely concur with supporting markdown only for paragraph breaks and bulleted lists initially. My only question would be: should there be a note that extending markdown might be reviewed in future, so that consumers would be recommended to consider handling numeric lists and certain other features? (cue bikeshed to define that feature set)
> If my assumptions are valid :) a screen reader would not be expected to parse the markdown itself, as it would not be consuming the feed directly.
Good point - indeed, screen-readers will presumably be reading the rendered rather than 'raw' form.
> My only question would be: should there be a note that extending markdown might be reviewed in future, so that consumers would be recommended to consider handling numeric lists and certain other features? (cue bikeshed to define that feature set)
Yes, that's a happy path. And I think we can leave the bikeshedding for the future :-).
Proposer
ODI
Use Case
Many providers and data consumers support formatted description fields for their activities.
Why is this not covered by existing properties?
The existing modelling specification properties are restricted (and validated) strictly to plain text (e.g. no HTML tags).
Please provide a link to example data
GoodGym, Better, British Cycling, Bookwhen, Our Parks, Decathlon
Proposal
Suggest either allowing markdown format, or following Google Reserve's lead and allowing a subset of HTML tags.
Markdown:
Restricted HTML tags:
beta:formattedDescription
output HTML.
In either case, the formatted description should be provided as a new property or type to allow for compatibility with plain text data consumers, perhaps within a structured object such as:
attendeeInstructions and accessibilityInformation could also benefit from this capability.
Beta property
The property beta:formattedDescription may be used for consistent experimentation.
Google Reserve Restricted HTML
The following is taken from Google Reserve's documentation regarding the description field: