Status: Open. trwnh opened this issue 1 month ago.
I'm not seeing any justification in https://github.com/mastodon/mastodon/pull/32538 for why any content type other than HTML would be useful or preferred. I don't think Claire is expressing any sort of preference for a markdown summary type, just saying that she initially misunderstood its type.
Anyway, requiring support for different MIME types would have made the PR much more complicated than just adding HTML sanitization, since in any case (regardless of what you're producing) you'll have to handle incoming HTML.
This just feels like overcomplicating the spec for no additional benefit. I'm still not convinced there's even a good justification for allowing `content` to be in different media types; are there any major non-HTML implementations?
This issue has been labelled as potentially needing a FEP, and contributors are welcome to submit a FEP on the topic. Note that issues may be closed without the FEP being created; that does not mean that the FEP is no longer needed.
So, I think the problem with adding flags to indicate that the `summary` is not HTML is that it's not backwards compatible; consumers will expect `summary` to always be HTML, as documented.
I agree that a primer page makes sense.
I'd also suggest a FEP for defining a new `description` or other property that can have different media types. Using a new property instead of `summary` allows us to define new semantics for that property, unencumbered by the fairly strict requirement that `summary` be HTML.
One question for the primer page is what to do when an object does not have a `name` and should have a `summary` without HTML. Not all plain text is valid HTML; for example, text that contains unescaped characters that are meaningful in HTML, such as `<`, `>`, `'`, and `"`.
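A minimal Python sketch of the escaping step, assuming a producer that starts from plain text and must emit HTML for `summary` under the current definition. It uses only the standard-library `html` module; the helper name `plain_text_to_html_summary` is hypothetical.

```python
import html


def plain_text_to_html_summary(text: str) -> str:
    """Turn plain text into an HTML-safe string for `summary` (hypothetical helper).

    html.escape with quote=True replaces &, <, >, " and ' with character
    references, so a consumer that parses the result as HTML sees the
    original characters as text rather than as markup.
    """
    return html.escape(text, quote=True)


plain = "if a < b && b > c, say \"yes\" or 'no'"
summary = plain_text_to_html_summary(plain)
print(summary)  # if a &lt; b &amp;&amp; b &gt; c, say &quot;yes&quot; or &#x27;no&#x27;

# A consumer that treats summary as HTML decodes the references back to the text.
print(html.unescape(summary) == plain)  # True
```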
> why any content type other than HTML would be useful or preferred.
Minimal example where HTML parsing is destructive:

```json
{
  "summary": "I am trying to serialize the RDF statement <Alice> <knows> <bob> into plain-text, but a naive HTML sanitizer is stripping the statement completely"
}
```

After a naive HTML sanitizer runs, this becomes:

```json
{
  "summary": "I am trying to serialize the RDF statement into plain-text, but a naive HTML sanitizer is stripping the statement completely"
}
```
The workaround is to HTML-escape the angle brackets (`&lt;Alice&gt;`), but not every consumer will necessarily unescape them.
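To make the failure mode concrete, here is a minimal Python sketch. `naive_sanitize` is a hypothetical stand-in that strips anything shaped like a tag; real sanitizers are more careful, but an HTML tokenizer still reads `<Alice>` as a start tag, so an allowlist-based sanitizer drops it too. The sketch shows the statement disappearing, and the escaping workaround surviving only for a consumer that unescapes.

```python
import html
import re

# Hypothetical stand-in for a naive sanitizer: drop anything that looks like a tag.
TAG_PATTERN = re.compile(r"<[^>]*>")


def naive_sanitize(value: str) -> str:
    return TAG_PATTERN.sub("", value)


summary = "the RDF statement <Alice> <knows> <bob> into plain-text"

# Treated as HTML, the angle-bracketed terms look like tags and vanish.
stripped = naive_sanitize(summary)
print(stripped)  # only whitespace remains where the three terms were

# Workaround: escape first, so the sanitizer sees character references, not tags.
escaped = naive_sanitize(html.escape(summary))
print(escaped)  # the RDF statement &lt;Alice&gt; &lt;knows&gt; &lt;bob&gt; into plain-text

# Only a consumer that unescapes recovers the original statement.
print(html.unescape(escaped) == summary)  # True
```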
Description of issue
- `name` is defined as "A simple, human-readable, plain-text name for the object. HTML markup MUST NOT be included."
- `summary` is defined as "A natural language summarization of the object encoded as HTML."
- `content` includes in its definition that "By default, the value of content is HTML. The `mediaType` property can be used in the object to indicate a different content type."

So, to synthesize these three definitions:
- `name` is always `text/plain`
- `content` is whatever the value of `mediaType` is, where `mediaType` defaults to `text/html`
- `summary` is always `text/html`?

But there are cases where a producer might want to signal a different content type for `summary`; for example, `text/plain` or `text/markdown`. Recently, https://github.com/mastodon/mastodon/pull/32538 came up as an example of wanting to produce a `summary` that is NOT `text/html`. So the question is: might it make sense to provide a mechanism for declaring that `summary` is something other than `text/html`?

Potential solutions
- `mediaType` could be extended to cover both `content` and `summary`, but this would prevent using different formats for each of the two separately.
- `mediaTypeOfSummary` seems clunky, but might end up making sense or being necessary.
- `@value` can have its own `mediaType`, although this would be pretty complicated and not backwards-compatible. (It would also break JSON-LD language containers, so `nameMap`/`summaryMap`/`contentMap` would not work.)

Action items
- Provide a mechanism for `summary` to be something other than `text/html`, either by extending `mediaType` to cover it or by defining `mediaTypeOfSummary` as an analogous property.
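A minimal Python sketch of how a consumer might resolve the media type of each natural-language property, first under the current definitions and then under the two options from the action item. `mediaTypeOfSummary` is only a proposal from this issue, not a defined Activity Vocabulary term; the `option` switch and the function name `resolve_media_type` are hypothetical.

```python
from typing import Any, Literal, Mapping

Option = Literal["current", "extend_mediaType", "mediaTypeOfSummary"]


def resolve_media_type(obj: Mapping[str, Any], prop: str, option: Option = "current") -> str:
    """Return the media type a consumer should assume for `prop` on `obj` (hypothetical)."""
    if prop == "name":
        # name is always plain text; HTML markup MUST NOT be included.
        return "text/plain"
    if prop == "content":
        # content follows mediaType, which defaults to text/html.
        return obj.get("mediaType", "text/html")
    if prop == "summary":
        if option == "extend_mediaType":
            # One mediaType governs both content and summary,
            # so the two cannot use different formats.
            return obj.get("mediaType", "text/html")
        if option == "mediaTypeOfSummary":
            # Proposed analogous property, also defaulting to text/html.
            return obj.get("mediaTypeOfSummary", "text/html")
        # Current definition: summary is always HTML, whatever else is present.
        return "text/html"
    raise ValueError(f"unsupported property: {prop}")


note = {
    "type": "Note",
    "summary": "CW: spoilers",
    "content": "# Heading",
    "mediaType": "text/markdown",
    "mediaTypeOfSummary": "text/plain",
}
for option in ("current", "extend_mediaType", "mediaTypeOfSummary"):
    print(option, resolve_media_type(note, "summary", option))
# current text/html
# extend_mediaType text/markdown
# mediaTypeOfSummary text/plain
```

Under `current`, a consumer ignores `mediaTypeOfSummary` entirely and still parses the summary as HTML, which is the backwards-compatibility concern raised in the discussion.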