Open evanp opened 10 months ago
I made this issue for ActivityPub rather than Activity Streams 2.0 because of the need for sharing HTML between untrusted partners. Other uses for AS2, such as archiving, might use a different HTML profile.
I'd hesitate to ever make this normative, but guidance or a part of a profile could definitely be helpful.
In the thread you mention that you would like to see more use of Article. The question here is whether that object type has a richer subset as a recommendation. It would make sense to me that, e.g., a Note subset does not support/recommend headings, but an Article does, and maybe a whole range of semantic HTML tags to format articles with.
@snarfed, why not make this normative? I think it would be easier for developers to know what to expect in the content of each type of object.
Should the HTML elements allowed for the content of each type of object be the same, or should they vary, for example, between an Article and a Note? In RDFS, the range of a property, like content, does not depend on the subject type, but on the property itself. So there may be a need to consider different sets of elements by object type, but this contradicts the RDFS design of property ranges.
In the thread, @evanp says that the Article object was downscaled to a Note object. Also, in this transformation, some elements such as <h2> were replaced by other elements such as <strong>. This suggests that an Article and a Note may allow different sets of elements.
Is it only the set of elements that we want to restrict, or also the way they can be combined? Recall that HTML also restricts how elements can be nested.
@snarfed, why not make this normative? I think it would be easier for developers to know what to expect in the content of each type of object.
It would definitely be easier! A profile can definitely help with that. Enshrining it into the normative core spec feels too heavy handed to me though:
In addition to the reasons Ryan gave (which are all very good reasons), I think there's a more fundamental one, which is that the ActivityPub spec is built to be open to extension. Specifying a normative list of "allowed" HTML tags or attributes would make it impossible for implementations to extend the types of content their users are allowed to publish.
In effect, such a restriction would have no value, since developers would just violate it any time they needed to support a new type of content. For example, there's already an FEP for potential MathML support; such an FEP would violate the core spec if we added such a restriction. A whitelist would only serve to produce a spec that is not followed in the real world, which would defeat the purpose of specification.
On Thu, Jan 18, 2024, 1:43 PM Ryan Barrett wrote:
@snarfed, why not make this normative? I think it would be easier for developers to know what to expect in the content of each type of object.
It would definitely be easier! A profile can definitely help with that. Enshrining it into the normative core spec feels too heavy handed to me though:
- Implementations will still receive content with HTML tags outside the allowlist, along with HTML that doesn't validate, non-HTML, and even binary, due to bugs, old implementations, attacks, etc. Developers will still need to sanitize incoming activities.
- Implementations vary in whether/how they can handle or render HTML at all. Most will still need to choose their own set of tags, if any, to sanitize down to.
- There's a mature existing ecosystem of web apps, CMSes, and related tools that emit HTML content and are gradually adopting ActivityPub. It'd be prohibitive to require all of them to change their output markup.
- The web and HTML evolve over time. It'd be nice to support that gracefully and not lock out new technologies like web components that use novel tags.
- Attacks also evolve over time. Tags and markup that are "safe" now may not stay that way. It'd be nice to be agile and address those changes quickly, in guidance or profiles, instead of waiting years for normative spec updates.
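As a rough illustration of the first two points above (receivers sanitize incoming content regardless, and each picks its own set of tags), here is a minimal allowlist sketch using only the Python standard library. The tag and attribute sets are hypothetical and are not Mastodon's list or any proposed profile; the names AllowlistSanitizer and sanitize are likewise made up for this example. It does not check tag balance or nesting, so treat it as a sketch and not as a production sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; each implementation would choose its own.
ALLOWED_TAGS = {"p", "br", "a", "strong", "em", "ul", "ol", "li",
                "blockquote", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("https://", "http://")
VOID_TAGS = {"br"}
# Elements whose text content is dropped along with the tag.
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                continue  # e.g. javascript: URLs
            kept.append(' {}="{}"'.format(name, escape(value, quote=True)))
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With these assumed sets, sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>') gives "<p>Hi there</p>": the unknown attribute and the <b> tag are dropped while the text is kept, and the script element is removed with its content. Because the allowlist lives in the receiver's code, it can change as quickly as the attacks do, which is the point made in the last bullet.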
As a start to this process, I'm going to add a page to the ActivityPub Primer with guidance on the best practices for each of the properties summary and content: https://www.w3.org/wiki/ActivityPub/Primer/HTML
I've started the document, but there's still a lot to do. I'm going to self-assign and come back to this in the near future.
Daniel Hernandez asked an interesting question about the HTML content of Activity Streams 2.0 objects.
The only two properties that can contain HTML markup are summary and content.
Mastodon has documentation on HTML sanitization giving the elements and attributes it supports.
Does it make sense to write additional documentation for this for the entire network?