Open trwnh opened 7 hours ago
The document already has a similar structure. It has 3 top-level types of discovery:
For each, there is a sub-section for starting with a URL, and another sub-section for starting with a document (like in a browser, or when an object is delivered via the ActivityPub protocol). Each of the applicable techniques is then listed in its own sub-sub-section.
Each sub-sub-section describes the technique in detail, gives an example or two, and then lists out some ways it can fail.
The exception is for author discovery. Going from resource HTML to author ActivityPub can be direct, or it can pass through the resource ActivityPub or the author HTML. Each of those other paths has its own sub-section and has examples.
There's a paragraph in the introduction on how to switch from a URL to a document. That is trivial except for an HTML document, where you'll probably need some contextual information, such as from the browser environment.
I appreciate the offer, but it took a long time to get to this level of clarity, and I don't think reorganization at this point would be helpful.
Respectfully, looking at the current report structure, I don't see the level of clarity that you're seeing. I filed this issue because I found it confusing. It would be a lot clearer if each section were devoted to a specific goal, and the subsections were devoted to how to accomplish that goal.
In particular, the "URL as input" and "Document as input" distinction doesn't clearly fit into any of the current sections. You can see this in how the current structure actually duplicates and repeats information:
The key point that arises here is that "URL" is actually a separate class of discovery entirely. Given an arbitrary HTTP(S) URL, you don't know whether the resource at that URL is HTML or AS2. This means that "HTTP(S) URL" ends up having its own considerations before you even get to the part where you have HTML or AS2. It's a sort of "Step 0" in the discovery process, as well as being an intermediate step for several other discovery processes, like taking a link href of unknown or mismatched type and being expected to do... something? with it.
Separating out the URL considerations from the HTML/AS2 considerations significantly simplifies and clarifies the overall structure of the report. Leaving it mixed in with the document considerations is creating a lot of the confusion, because as previously pointed out, a URL is not known to be HTML or AS2 until you actually try to do something with it. It generally doesn't make sense to talk about an "HTML URL" or an "AS2 URL"; the reality is that it is an "HTTP(S) URL" instead.
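To illustrate why a URL is only known to be HTML or AS2 once you act on it, here is a minimal sketch that classifies the `Content-Type` of a response after the fact. The function name `classify_content_type` and the return labels are hypothetical, and the header parsing is deliberately simplistic (it does not handle every quoting rule). The two AS2 media types are the ones named in the ActivityPub specification.

```python
# Hypothetical helper: decide what kind of document a response turned out to be.
AS2_PROFILE = "https://www.w3.org/ns/activitystreams"


def classify_content_type(content_type: str) -> str:
    """Return 'html', 'as2', or 'unknown' for a Content-Type header value."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()

    # Collect parameters such as charset=... or profile="...".
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, value = part.partition("=")
            params[key.strip().lower()] = value.strip().strip('"')

    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    if media_type == "application/activity+json":
        return "as2"
    # JSON-LD only counts as AS2 when the ActivityStreams profile is present.
    profiles = params.get("profile", "").split()
    if media_type == "application/ld+json" and AS2_PROFILE in profiles:
        return "as2"
    return "unknown"
```

Until a response like this comes back, the caller only holds an "HTTP(S) URL", which is the state this comment argues should get its own section.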
Doing this kind of restructuring also reduces the complexity because you don't need as many levels of nesting to represent the same information. You can eliminate "Document as input" because it is redundant with your starting point already being an HTML document or an AS2 document. A lot of current headings are 4 levels deep... The report could be mostly 2 levels deep and occasionally dipping into 3.
> The key point that arises here is that "URL" is actually a separate class of discovery entirely. Given an arbitrary HTTP(S) URL, you don't know whether the resource at that URL is HTML or AS2.
That's actually not true; you'll often know from context. You know that the URL is for AS2 if it's an ActivityPub id; you know that it's an HTML URL if you get it from document.location within a browser environment.
The purpose of the document is listed at the top: discovery of HTML from ActivityPub, ActivityPub from HTML, and ActivityPub author from HTML. It's not about general discovery on arbitrary URLs.
That said, I think we could add a section on doing discovery when you don't know what kind of URL you have or what direction of discovery you're doing. I'm trying to think of a user story for it. Do you have any ideas?
I also think it's useful to be specific on the direction of discovery. Yes, discovering the HTML from ActivityPub and ActivityPub from HTML is practically the same when you use the Link method, but I really think the text and examples are clearer if each section starts from the purpose (HTML URL -> ActivityPub URL or vice versa) and stays focused on that purpose. It's problem-first, not solution-first.
> you'll often know from context
If you have context, you're already further along in the discovery process and have already arrived at a document of some sort.
I sympathize that the report's main focus is on HTML <-> AS2, but the fact that HTTPS URLs are a significant intermediate step means that they would benefit from having a section like the introduction. It's not completely irrelevant, either -- there are 3 discovery methods associated with HTTPS URLs that are "pre-requisite information", especially when you land upon a URL as part of the discovery process (e.g. via rel=author or rel=alternate without type= being specified.)
> It's problem-first, not solution-first.
This is what I'm trying to say as well --
Given {"an http(s) URL", "an HTML document", "an AS2 document"} for a resource, I want to discover {"an HTML representation", "an AS2 representation"}.
Given {"an http(s) URL", "an HTML document", "an AS2 document"} for a resource, I want to discover the author as {"an HTML representation", "an AS2 representation"}.
Essentially, if you want to describe HTTP HEAD, HTTP Accept, or WebFinger, then you need to do this in the context of a URL. Otherwise, the alternative I'd propose is removing "URL as input" entirely from the report. But this seems like useful information to have on hand, which is why I'm advocating to keep it and promote it to a top-level section. Sure, it's a rehash of what's in ActivityPub, RFC 8288 Web Linking, and RFC 7033 WebFinger, but it has illustrative value to the reader rather than forcing them to open 3 auxiliary documents to get the information they need.
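As a rough sketch of what those three URL-level methods look like in practice, the snippet below only constructs the requests and does not send them. The helper names (`head_request`, `as2_request`, `webfinger_url`) are hypothetical. The `Accept` value uses the AS2 media types from the ActivityPub specification, and the `/.well-known/webfinger?resource=` endpoint comes from RFC 7033. Deriving the WebFinger host from the URL's own authority is an assumption of this sketch.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

AS2_ACCEPT = (
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams", '
    "application/activity+json"
)


def head_request(url: str) -> Request:
    """HTTP HEAD: lets a client inspect Link headers without fetching the body."""
    return Request(url, method="HEAD")


def as2_request(url: str) -> Request:
    """HTTP Accept: content negotiation asking for an AS2 representation."""
    return Request(url, headers={"Accept": AS2_ACCEPT})


def webfinger_url(resource: str) -> str:
    """WebFinger: build the lookup URL for an acct: URI or an http(s) URL."""
    if resource.startswith("acct:"):
        host = resource.rpartition("@")[2]
    else:
        # Assumption: query the host that serves the resource itself.
        host = urlsplit(resource).netloc
    return f"https://{host}/.well-known/webfinger?" + urlencode({"resource": resource})
```

All three take nothing but a URL (or `acct:` URI) as input, which is the reason for grouping them under a single top-level "URL" section.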
From #33 and https://github.com/swicg/activitypub-html-discovery/issues/33#issuecomment-2480722239
section hierarchy should generally follow the expected user flow
section hierarchy should also place related topics or branches at the same level, instead of mixing concerns
general notes
this is kind of like a state transition graph or DFA (deterministic finite automaton) where you have the following rough connections
note that, depending on the discovery method, the latter 2 might pass intermediately through "http(s) URL" again. for example getting the url or attributedTo, or using the target of a Link header for conneg where the type is not specified.
possible structure
hence:
[...] (<link> tag or <a> tag)
[...] (<script> tag)
<link rel="self">
<link rel="canonical">
<base>
url
url is a Link with mediaType of text/html (explicit or assumed default)
url is a Link with a mediaType that is not text/html
[...] (<link> tag or <a> tag)
[...] (<meta> tag)
fediverse:creator tag
[...] acct: URI then GO TO: Via resource descriptor (WebFinger)

expand or build upon this structure as needed to fill in the rest of the report
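As one illustration of the `url` cases in the outline above, here is a minimal sketch of picking HTML candidates out of an AS2 object's `url` property. The function name `html_urls` is hypothetical. Treating a Link without `mediaType` as `text/html` follows the "assumed default" wording above, and treating a bare string as an HTML candidate is an assumption of this sketch, since a bare URL carries no type information.

```python
def html_urls(as2_object: dict) -> list:
    """Collect candidate HTML URLs from the `url` property of an AS2 object."""
    value = as2_object.get("url", [])
    entries = value if isinstance(value, list) else [value]

    candidates = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare URL: type unknown until fetched; assumed to be HTML here.
            candidates.append(entry)
        elif isinstance(entry, dict):
            # Link object: keep it only if mediaType is text/html,
            # whether stated explicitly or assumed as the default.
            if entry.get("mediaType", "text/html") == "text/html" and entry.get("href"):
                candidates.append(entry["href"])
    return candidates


note = {
    "type": "Note",
    "url": [
        {"type": "Link", "href": "https://example.com/a", "mediaType": "text/html"},
        {"type": "Link", "href": "https://example.com/b", "mediaType": "video/mp4"},
        "https://example.com/c",
        {"type": "Link", "href": "https://example.com/d"},
    ],
}
print(html_urls(note))
```

Anything returned here is still just an http(s) URL, so a client would go back through the URL-level checks before treating it as HTML.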