Closed mattgarrish closed 5 months ago
What happens to a resource whose URL is not http(s)? There are plenty of such things, for example doi:. Is it up to the RS whether that is considered valid?
A doi wouldn't refer to a publication resource, would it? Publication resources are things in the manifest used in rendering the publication. They're fine in metadata, etc.
In any case, you could refer to a resource obtained via ftp, for example, but I'd still think that'd have to be web based, at least in the general sense of accessing it via the internet. The likelihood of reading systems supporting references that aren't http/https, or of authors using them, is probably approaching zero, if not zero. We don't require reading systems to support remote resources or any protocols for obtaining them. The only thing we do is ban file:
We don't require reading systems to support remote resources or any protocols for obtaining them.
Scratch that. We don't require it, but we recommend that they only support remote resources via https.
https://www.w3.org/TR/epub-rs-33/#sec-epub-rs-conf-remote-res
I did not remember the explicit recommendation on http(s)... But it is not banning non https resources.
I realize doi: is a bit convoluted, because, in practice, doi: URLs usually redirect to a Web resource. So it is (eventually) on the Web, just in a roundabout way. But ftp: is a good example; we can also refer to did: (although the probability of a did being used in an ebook is certainly close to zero).
That being said, what is now in the spec is not wrong and, paired with the recommendation for the RS, is also reasonably secure. I do not think there is any harm in leaving it as is...
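To make the positions so far concrete, here is a minimal sketch of how a checker might sort resource URLs by scheme, based only on what is said in this thread: file: is banned, https is what the RS spec recommends, and anything else (ftp:, doi:, did:, plain http:) is neither banned nor required to be supported. The function name `classify_resource_url` and the four labels are hypothetical and do not come from the spec or from epubcheck.

```python
from urllib.parse import urlsplit


def classify_resource_url(url: str) -> str:
    """Sort a resource reference into one of four illustrative buckets.

    The labels are assumptions made for this sketch, not spec terms.
    """
    # urlsplit returns an empty scheme for relative references such as
    # "images/cover.png", which point inside the EPUB container.
    scheme = urlsplit(url).scheme.lower()
    if scheme == "":
        return "container"
    if scheme == "file":
        # The one scheme the thread says is banned outright.
        return "banned"
    if scheme == "https":
        # The scheme the RS spec recommends for remote resources.
        return "recommended"
    # ftp:, doi:, did:, http: and so on: not banned, but reading
    # systems are not required to support them.
    return "other"


if __name__ == "__main__":
    for candidate in (
        "images/cover.png",
        "https://example.org/video.mp4",
        "ftp://example.org/video.mp4",
        "doi:10.1000/182",
        "file:///tmp/video.mp4",
    ):
        print(candidate, "->", classify_resource_url(candidate))
```

Note that the sketch deliberately has no "on the web" bucket: as the comments below point out, that phrase does not map cleanly onto a scheme check.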
I agree with @iherman - there is no harm in leaving it, and changing it to "i.e." seems like a new requirement. Does "i.e. on the web" mean "MUST be on the web"? And then what does "on the web" mean? Is it a protocol thing (MUST use http(s)) or an access thing (MUST be available for public download)? I think that is a can of worms I would rather not open.
Is it a protocol thing (MUST use http(s))
Isn't that what we've turned it into by recommending https for remote resources, though? In a world where warnings are like requirements, we've already kind of cut people off from using anything else.
I am not sure I agree with the statement that warnings are essentially requirements. In a previous job I held, there was an explicit allowlist of outright errors that we accepted from epubcheck because they were so prevalent, and all warnings were allowed. But "MUST use https" seems like a new requirement, and not the same as "on the web". For instance, a document available only on a corporate intranet might be accessible via https, but it doesn't seem like it is "on the web." If we want to make the change, I think it should be done by changing the SHOULD to a MUST in https://www.w3.org/TR/epub-rs-33/#sec-epub-rs-conf-remote-res though I don't support that change at the moment, mainly because I am not sure what real world problem it is fixing.
But "MUST use https" seems like a new requirement
Sure, but I think we're drifting from why I opened this.
I can agree that "i.e." is perhaps too strong a wording, although this feels like we're accommodating scenarios that are theoretically possible but never done in practice. When I re-read the definition, the alternative to a resource being on the web is that it is available offline, which ties it to our having allowed the file: protocol for remote resources in the past.
Can we at least take out the "not necessarily" as redundant with "typically" and link this definition to the remote resources section for clarity:
A publication resource that is located outside of the EPUB container, typically on the web.
Refer to 3.6 Resource location for more information.
I'd also like to see a note in 3.6 that refers to the ban on the file protocol. There are a lot of jumps you have to make to connect the pieces together right now.
Can we at least take out the "not necessarily" as redundant with "typically" and link this definition to the remote resources section for clarity:
A publication resource that is located outside of the EPUB container, typically on the web. Refer to 3.6 Resource location for more information.
I am fine with that.
I noticed the definition of remote resource is:
The "typically, but not necessarily, on the web" part looks like a relic from when we didn't ban the file:// protocol in URLs, or is there some other non-web hosting that we support? The recommendation now is to use https for anything not in the container. I think we can make the "web" part a parenthetical clarification:
Or even just say: "A publication resource that is located on the web." It kind of stands to reason that a web-based resource can't also be in the container.