w3c / pwpub

W3C packaged Web Publications
https://w3c.github.io/pwpub/

Should the packaging spec explicitly disallow file name characters? #35

Closed llemeurfr closed 5 years ago

llemeurfr commented 5 years ago

The OCF specification lists a series of characters which cannot be used in file paths and file names.

The ISO 21320 specification, on which the WP packaging format will be based, does not formally prohibit any character in file paths / names. Instead, in an informative annex, it states that [The ZIP] "Appnote specifies few restrictions for filenames in the archive. For compatibility, this part of ISO/IEC 21320 does not require additional restrictions on filenames which are valid according to Appnote." It then shows, as "known restrictions" in a comparative table, what JAR, Widget Zip, OOXML, OCF and Adobe UCF prohibit.

The ZIP Appnote has a field reserved for "file name" (4.4.17). It states that "The path stored MUST NOT contain a drive or device letter, or a leading slash. All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc." I also found Appendix D.2, which states that "the ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437". But there is also a way to store a Unicode path in UTF-8, either in an "extra field" (4.6.9) or in the original field.

Conclusion: apart from the drive letter and slash rules quoted above, this Appnote does not mention any disallowed characters in file paths / names.
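To make the point concrete, here is a minimal Python sketch of the three path rules quoted from 4.4.17. The function name `appnote_path_problems` is hypothetical, and reading "drive or device letter" as a single ASCII letter followed by a colon at the start of the path is my assumption, not wording from the Appnote.

```python
import re


def appnote_path_problems(name: str) -> list[str]:
    """Return the quoted 4.4.17 rules that `name` breaks (empty list if none)."""
    problems = []
    if name.startswith("/"):
        problems.append("leading slash")
    if "\\" in name:
        problems.append("backslash")
    # Assumption: a "drive letter" looks like "C:" at the start of the path.
    if re.match(r"[A-Za-z]:", name):
        problems.append("drive letter")
    return problems


print(appnote_path_problems("C:\\book\\ch1.xhtml"))
print(appnote_path_problems("[{\"'!§$€%.x"))
```

Under these assumptions the second call reports no problem at all: nothing in the quoted rules objects to the unusual characters.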

Therefore, we can't rely on ISO 21320 to limit the characters allowed in file paths / names. Either we keep (extend?) the OCF constraints, or we rely on authors (and operating systems) to avoid file paths / names which would break interoperability between systems. After all, this is what we already do when choosing a file name; I can name a file [{"'!§$€%.x on macOS: would I try to move it to a Windows or Linux machine?

iherman commented 5 years ago

I am sympathetic to pragmatism. It is good to be precise in the spec, but we have to be careful not to cast in concrete features that would evolve around us. We may impose restrictions that would not be relevant in a few years (so people will ignore them, in fact).

That being said: having some sort of informative reference and guidance would be good. We could/should refer to the OCF document informally if that serves the purpose. We could also consider referring to the URL specification in terms of path names, too (e.g., the WHATWG URL spec or IRI spec). Indeed, if a packaged WP has to be used on the Web, then the usage of the file names as part of URLs is also something to consider.

To be honest, I am at a loss myself on that matter: i.e., whether the URLs conflict or not with the OCF spec, etc. I know that I have the liberty to use crazy filenames on a Mac, but I have no idea what the situation is on recent Windows, on Chrome, on Linux, etc. I must admit (but possibly because I have gray hairs) that when I create files that are supposed to be used as part of URLs, for example, I still restrict myself to ASCII...

llemeurfr commented 5 years ago

The LPF draft contains the following note: The [ZIP] specification has few constraints on the characters allowed for file and directory names. When crafting such names, authors must be careful to use characters which allow a broad interoperability among operating systems and are compatible with relative URLs.
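As an illustration of what "broad interoperability" could mean in practice, here is a rough Python sketch that flags names likely to cause trouble. It is not part of the LPF draft: the function name `portability_warnings` and the warning labels are hypothetical, the Windows rules are taken from Microsoft's file naming conventions, and "compatible with relative URLs" is approximated as "needs no percent-encoding", which is stricter than the URL specs require.

```python
from urllib.parse import quote

# Characters Windows rejects in file names ("/" is handled as the path separator).
WINDOWS_RESERVED_CHARS = set('<>:"\\|?*')
# Device names Windows reserves, with or without an extension.
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def portability_warnings(path: str) -> list[str]:
    """Return hypothetical warning labels for a ZIP entry path."""
    segments = path.split("/")
    warnings = []
    if any(ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32
           for seg in segments for ch in seg):
        warnings.append("windows-reserved-char")
    if any(seg.split(".")[0].upper() in WINDOWS_RESERVED_NAMES for seg in segments):
        warnings.append("windows-reserved-name")
    if any(seg.endswith((" ", ".")) for seg in segments):
        warnings.append("trailing-dot-or-space")
    # quote() leaves only ASCII letters, digits and "_.-~" (plus "/") untouched.
    if quote(path, safe="/") != path:
        warnings.append("needs-percent-encoding")
    return warnings


print(portability_warnings("chapters/ch01.xhtml"))
print(portability_warnings("[{\"'!§$€%.x"))
```

A check like this would fit an authoring tool that warns rather than rejects, which matches a non-normative note.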

llemeurfr commented 5 years ago

Agreed to replace "must" by "should" (non-normative).

iherman commented 5 years ago

This issue was discussed in a meeting.