xavierleroy / camlzip

Reading and writing zip and gzip files from OCaml

Document/enforce forward slashes in entry names #40

Closed alainfrisch closed 1 month ago

alainfrisch commented 1 year ago

The spec for the Zip format (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) says:

4.4.17.1 The name of the file, with optional relative path. The path stored MUST NOT contain a drive or device letter, or a leading slash. All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc. If input came from standard input, there is no file name field.

We recently observed that backslashes in entry names do cause problems with some tools when unzipping on Linux (we could not check on Amiga, unfortunately). Similar problems have also been reported elsewhere, e.g. here

We could argue that it is the responsibility of Camlzip users to know about this constraint, but it seems harmless and useful to add a note about it to the docstrings of functions that take an entry path argument.

Going a step further, Camlzip could automatically replace \ with / in entry paths (and perhaps also fail on a leading slash or a drive letter?).
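
To make the second variant concrete, here is a minimal sketch of such a normalization. The names has_drive_prefix and normalize_entry_name are hypothetical and not part of Camlzip; raising Invalid_argument for a leading slash or a drive letter is only one possible policy, assumed here for illustration.

```ocaml
(* Hypothetical helpers, not part of the Camlzip API. *)

(* True if [name] starts with a drive prefix such as "C:". *)
let has_drive_prefix name =
  String.length name >= 2
  && name.[1] = ':'
  && (match name.[0] with 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false)

(* Replace every backslash with a forward slash, then reject the names
   that APPNOTE 4.4.17.1 forbids: a leading slash or a drive letter.
   Raising Invalid_argument is an assumed policy for this sketch. *)
let normalize_entry_name name =
  let name = String.map (fun c -> if c = '\\' then '/' else c) name in
  if name <> "" && name.[0] = '/' then
    invalid_arg "entry name must not start with a slash"
  else if has_drive_prefix name then
    invalid_arg "entry name must not contain a drive letter"
  else
    name
```

Under this sketch, a caller would normalize the entry name once and pass the result wherever an entry path is expected, for instance as the entry name given to copy_file_to_entry.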

@xavierleroy : I'm happy to propose a PR implementing either of these variants if you tell me which one you prefer.

For the interested reader, here is some extra context. By default, Win32 restricts path lengths to 260 characters (MAX_PATH). This limit can be lifted with global settings, but these cannot be expected on a typical Windows machine. The practical workaround is to prepend \\?\ to the path, which lifts the restriction; but with that prefix the path must use backslashes: forward slashes are normally accepted as well, but not when the prefix is used. This means that Windows applications that use Camlzip and support long paths need to juggle backslashes (for opening the file manually, or for passing an input file name to copy_file_to_entry) and forward slashes (for the entry path name).
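
As an illustration of that juggling, the sketch below derives both spellings from a single relative path. The helper names windows_long_path and entry_name are hypothetical, and the root directory is assumed to be an absolute Windows path that already uses backslashes.

```ocaml
(* Hypothetical helpers showing the two spellings of one relative path. *)

(* The long-path prefix: backslash, backslash, question mark, backslash. *)
let long_path_prefix = "\\\\?\\"

(* File-system path for Win32 calls: backslashes only, with the prefix.
   [root] is assumed to be absolute and already backslash-separated. *)
let windows_long_path ~root rel =
  let rel = String.map (fun c -> if c = '/' then '\\' else c) rel in
  long_path_prefix ^ root ^ "\\" ^ rel

(* Name to store in the Zip archive: forward slashes only. *)
let entry_name rel =
  String.map (fun c -> if c = '\\' then '/' else c) rel
```

An application would then open or copy the file through the result of windows_long_path and use the result of entry_name as the entry path.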

xavierleroy commented 1 month ago

Sorry for leaving this issue open for so long. I agree Camlzip should do something to ensure that stored file names don't use backslashes. See my proposal at #48 and let me know what you think.