r-lib / zip

Platform independent zip compression via miniz
https://r-lib.github.io/zip/

`unzip()`: Support for multibyte zip paths on MacOS and Win? #103

Open philipp-baumann opened 1 year ago

philipp-baumann commented 1 year ago

Hi @gaborcsardi, first of all, thanks for this great package! I don't know if this is expected behavior or a consequence of missing Latin-1 support somewhere in macOS. In brief, I want to fix a package whose R CMD check fails in CI (https://github.com/lgnbhl/BFS/pull/13).

To explain the situation: an API call (to one of the Swiss Federal Office of Statistics endpoints) downloads a zip file, which is an archive of Esri shapefiles in directories whose names contain multibyte characters (probably generated by some Windows software). This is where zip::unzip() fails on macOS. utils::unzip() has exactly the same problem on macOS, but exits a bit more gracefully. The same goes for archive::archive_extract().

Now the question: do you know an easy tool to solve this? Or is this something {zip} could support on macOS? One option might be to use a custom {processx} CLI workflow for macOS, but then I also need to find a tool that works with multibyte encodings in zip archive paths.

gaborcsardi commented 1 year ago

Can you share an example zip file? What exactly is the problem?

philipp-baumann commented 1 year ago

base_map_24025646.zip

=> edited; please see below for the correct file.

philipp-baumann commented 1 year ago

> Can you share an example zip file? What exactly is the problem?

I am not 100% sure why it cannot be unzipped on macOS, but it seems to be a multibyte encoding issue (some of the paths seem to have accented characters in Latin-1 or a similar encoding). Thanks
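
One way to narrow this down is to list the entry names without extracting anything and check which ones are not valid UTF-8. This is a minimal sketch, assuming `zip::zip_list()` can read the archive and returns the stored names unchanged in its `filename` column. The helper name `find_suspect_entries` is hypothetical, and the demo archive is ASCII-only so the example runs in any locale.

```r
# Hypothetical helper: return archive entry names that are not valid UTF-8.
find_suspect_entries <- function(zipfile) {
  entries <- zip::zip_list(zipfile)$filename
  entries[!validUTF8(entries)]
}

# Self-contained demo on an ASCII-only archive, where nothing should be flagged.
src <- tempfile("src")
dir.create(src)
writeLines("x", file.path(src, "plain.txt"))
zipfile <- tempfile(fileext = ".zip")
# root = src stores the entry relative to src, i.e. as plain.txt.
zip::zip(zipfile, files = "plain.txt", root = src)

suspects <- find_suspect_entries(zipfile)
print(suspects)
```

Any names flagged on a real archive were probably written in a legacy code page rather than UTF-8, which would be consistent with the symptoms described here.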

philipp-baumann commented 1 year ago

Sorry, I posted a corrupt zip before. Here is the correct one: base_map_24025646.zip

philipp-baumann commented 1 year ago

Just to let you know, we got a fix in our case via junkdir = TRUE, meaning that the problem was almost certainly the multibyte character paths: https://github.com/lgnbhl/BFS/issues/12#issuecomment-1723160636
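
For readers who want to try this workaround, here is a minimal sketch of flattening an archive on extraction. It assumes the `junkdir` mentioned above refers to the `junkpaths` argument, which is the name used by both `utils::unzip()` and `zip::unzip()`. The directory and file names are made up, and are plain ASCII so the example runs in any locale.

```r
# Build a small archive with nested directories (hypothetical names).
src <- tempfile("src")
dir.create(file.path(src, "top", "sub"), recursive = TRUE)
writeLines("a", file.path(src, "top", "a.txt"))
writeLines("b", file.path(src, "top", "sub", "b.txt"))

zipfile <- tempfile(fileext = ".zip")
# root = src stores the entries relative to src (top/a.txt, top/sub/b.txt);
# include_directories = FALSE keeps directory entries out of the archive.
zip::zip(zipfile, files = "top", root = src, include_directories = FALSE)

# junkpaths = TRUE drops the directory part of every entry on extraction,
# so directory names from the archive are never created on disk.
out_dir <- tempfile("out")
dir.create(out_dir)
zip::unzip(zipfile, exdir = out_dir, junkpaths = TRUE)

extracted <- sort(list.files(out_dir, recursive = TRUE))
print(extracted)
```

Flattening only sidesteps problematic directory names. Entries whose own file names contain the offending characters are still written under those names, and files that share a base name in different directories would collide.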

MalditoBarbudo commented 2 months ago

Same problem here with zip files from the Spanish Forest Inventory (https://www.miteco.gob.es/content/dam/miteco/es/biodiversidad/temas/inventarios-nacionales/ifn/ifn4/ifn4_cataluna_tcm30-536603.zip). I'm trying to unzip the files, but it fails on macOS. Since the multibyte characters are in the file names, not in the directory paths, I'm stuck. Neither utils::unzip() nor zip::unzip() works.