weichsel / ZIPFoundation

Effortless ZIP Handling in Swift
MIT License
2.31k stars 255 forks

Extracting JSON file from archive returns jumbled data #305

Closed Lancelotbronner closed 8 months ago

Lancelotbronner commented 8 months ago

Summary

I zipped a JSON database via Finder (selected a bunch of folders and right-click > Compress). I then extract files from it when needed. Most of the files are just fine, but some of them come back as garbage.

Some files are also just missing seemingly random pieces of text.

Steps to Reproduce

  1. Archive 2 directories containing JSON files using Finder (via selecting both directories, right-click > Compress)
  2. Extract the file from the archive using the following snippet:

     let archive = Archive(url: url, accessMode: .read)
     let entry = archive["example.json"]
     var data: Data!
     _ = try archive.extract(entry, skipCRC32: true) {
         data = $0
     }

     print(String(data: data, encoding: .utf8))

  3. This prints nil as it fails to recognize UTF-8

Expected Results

{
    "count": 13,
    "next": null,
    "previous": null,
    "results": [
        {
            "name": "ja-Hrkt",
            "url": "/api/v2/language/1/"
        },
        {
            "name": "roomaji",
            "url": "/api/v2/language/2/"
        },
        {
            "name": "ko",
            "url": "/api/v2/language/3/"
        },
        {
            "name": "zh-Hant",
            "url": "/api/v2/language/4/"
        },
        {
            "name": "fr",
            "url": "/api/v2/language/5/"
        },
        {
            "name": "de",
            "url": "/api/v2/language/6/"
        },
        {
            "name": "es",
            "url": "/api/v2/language/7/"
        },
        {
            "name": "it",
            "url": "/api/v2/language/8/"
        },
        {
            "name": "en",
            "url": "/api/v2/language/9/"
        },
        {
            "name": "cs",
            "url": "/api/v2/language/10/"
        },
        {
            "name": "ja",
            "url": "/api/v2/language/11/"
        },
        {
            "name": "zh-Hans",
            "url": "/api/v2/language/12/"
        },
        {
            "name": "pt-BR",
            "url": "/api/v2/language/13/"
        }
    ]
}

Actual Results

ï0Nfl‰Ä]Áõú‚P†îÆ∏¿˛õ‚Sfl‰Ä]Áõüîàǧ∏–%ú‚‡0ú‚`Y§∏üî`Y§∏@B“‰Ä]Áõ–Ñ:‰`Y§∏®2ú‚`Y§∏»3ú‚`Y§∏P&ú‚»ß¢∏x0ú‚–÷”‰‰∏°∏®6ú‚(ÿ”‰‰∏°∏»?ú‚÷”‰‰∏°∏®Bú‚®◊”‰‰∏°∏ÿBú‚XÆŸ‰4nÁõ–
ú‚êÿ”‰‰∏°∏@oú‚pÿ”‰‰∏°∏poú‚Pÿ”‰‰∏°∏†oú‚Äpú‚‰∏°∏vú‚Hqú‚‰∏°∏Pvú‚rú‚‰∏°∏òvú‚ µº‚Ä]Áõ∞1ú‚Ä]ÁõP|ú‚hNÁõò~ú‚¿%ú‚Ä]ÁõHÄú‚àãú‚‰∏°∏`èú‚Ä¥º‚Ä]ÁõÿÄú‚Ä]Áõ ifl‰Ä]Áõ∏¡ú‚ÿ|fl‰Ä]Áõ0Nœ‰Ä]Áõ@Àú‚:‰ÿ<ÁõpÀú‚@`ú‚Ä]ÁõŒú‚hNÁõ¿Œú‚®˙º‚Ä]Áõ*Ω‚Ä]Áõ∞±€‰Ä]Áõp“ú‚‘ú‚Ä]Áõà‘ú‚Ä]Áõ»®fl‰Ä]ÁõH©fl‰Ä]ÁõX’ú‚Ä]Áõ¿’ú‚Ä]Áõ(÷ú‚Ä]Áõ∏5ú‚Ä]Áõ(¬fl‰Ä]Áõx6Ω‚Ä]Áõ‡0ú‚Ä]Áõp÷ú‚h3ú‚Ä]Áõ»3ú‚Ä]Áõ»B’‰Ä]Áõ

Regression & Version

ZIPFoundation v0.9.17
macOS Sonoma 14.3 Beta (23D5033f)
Xcode 15.1 targeting macOS

Related Link

Lancelotbronner commented 8 months ago

I'm aware I'm supposed to handle the case where more than one chunk is sent. Currently I'm trying with small files which fit into a single chunk.

I would also expect to receive valid UTF-8 (except perhaps for the last few bytes) even if chunks are missing, so I'm guessing something happened to those entries.
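For reference, handling the multi-chunk case usually means appending each chunk to a buffer instead of assigning it. The sketch below uses only Foundation. `produceChunks` is a hypothetical stand-in for a consumer-style callback like the one in the snippet above; it is not ZIPFoundation's implementation, and `collect` is likewise an illustrative name.

```swift
import Foundation

// Hypothetical stand-in for a chunked, consumer-style API.
// Delivers `payload` to `consumer` in pieces of at most `chunkSize` bytes.
func produceChunks(of payload: Data, chunkSize: Int, consumer: (Data) throws -> Void) rethrows {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var offset = payload.startIndex
    while offset < payload.endIndex {
        let end = min(offset + chunkSize, payload.endIndex)
        try consumer(payload.subdata(in: offset..<end))
        offset = end
    }
}

// Accumulates every chunk instead of keeping only the last one.
func collect(_ payload: Data, chunkSize: Int) -> Data {
    var buffer = Data()
    produceChunks(of: payload, chunkSize: chunkSize) { chunk in
        buffer.append(chunk) // copies the chunk's bytes into `buffer`
    }
    return buffer
}

let json = Data(#"{"count": 13, "next": null}"#.utf8)
let collected = collect(json, chunkSize: 8)
print(String(data: collected, encoding: .utf8) ?? "not valid UTF-8")
```

Appending also copies the bytes out of the chunk, which matters for any API that reuses or frees its buffer once the callback returns. Whether that applies to this issue is not established in the thread.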

Lancelotbronner commented 8 months ago

I've tried with archives using a single top-level directory and that didn't change anything.

I've also noticed that every run seems to corrupt different files, so that makes me inclined to think the issue lies in the reading or decompressing.

weichsel commented 8 months ago

let archive = Archive(url: url, accessMode: .read)
let entry = archive["example.json"]
var data: Data!
_ = try database.extract(entry, skipCRC32: true) {

In this snippet you are retrieving an entry from archive, but then calling extract on a different Archive instance called database. This doesn't seem right.

Lancelotbronner commented 8 months ago

Ah, I was renaming things on GitHub for the example because the names were specific to my project.

I am using the same archive. I also tried it separately in a small project and got the same result.

weichsel commented 8 months ago

Why are you setting skipCRC32: true? What happens when you drop that parameter (which equals skipCRC32: false)? Are the checksums correct?
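One way to check a checksum independently of the library is to compute the CRC-32 of the extracted bytes and compare it with the value stored for the entry. The following is a minimal pure-Swift sketch of the CRC-32 variant used by ZIP (reflected polynomial 0xEDB88320). The names `crcTable` and `crc32Checksum` are illustrative, and reading the stored value from the archive is left out.

```swift
import Foundation

// Lookup table for CRC-32 (reflected polynomial 0xEDB88320).
let crcTable: [UInt32] = (0..<256).map { (index: Int) -> UInt32 in
    var value = UInt32(index)
    for _ in 0..<8 {
        value = (value & 1) == 1 ? (0xEDB88320 ^ (value >> 1)) : (value >> 1)
    }
    return value
}

// Computes the CRC-32 of `data`, one byte at a time.
func crc32Checksum(_ data: Data) -> UInt32 {
    var crc: UInt32 = 0xFFFFFFFF
    for byte in data {
        let index = Int((crc ^ UInt32(byte)) & 0xFF)
        crc = crcTable[index] ^ (crc >> 8)
    }
    return crc ^ 0xFFFFFFFF
}

// Standard check value for the ASCII string "123456789".
let sample = Data("123456789".utf8)
print(String(crc32Checksum(sample), radix: 16)) // cbf43926
```

If the computed value matches the stored one but the text is still garbled, the bytes were probably altered after the check rather than inside the archive.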

Lancelotbronner commented 8 months ago

Yes, they are (I've checked it with true and the same happens). The archive isn't downloaded from the internet; it's local.

I also tested with other zipping tools and they don't seem to have any issues.

I've also tried with Apple's compression library and it doesn't have any issues, though it doesn't allow extracting files individually.

weichsel commented 8 months ago

Can you share your code and the archive?