google / archive-patcher

Automatically exported from code.google.com/p/archive-patcher
Apache License 2.0

Input too large #159

Closed: lukefwilson closed this issue 4 years ago

lukefwilson commented 4 years ago

Hi there, I'm getting the following error when attempting to generate a patch for a large archive (just under 2GB). I've tried to use the master and v2 branches - both fail. It looks like there is a hardcoded size limit in DivSuffixSorter.

How can I make a patch for a file this large?

Thanks for your help!

Exception in thread "main" java.lang.IllegalArgumentException: Input too large (1973952678 bytes)
        at com.google.archivepatcher.generator.bsdiff.DivSuffixSorter.suffixSort(DivSuffixSorter.java:92)
        at com.google.archivepatcher.generator.bsdiff.BsDiffPatchWriter.generatePatch(BsDiffPatchWriter.java:370)
        at com.google.archivepatcher.generator.bsdiff.BsDiffPatchWriter.generatePatch(BsDiffPatchWriter.java:336)
        at com.google.archivepatcher.generator.bsdiff.BsDiffDeltaGenerator.generateDelta(BsDiffDeltaGenerator.java:52)
        at com.google.archivepatcher.generator.PatchWriter.writeDeltaEntry(PatchWriter.java:157)
        at com.google.archivepatcher.generator.PatchWriter.writePatch(PatchWriter.java:123)
        at com.google.archivepatcher.generator.FileByFileDeltaGenerator.generateDelta(FileByFileDeltaGenerator.java:120)
        at com.google.archivepatcher.generator.DeltaGenerator.generateDelta(DeltaGenerator.java:38)
        at com.google.archivepatcher.sample.SamplePatchGenerator.main(SamplePatchGenerator.java:43)

andrewhayden commented 4 years ago

Hi there,

I can't speak for the team at this point, but I can say it would be a fairly large piece of work to add support for inputs larger than 2GB. The ZIP format itself was not initially designed to handle file sizes larger than 4GB, and with parts of the code in Java we opted for a max size of 2GB (Integer.MAX_VALUE, or 2^31 - 1 bytes, the largest value of a Java int). Since an individual entry in the zip file can't be bigger than the zip file itself, no effort was put into making the suffix sort code handle anything bigger, either, if I recall correctly (it's been a few years). It's a sanity limit, but it makes sense given the bigger picture of the ZIP specification that was implemented.
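
To make the arithmetic behind that 2GB ceiling concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not part of archive-patcher, and the layout of one 4-byte int per input position is an assumption used only for illustration. Note that the input reported above (1,973,952,678 bytes) is itself below Integer.MAX_VALUE, so the check in DivSuffixSorter is evidently stricter than a plain int-range test; the sketch does not try to reproduce it.

```java
// Hypothetical illustration; none of these names come from archive-patcher.
class SizeLimitSketch {

    // True if a length is representable as a non-negative Java int.
    static boolean fitsInInt(long lengthBytes) {
        return lengthBytes >= 0 && lengthBytes <= Integer.MAX_VALUE;
    }

    // Assumption: a suffix array stores one 4-byte int per input position,
    // plus one extra slot. Computed in long arithmetic to avoid overflow.
    static long suffixArrayBytes(long inputLength) {
        return 4L * (inputLength + 1);
    }

    public static void main(String[] args) {
        long reported = 1_973_952_678L; // size taken from the exception message
        System.out.println("Integer.MAX_VALUE          = " + Integer.MAX_VALUE);
        System.out.println("input length fits in int   = " + fitsInInt(reported));
        System.out.println("suffix array size in bytes = " + suffixArrayBytes(reported));
        System.out.println("that size fits in int      = " + fitsInInt(suffixArrayBytes(reported)));
    }
}
```

Under that assumption, the input length fits in an int but the byte size of its suffix array does not, which is one plausible reason for a limit well below 2GB.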

The ZIP specification does have support for large archives (ZIP64) and there is a path forward there, but it's nontrivial (it requires quite a lot more work), and it would need to be done before adjusting the suffix sort algorithm used to generate the diffs would be really useful. An alternative would be to add proper support for just files up to 4GB in size, sticking with the original ZIP format. That might be less problematic than fully supporting the later ZIP specification for large archives, but again you'd have to change a fair amount of code, and if you were going to do that you may as well just go for full ZIP64 support. Here's a great summary from Wikipedia:

The original .ZIP format had a 4 GiB (2^32 bytes) limit on various things (uncompressed size of a file, compressed size of a file, and total size of the archive), as well as a limit of 65,535 (2^16 - 1) entries in a ZIP archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations, increasing the limits to 16 EiB (2^64 bytes). In essence, it uses a "normal" central directory entry for a file, followed by an optional "zip64" directory entry, which has the larger fields.
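
The limits in that summary can be sketched as a pair of checks that decide when the classic 32-bit and 16-bit fields are no longer enough. This is only an illustration: the class, constants, and method names are hypothetical and do not come from archive-patcher. It relies on the convention in the ZIP specification that a 4-byte field set to 0xFFFFFFFF (or a 2-byte count set to 0xFFFF) signals that the real value lives in the ZIP64 fields, so values at or above those markers cannot be stored in the classic fields.

```java
// Hypothetical illustration; these names are not from archive-patcher.
class Zip64LimitSketch {

    // Largest value a 4-byte field can hold; ZIP64 also reserves it as the
    // "real value is in the zip64 fields" marker.
    static final long MAX_UINT32 = 0xFFFFFFFFL;

    // Largest value a 2-byte entry count can hold (65,535); also a marker.
    static final long MAX_UINT16 = 0xFFFFL;

    // True if any per-entry field is too large for the classic format.
    static boolean entryNeedsZip64(long uncompressedSize, long compressedSize,
                                   long localHeaderOffset) {
        return uncompressedSize >= MAX_UINT32
                || compressedSize >= MAX_UINT32
                || localHeaderOffset >= MAX_UINT32;
    }

    // True if any archive-level field is too large for the classic format.
    static boolean archiveNeedsZip64(long entryCount, long centralDirectorySize,
                                     long centralDirectoryOffset) {
        return entryCount >= MAX_UINT16
                || centralDirectorySize >= MAX_UINT32
                || centralDirectoryOffset >= MAX_UINT32;
    }

    public static void main(String[] args) {
        long justUnder2Gb = 1_973_952_678L; // size from the report in this thread
        long fiveGiB = 5L << 30;
        System.out.println("~1.97 GB entry needs ZIP64: "
                + entryNeedsZip64(justUnder2Gb, justUnder2Gb, 0));
        System.out.println("5 GiB entry needs ZIP64:    "
                + entryNeedsZip64(fiveGiB, fiveGiB, 0));
        System.out.println("70,000 entries need ZIP64:  "
                + archiveNeedsZip64(70_000, 1024, 4096));
    }
}
```

By these checks, an archive of the size reported in this thread still fits the classic ZIP fields; it is the 2GB ceiling on the Java side that is reached first.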

For the original use case that Archive Patcher was designed for - reducing application patch size on Android - this wasn't a problem in any real sense; APK files aren't that big, as a rule. The extra complexity just wasn't justified.

lukefwilson commented 4 years ago

Thanks for the quick response @andrewhayden! We're building patches for mobile VR games, which tend to have large APKs/OBBs. We ended up finding a solution to our large file problem with https://github.com/sisong/HDiffPatch.