pixelglow / ZipZap

zip file I/O library for iOS, macOS and tvOS
BSD 2-Clause "Simplified" License

Opening large archives on iOS yields "Cannot allocate memory" error from NSData #72

logancollins opened this issue 10 years ago

logancollins commented 10 years ago

On some devices (but not all), attempting to open a large archive (> 500 MB) results in NSData failing to initialize with NSDataReadingMappedAlways, producing a generic "Cannot allocate memory" Cocoa error.
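
A minimal sketch of the failing read (archiveURL here is an assumed name for the URL of the >500 MB archive); on affected devices the mapped initializer returns nil with the "Cannot allocate memory" underlying error:

NSError *error = nil;
NSData *data = [NSData dataWithContentsOfURL:archiveURL
                                     options:NSDataReadingMappedAlways
                                       error:&error];
if (data == nil)
    NSLog(@"Mapped read failed: %@", error); // underlying error: "Cannot allocate memory"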

Apparently this is a known regression in iOS 7 that has not been fixed in iOS 8 (see http://openradar.io/14388203).

This prevents archives larger than this size from being read for decompression. One likely solution would be to modify the behavior to use NSFileHandle for accessing the file data.

I can reproduce this issue on an iPod touch (5th gen) and an iPad 3, but not on an iPhone 5s.

pixelglow commented 10 years ago

Thanks for your bug report!

I can't reproduce this on my own iPad 3 -- please provide the minimal code to trigger this, together with the iOS version, device identifiers and a link to an offending zip file, something similar to the following.

My configuration was as follows:

Archive:  /Volumes/Tuscany/stevejobs.zip   729954092 bytes   3 files
-rwxrwxrwx  2.1 unx 734685184 bX defN  2-Oct-13 13:00 stevejobs.avi
drwxrwxr-x  2.1 unx        0 bx stor 17-Sep-14 11:25 __MACOSX/
-rw-rw-rw-  2.1 unx     1527 bX defN  2-Oct-13 13:00 __MACOSX/._stevejobs.avi
3 files, 734686711 bytes uncompressed, 729953634 bytes compressed:  0.6%

NSError* err = nil;
ZZArchive* archive = [ZZArchive archiveWithURL:[[NSBundle mainBundle] URLForResource:@"stevejobs" withExtension:@"zip"] error:&err];

At a breakpoint after the code, err was still nil and archive contained a valid object.

logancollins commented 10 years ago

The configuration I can reproduce this most reliably with:

A 612 MB zip of 240 GIF images linked here: https://www.dropbox.com/s/9cyvqb18y5sjsdh/GIFs.zip?dl=0

• iPod touch (5th generation)
• iOS 8 GM seed
• zipzap 8.0 (GitHub latest)
• zipinfo GIFs.zip:

Archive:  /Users/logan/Dropbox/Fail Whale/GIFs.zip   612588235 bytes   511 files
drwxr-xr-x  2.1 unx        0 bx stor 15-Sep-14 11:27 GIFs/
-rw-r--r--  2.1 unx   155340 bX defN 23-Jul-14 22:32 GIFs/ahhhh.gif
drwxrwxr-x  2.1 unx        0 bx stor 16-Sep-14 09:41 __MACOSX/
drwxrwxr-x  2.1 unx        0 bx stor 16-Sep-14 09:41 __MACOSX/GIFs/
-rw-r--r--  2.1 unx      317 bX defN 23-Jul-14 22:32 __MACOSX/GIFs/._ahhhh.gif
-rw-r--r--  2.1 unx  1950163 bX defN  6-Jun-13 10:12 GIFs/Ahhh… no.gif
-rw-r--r--  2.1 unx      317 bX defN  6-Jun-13 10:12 __MACOSX/GIFs/._Ahhh… no.gif
(truncated for sanity)

• Code:

// Open archive
NSError *nsError = nil;
ZZArchive *archive = [[ZZArchive alloc] initWithURL:[NSURL fileURLWithPath:sourcePath] options:@{ZZOpenOptionsCreateIfMissingKey: @NO} error:&nsError];
if ( nsError != nil )
{
    success = NO;
    *error = nsError;
}

At this point, nsError was the following and archive was nil:

Error Domain=com.pixelglow.zipzap Code=0 "The operation couldn’t be completed. (com.pixelglow.zipzap error 0.)" UserInfo=0x17870f20 {NSUnderlyingError=0x17870970 "The operation couldn’t be completed. (Cocoa error 256.)"}

logancollins commented 10 years ago

Tracing into loadCanMiss:error: within zipzap, the underlying error is:

Error Domain=NSCocoaErrorDomain Code=256 "The operation couldn’t be completed. (Cocoa error 256.)" UserInfo=0x17870970 {NSFilePath=/private/var/mobile/Containers/Shared/AppGroup/799C88D2-63E8-490B-99DF-42AEED9682B0/Documents/GIFs.zip, NSUnderlyingError=0x165072e0 "The operation couldn’t be completed. Cannot allocate memory"}

pixelglow commented 10 years ago

Hmmm… I can't repro with any of my iOS 7 machines: iPad 3, iPad Mini Retina Display or iPhone 5. I don't yet have a machine running iOS 8.

Did you get this error with a minimal project, or when you inserted your code into -[AppDelegate application:didFinishLaunchingWithOptions:]? It's possible that you were already running low on memory when you tried to open the archive.

If you're still getting the memory error at the start, try doing an mmap with various options to see if it returns -1, i.e. fails to memory-map. It may also be that NSDataReadingMappedAlways doesn't actually do mmap'ing for whatever reason. If so, I may move the code over to use mmap directly.

phoenix42 commented 10 years ago

Hi,

I'm getting the same issue, previously on iOS 7.x and now on iOS 8. The zip file is 597 MB and was created by ZipZap. Smaller files open correctly.

The error occurs on an iPad mini (1st generation). I'll try the mmap too. Is there any theoretical limit to mmap size on the platform? Presumably there's some large address-space limit, but the memory manager should be paging in and out as required.

phoenix42 commented 10 years ago

Ok, have tried the following:

#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

const char *path = [[archivePath path] cStringUsingEncoding:[NSString defaultCStringEncoding]];

// Determine the file length, then try to map the whole file read-only.
FILE *f = fopen( path, "rb" );
fseek( f, 0, SEEK_END );
long len = ftell( f );
fseek( f, 0, SEEK_SET );

void *raw = mmap( 0, (size_t)len, PROT_READ, MAP_SHARED, fileno( f ), 0 );

if ( raw == MAP_FAILED ) {
    printf( "MAP_FAILED. errno=%d\n", errno ); // Here it says 12, which is ENOMEM.
}

It fails with the out-of-memory error (ENOMEM). Is it possible to map only the sections required rather than the whole file?

pixelglow commented 10 years ago

I've done some tests with iOS 8.0, and have some ideas for a way forward.

First the tests, which try to exhaust the addressable memory on the device.

#include <fcntl.h>
#include <sys/mman.h>

// Repeatedly map ~1.6 MB chunks of the file until mmap fails; i then reports
// how many mappings fit into the process's address space.
int ff = open(path, O_RDONLY);
int i;
for (i = 0; i < 10000; ++i)
{
    void* addr;
    if ((addr = mmap(0, 1638400, PROT_READ, MAP_SHARED, ff, 0)) == MAP_FAILED)
        break;
}
NSLog(@"%u", i);

On my devices:

On the simulator:

My analysis:

pixelglow commented 10 years ago

Absent Apple "fixing" iOS devices to allow more memory addresses, we could:

Not an ideal situation.

phoenix42 commented 10 years ago

I think you're right.

My opinion is that the library should be able to handle large files whichever device is being used. The memory map should allow that, but it isn't behaving consistently between models, and it's not easy to predict when it won't work. I'm working on an app which needs to allow import/export of large files for backup purposes. My nightmare scenario is that someone could export a file which they couldn't later import!

I was until recently a Windows developer and I'm a little new to the platform, so please excuse me if this is a silly question, but can't we map a compliant window around each file as we decode/encode it and release it once done? I really like the access to an NSData* for an archive entry, and I use it to encrypt at that level for security.

pixelglow commented 10 years ago

... can't we map a compliant window around each file as we decode/encode it and release it once done? I really like the access to an NSData* for an archive entry, and I use it to encrypt at that level for security.

Not a silly question at all!

For reading, we allow random access to individual entries, so it's not easy to predict when data is no longer needed. Using a memory map keeps this efficient, since the system pager should dump parts of the map under memory pressure. The library already does some wrapping with decryption/decompression, but the bottom layer of the stack is still backed by the memory map. Quite likely I'll have to rewire that bottom layer to read straight from the zip file instead. The central directory entries can continue to live in the memory map, since we're necessarily limiting that to at most a 64K trailer.
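
For illustration, a rough sketch (not the library's actual code) of mapping only a bounded window of the file: mmap offsets must be page-aligned, so the requested offset is rounded down to a page boundary and the returned pointer adjusted afterwards.

#include <sys/mman.h>
#include <unistd.h>

// Map `length` bytes of the open file `fd` starting at `offset`, read-only.
// Returns a pointer to the requested bytes, or NULL on failure; the caller
// passes *outMapping (of *outMappingLength bytes) to munmap() when finished.
static void *MapFileWindow(int fd, off_t offset, size_t length,
                           void **outMapping, size_t *outMappingLength)
{
    off_t pageSize = (off_t)getpagesize();
    off_t alignedOffset = offset - (offset % pageSize); // round down to a page boundary
    size_t delta = (size_t)(offset - alignedOffset);

    *outMappingLength = length + delta;
    *outMapping = mmap(NULL, *outMappingLength, PROT_READ, MAP_SHARED, fd, alignedOffset);
    if (*outMapping == MAP_FAILED)
        return NULL;

    return (char *)*outMapping + delta;
}

Mapping just the trailer this way keeps address-space usage small regardless of the archive size.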

phoenix42 commented 10 years ago

That sounds pretty reasonable to me. It's a shame the mapping isn't more robust on the platform; it's the standard answer on Windows for lots of complex stuff.

My encryption operates at the NSData level, so that should be unaffected as long as you still pass that back. I have made a tiny change to expose the zip comment, as I needed somewhere to put the encryption IV and salt, but I can reintegrate that again!

chrisze commented 10 years ago

Glen, I can reproduce the problem on iPhone 5s and 6 Plus, when writing a new zip file with this code:

NSError *error = nil;
ZZArchive* archive = [[ZZArchive alloc] initWithURL:filePathURL options:@{ ZZOpenOptionsCreateIfMissingKey : @YES } error:&error];

NSMutableArray *entries = [[NSMutableArray alloc] init];
NSArray *fileList = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:folderPath error:nil];

for (NSString *fileName in fileList)
{
    ZZArchiveEntry *entry = [ZZArchiveEntry archiveEntryWithFileName:fileName compress:NO dataBlock:^ NSData * (NSError **error)
    {
        NSString *path = [folderPath stringByAppendingPathComponent:fileName];
        return [NSData dataWithContentsOfMappedFile:path];
    }];

    [entries addObject:entry];
    [archive updateEntries:entries error:&error];

    if (error || self.isCancelled) return;
}

Note: this code runs inside the main method of an NSOperation. Because updateEntries can take a long time, we call it after each iteration so that we can stop should the operation get canceled (I assume there isn't a better way to do it, is there?).

Anyhow, this sample reproduces the problem consistently with about 100 images (~2-4 MB each).

pixelglow commented 10 years ago

@chrisze While you can update entries in a loop like that, it's pretty inefficient. In particular, every time through the loop you are compressing/storing the previous entries all over again, which adds to the slowness. You really have two choices:

Why is the 2nd choice more efficient than your code? If the array passed to updateEntries:error: has existing entries from the archive in front, it will skip writing those entries out again.
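
Something like the following sketch of the second approach (assuming the archive exposes its current entries via an entries property): pass the archive's existing entries first and append only the new one, so each call writes just the new entry.

for (NSString *fileName in fileList)
{
    ZZArchiveEntry *entry = [ZZArchiveEntry archiveEntryWithFileName:fileName
                                                            compress:NO
                                                           dataBlock:^NSData *(NSError **blockError)
    {
        NSString *path = [folderPath stringByAppendingPathComponent:fileName];
        return [NSData dataWithContentsOfFile:path
                                      options:NSDataReadingMappedIfSafe
                                        error:blockError];
    }];

    // Existing archive entries come first, so updateEntries:error: skips rewriting them.
    [archive updateEntries:[archive.entries arrayByAddingObject:entry] error:&error];
    if (error != nil || self.isCancelled) return;
}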

chrisze commented 10 years ago

Thanks @pixelglow. That's a good idea. It would be awesome to have a method that can take just one entry and be efficiently called in a loop. If you are archiving a large number of files, it is important to be able to cancel the operation.

phoenix42 commented 10 years ago

What's the ETA for a fix for this? Is there anything else I can do to help?

pixelglow commented 10 years ago

@phoenix42 At the moment, I'm swamped with other work but I hope to commit a fix in about 2 weeks.

Alternatively you can attempt a patch based on the outline I gave above i.e.

Either item is likely self-contained, if you're not up to going the whole hog. Let me know if you're proceeding or waiting for me to fix it.

phoenix42 commented 10 years ago

I think I'll wait for the expert :-) The platform is too unfamiliar to me at the moment. I'll happily do some extra testing and code review though.

xaphod commented 9 years ago

Hi there, I'm getting lots of failures in my app for large ZIPs, and I traced it down to this issue. Has there been any progress on this bug? If I offer to somehow sponsor this bugfix would that speed things up? I don't think I'm clever enough to tackle it myself.... thanks for your awesome work, hope we can fix this one :)

kevinlawler commented 9 years ago

As for Apple increasing the mmap limit, don't plan on it happening; their reasoning must be that it would allow you to sidestep the restriction on swap space. They may eventually change their mind.

For a zip program, you can fix the problem by relying on traditional reads instead of an mmap-based strategy.
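
A minimal sketch of that idea using NSFileHandle, with hypothetical zipPath, entryOffset and entryLength values (in practice they would come from the central directory):

// Read only the bytes for one entry instead of mapping the whole archive.
NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:zipPath];
[handle seekToFileOffset:entryOffset];
NSData *entryData = [handle readDataOfLength:entryLength];
[handle closeFile];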

pixelglow commented 9 years ago

@xaphod I haven't fixed the large zip issue, mainly due to lack of time. As a contractor, I do have to prioritise paying work. I did make a start on an arbitrarily offset mmap NSData though.

Would appreciate the sponsorship! What method would you suggest?

I suggest using ChangeTip, which is funded by Bitcoin. I'll set a bounty of $1,000, representing how much work I think it will take to get done. Then whoever believes this work is worthwhile tips me any amount. If the total amount tipped by everyone exceeds $1,000, or I think I can get it done for less, I'll claim all the tips and proceed with the work. Otherwise, I'll refund everyone their tips (by leaving them unclaimed).

Does that sound reasonable?

Note: the ChangeTip site says you can only tip on pull requests and commits. I don't know if it will work with an issue thread like this one. Someone can try with a tiny amount first.


Full disclosure: my main commercial driver for zipzap is Instaviz, my iOS diagram sketching app. I use zip files for the file format, with the intention of allowing embedded images. At the moment, the current version doesn't have embedded images and the DOT format compresses so well that I haven't hit the large file limit in my own work. If this work isn't funded, I will still get to it when I end up supporting embedded images in Instaviz -- maybe in 6+ months' time, depending on sales etc.? So in a sense, you're tipping here for certainty and immediacy.

xaphod commented 9 years ago

@pixelglow Thanks for taking the time to answer, and for the level of details & openness. I appreciate it :+1:

Although I like the idea of ChangeTip (I didn't check it out in detail yet, as I had an appmergency :S), I'm afraid that at the $1,000 level it won't work. I don't have a wildly successful app yet, and the little earnings I do have are less than I'm spending on advertising or design assets. I was thinking more like a max of $100 from me... but even if all 6 people on this thread kicked in that amount, it would not be enough.

So in the meantime I have instrumented my app (WeDownload) that uses ZipZap so that in future versions I can see how many people are hitting this issue. If it's only a handful per week then that's one thing... but if it's lots of people I might have to find something else you want... :)

Thanks so much for your hard work on ZipZap, and good luck on Instaviz (looks great!)

pixelglow commented 9 years ago

OK guys, quick survey: how much would you be willing to contribute toward this fix? @xaphod looks like he would be in for $100.

xaphod commented 9 years ago

@pixelglow I was just reading your earlier post about possible solutions ("Absent Apple "fixing" iOS devices to allow more memory addresses, we could..."), and something strikes me as missing. Perhaps you ruled it out for a reason I don't see?

What about using an NSInputStream and pointing it at the file? This would require that the data of the zip file resides on disk somewhere, but that is a very reasonable requirement if the zip file doesn't fit into memory in the first place.

The advantage of a stream-based approach (to me, err, as not a super-advanced senior emperor developer) would be that you use standard Apple-supported APIs without needing to customize NSData, so it should be more future-proof. NSInputStreams backed by files support seeking to arbitrary locations.

pixelglow commented 9 years ago

@xaphod The October 10 post is closer to my current thinking. Incorporating your file/stream-based idea, it would be a hybrid approach:

xaphod commented 9 years ago

An instance of NSInputStream created from a path to a file on disk will be seekable by getting/setting its NSStreamFileCurrentOffsetKey property.

NSStreamFileCurrentOffsetKey: Value is an NSNumber object containing the current absolute offset of the stream.

... so I think you can seek if you want to...
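
A minimal sketch of seeking a file-backed stream this way, assuming hypothetical zipPath and entryOffset values:

NSInputStream *stream = [NSInputStream inputStreamWithFileAtPath:zipPath];
[stream open];
// Jump to the entry's offset in the file, then read from there.
[stream setProperty:@(entryOffset) forKey:NSStreamFileCurrentOffsetKey];

uint8_t buffer[4096];
NSInteger bytesRead = [stream read:buffer maxLength:sizeof(buffer)];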

pixelglow commented 9 years ago

This is guaranteed by the format never to exceed 64K and memory-mapping will improve performance.

Looks like I was wrong.

The end of central directory record is at most ~65K, but the central directory itself can be up to 4 GB. Although if file names were reasonably sized, e.g. 40 bytes max, it would occupy a more reasonable maximum of about 5 MB. I still think memory-mapping is the way to go here, but pathological cases may still stump the new implementation.
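
As a rough check, using the 46-byte fixed central directory header per entry and the non-Zip64 cap of 65,535 entries: 65,535 × (46 + 40) bytes ≈ 5.4 MB.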

An instance of NSInputStream created from a path to a file on disk will be seekable by getting/setting its NSStreamFileCurrentOffsetKey property.

Indeed, I missed that. I'd still like the NSInputStream we use to be strictly bound by its offset and length in the file, to prevent client abuse though. And we'd need multiple NSInputStreams referencing the same file but with independent stream positions, since we don't restrict clients to streaming in order. Ah, fun.

achansonjr commented 8 years ago

I know this is a bit of an old issue, but I recently came across the same problem when trying to open a large file on an iPad 4. I don't know if recent changes allowed this to work, but I am using a different method of initialization to solve my problem.

let fileData = try? NSData(contentsOfURL: url, options: .DataReadingMappedIfSafe)
let archive = fileData.flatMap { try? ZZArchive(data: $0) }

I placed a kernel memory inspector in my loop that iterated over the entry list and printed out memory usage. With this, I can see that the system safely frees memory once it gets the memory warning from the OS.