tsolomko / SWCompression

A Swift framework for working with compression, archives and containers.
MIT License
237 stars 41 forks source link

Sporadic crash in TarContainer.open() #23

Closed philmitchell closed 3 years ago

philmitchell commented 3 years ago

First, thanks for the excellent library!

I am seeing a sporadic but persistent crash (in production) in TarContainer.open() on the following line:

let entryData = data.subdata(in: dataStartIndex..<dataEndIndex)

It seems that the range supplied to data.subdata() is out-of-range.

I don't know how this is possible, as the tarfiles are under my control and all have been tested to work. Crash stack trace is below.

Would you accept a PR that tested dataStartIndex and dataEndIndex against data.startIndex and data.endIndex and, if out of range, either:

  1. Silently adjusted the range, or
  2. Threw an error?
libswiftFoundation.dylib Foundation.Data._Representation.subscript.getter : (Swift.Range<Swift.Int>) -> Foundation.Data
libswiftFoundation.dylib Foundation.Data.subdata(in: Swift.Range<Swift.Int>) -> Foundation.Data
MyApp function signature specialization <Arg[1] = Dead> of static SWCompression.TarContainer.open(container: Foundation.Data) throws -> [SWCompression.TarEntry]
MyApp static SWCompression.TarContainer.open(container: Foundation.Data) throws -> [SWCompression.TarEntry]
MyApp MyApp.FileHelper.(_installContent in _0551FF1AC8624441A93DA7C5AC53D8BC)(fromDownloadLocation: Foundation.URL, filename: Swift.String, progress: __C.NSProgress?) -> () ArchiveManager.swift:16
MyApp partial apply forwarder for closure #1 () -> () in MyApp.FileHelper.installContent(fromDownloadLocation: Foundation.URL, filename: Swift.String, progress: __C.NSProgress?) -> () FileHelper.swift:107
MyApp reabstraction thunk helper from @escaping @callee_guaranteed () -> () to @escaping @callee_unowned @convention(block) () -> () <compiler-generated>:0
libdispatch.dylib
_dispatch_call_block_and_release
tsolomko commented 3 years ago

Hello,

This is an intriguing issue. What do you mean when you say "sporadic crashes"? Some files cause them and some files don't, or that one single file sometimes causes a crash and other times doesn't?

Before discussing potential solutions, it would be interesting to understand the root cause. I can foresee one of three options:

  1. The TAR file(s), that cause(s) this crash, is (are) malformed. In this case, your suggested solution would be appropriate.
  2. The TAR file is well-formed. This would mean that there is a problem with the implementation in SWCompression, and in that case it should be fixed instead.
  3. The TAR file doesn't technically conform to any of the several TAR formats implemented in SWCompression, however such files are consistently produced by another implementation (for example, due to a bug in this other implementation). In this case, a workaround should be added to account for such "non-standard" files.

I don't know how this is possible, as the tarfiles are under my control and all have been tested to work.

This sentence makes me think that this is either the 2nd or 3rd option, but to be absolutely sure, I would need to investigate this problem further. For this I would need additional information, like an example of a file that causes the crash and/or the name of the program that produced it.

Though it seems unlikely, it may also be the case that there is an issue within Swift itself, which surfaces only on specific platforms and/or specific versions of Swift. There was a precedent before, see this code.

P.S. I guess, your suggested fix would be useful in any case, but I still think that I should better understand what is happening.

philmitchell commented 3 years ago

Hi @tsolomko, thanks for the quick reply!

Unfortunately, the crash report doesn't indicate which file(s) were involved. However, this is for an app that has only seven tarfiles that it uses.

These files were created using the stock bsdtar that comes with macOS: bsdtar 3.3.2 - libarchive 3.3.2 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.6

I'd be happy to send you one of these files privately. Please contact me at phil at larkwire dot com.

tsolomko commented 3 years ago

I've sent you an email.

Also, am I correct to assume that you use Swift 5.4 on one of the darwin platforms (like ios)?

philmitchell commented 3 years ago

Yes, these crashes are occurring on iOS, however they are on Swift 5.3.2.

tsolomko commented 3 years ago

Hmm, I've tested with your files both on macOS and iOS, both with Swift 5.4 and Swift 5.3.2, and I haven't encountered any crashes.

I am going to add a check for indices anyway (like this), but for now I cannot reproduce the issue.

philmitchell commented 3 years ago

Well, I'm not surprised ... it happens only sporadically!

Your fix looks good, except maybe >= test for the start index (and possibly <= for the end index)?

tsolomko commented 3 years ago

The start index can't be equal to data.startIndex since by that point in code at least one tar header (which size is equal to 512 bytes) has been processed.

The end index can't be equal to data.endIndex since a TAR file must end with 1024 bytes of zeros, which aren't included into any entry. These zero bytes are actually crucial to determine the end of a TAR file, and their presence is checked separately every time before trying to parse an entry.

(After writing these lines I've got a feeling that this new check is redundant...)

philmitchell commented 3 years ago

That makes sense. Could entryInfo.size be wrong ... sporadically?!

tsolomko commented 3 years ago

The size property, as every other numeric field of TAR, is non-trivially determined depending on the format extensions used, but if it was wrong, it would be a deterministic problem. I also just in case compared the sizes determined by SWCompression with the output of tar -tvf and they match exactly.

Looking at the crash stack trace again, assuming that it represents a real-life scenario and guessing from the functions' names, could it be possible that TAR files just aren't fully downloaded, before being processed by SWCompression?

philmitchell commented 3 years ago

Good question! The download occurs via system calls that call back when the download is finished.

If the network is bad and the file doesn't download completely, wouldn't the initial parse fail (no closing bytes)?

philmitchell commented 3 years ago

But let's say that the parse succeeds, despite the fact that not all of the file contents are there. It seems that testing against data.endIndex should catch this?

tsolomko commented 3 years ago

What I was trying to imply in my previous message is that no matter what SWCompression does (crashes or throws an error), it still won't provide you the results, so it may or may not be a good idea to check that the data is finished downloading before invoking SWCompression.

If the network is bad and the file doesn't download completely, wouldn't the initial parse fail (no closing bytes)?

This is a good question. Apparently, no: the currently implemented check actually conflates two situations: when the EOF marker is reached and when a file is truncated.

But let's say that the parse succeeds, despite the fact that not all of the file contents are there. It seems that testing against data.endIndex should catch this?

Yes, it will catch it.

Anyhow, I am going to release an update with this fix some time soon (this week?), but the logic in the "initial parse" is definitely incorrect and I should reconsider it in the future.

philmitchell commented 3 years ago

That sounds good. Thanks so much for addressing this so promptly. Really much appreciated!

tsolomko commented 3 years ago

4.5.11 has been released.