Closed philmitchell closed 3 years ago
Hello,
This is an intriguing issue. What do you mean when you say "sporadic crashes"? Some files cause them and some files don't, or that one single file sometimes causes a crash and other times doesn't?
Before discussing potential solutions, it would be interesting to understand the root cause. I can foresee one of three options:
I don't know how this is possible, as the tarfiles are under my control and all have been tested to work.
This sentence makes me think that this is either the 2nd or 3rd option, but to be absolutely sure, I would need to investigate this problem further. For this I would need additional information, like an example of a file that causes the crash and/or the name of the program that produced it.
Though it seems unlikely, it may also be the case that there is an issue within Swift itself, which surfaces only on specific platforms and/or specific versions of Swift. There was a precedent before, see this code.
P.S. I guess, your suggested fix would be useful in any case, but I still think that I should better understand what is happening.
Hi @tsolomko, thanks for the quick reply!
Unfortunately, the crash report doesn't indicate which file(s) were involved. However, this is for an app that has only seven tarfiles that it uses.
These files were created using the stock bsdtar
that comes with macOS:
bsdtar 3.3.2 - libarchive 3.3.2 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.6
I'd be happy to send you one of these files privately. Please contact me at phil at larkwire dot com.
I've sent you an email.
Also, am I correct to assume that you use Swift 5.4 on one of the darwin platforms (like ios)?
Yes, these crashes are occurring on iOS, however they are on Swift 5.3.2.
Hmm, I've tested with your files both on macOS and iOS, both with Swift 5.4 and Swift 5.3.2, and I haven't encountered any crashes.
I am going to add a check for indices anyway (like this), but for now I cannot reproduce the issue.
Well, I'm not surprised ... it happens only sporadically!
Your fix looks good, except maybe >=
test for the start index (and possibly <=
for the end index)?
The start index can't be equal to data.startIndex
since by that point in code at least one tar header (which size is equal to 512 bytes) has been processed.
The end index can't be equal to data.endIndex
since a TAR file must end with 1024 bytes of zeros, which aren't included into any entry. These zero bytes are actually crucial to determine the end of a TAR file, and their presence is checked separately every time before trying to parse an entry.
(After writing these lines I've got a feeling that this new check is redundant...)
That makes sense. Could entryInfo.size
be wrong ... sporadically?!
The size property, as every other numeric field of TAR, is non-trivially determined depending on the format extensions used, but if it was wrong, it would be a deterministic problem. I also just in case compared the sizes determined by SWCompression with the output of tar -tvf
and they match exactly.
Looking at the crash stack trace again, assuming that it represents a real-life scenario and guessing from the functions' names, could it be possible that TAR files just aren't fully downloaded, before being processed by SWCompression?
Good question! The download occurs via system calls that call back when the download is finished.
If the network is bad and the file doesn't download completely, wouldn't the initial parse fail (no closing bytes)?
But let's say that the parse succeeds, despite the fact that not all of the file contents are there. It seems that testing against data.endIndex
should catch this?
What I was trying to imply in my previous message is that no matter what SWCompression does (crashes or throws an error), it still won't provide you the results, so it may or may not be a good idea to check that the data is finished downloading before invoking SWCompression.
If the network is bad and the file doesn't download completely, wouldn't the initial parse fail (no closing bytes)?
This is a good question. Apparently, no: the currently implemented check actually conflates two situations: when the EOF marker is reached and when a file is truncated.
But let's say that the parse succeeds, despite the fact that not all of the file contents are there. It seems that testing against
data.endIndex
should catch this?
Yes, it will catch it.
Anyhow, I am going to release an update with this fix some time soon (this week?), but the logic in the "initial parse" is definitely incorrect and I should reconsider it in the future.
That sounds good. Thanks so much for addressing this so promptly. Really much appreciated!
First, thanks for the excellent library!
I am seeing a sporadic but persistent crash (in production) in
TarContainer.open()
on the following line:let entryData = data.subdata(in: dataStartIndex..<dataEndIndex)
It seems that the range supplied to
data.subdata()
is out-of-range.I don't know how this is possible, as the tarfiles are under my control and all have been tested to work. Crash stack trace is below.
Would you accept a PR that tested
dataStartIndex
anddataEndIndex
againstdata.startIndex
anddata.endIndex
and, if out of range, either: