sereko / theunarchiver

Automatically exported from code.google.com/p/theunarchiver

Inconsistent use of XADIsDirectory key in entry dictionaries #199

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Most "normal" ZIP (and JAR) files contain an entry for each directory, not just
for files. Some ZIP files don't contain entries for intermediate directories at
all (though I'm able to infer them in my own code). More problematic are ZIP
files that contain directory entries, but no POSIX permissions for those
entries.

First, it's mildly annoying that an entry dictionary for a file resource
doesn't contain the key "XADIsDirectory" at all. (In my view, it should, with
the value being an NSNumber with a boolean value of NO.) Only entries for
directories currently have the key. This requires a little extra processing on
the client side, which would be simpler to do when the entries are created.

Second, if a directory entry doesn't have a valid permissions value, it also 
doesn't have a value 
for the "XADIsDirectory" key. I can understand that whether or not it's a 
directory may be 
determined from the POSIX permissions, so this could be a non-trivial problem 
to solve. 
However, scanning common paths to infer what is a directory (its path is a 
prefix to another 
entry's path) would be nice to have.
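
The prefix-scanning idea above can be sketched without any special data structure. This is a minimal Python illustration, not XADMaster code: `infer_directories` is a hypothetical helper, and it assumes entry paths use `/` as the separator, with a trailing `/` marking an explicit directory entry (the usual ZIP convention).

```python
from pathlib import PurePosixPath


def infer_directories(entry_paths):
    """Return every directory implied by the given archive entry paths."""
    directories = set()
    for raw in entry_paths:
        stripped = raw.rstrip("/")
        # Assumption: a trailing slash marks an explicit directory entry.
        if raw.endswith("/") and stripped:
            directories.add(stripped)
        # Every ancestor of an entry must be a directory, whether or not
        # the archive contains an entry for it.
        for parent in PurePosixPath(raw).parents:
            if str(parent) != ".":
                directories.add(str(parent))
    return directories


paths = ["com/example/Main.class", "META-INF/MANIFEST.MF", "docs/"]
print(sorted(infer_directories(paths)))
```

Because only the paths are inspected, this works even when an archive stores no permissions and omits intermediate directory entries.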

This last part seems like something that might be a good fit for a Trie data 
structure 
(http://en.wikipedia.org/wiki/Trie). Perhaps an adaptation of something like 
OFTrie from 
http://www.omnigroup.com/developer/ could help? (I realize that's a big chunk 
to bite off, so 
this may not be possible at the moment.)
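
For illustration, a path trie along these lines might look like the following Python sketch. `PathTrie` is a hypothetical class written for this example; it is not OFTrie's API and not part of XADMaster. It treats a path as a directory when it is a proper prefix of some other inserted path.

```python
class PathTrie:
    """Trie keyed on path components rather than on characters."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _components(path):
        # Drop empty components so "a/b/" and "a/b" are treated alike.
        return [part for part in path.split("/") if part]

    def insert(self, path):
        node = self.root
        for component in self._components(path):
            node = node.setdefault(component, {})

    def is_directory(self, path):
        node = self.root
        for component in self._components(path):
            if component not in node:
                return False
            node = node[component]
        # A node with children is a prefix of another entry's path.
        # An empty directory has no children, so it is not detected here.
        return bool(node)


trie = PathTrie()
for entry in ["com/example/Main.class", "README.txt"]:
    trie.insert(entry)
print(trie.is_directory("com/example"), trie.is_directory("README.txt"))
```

Empty directories cannot be inferred this way, so an explicit directory entry (when present) should still take precedence.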

I only use ZIP files, so I'm not sure how prevalent this behavior is across the 
board. Thanks for 
writing the framework!

Original issue reported on code.google.com by quinntay...@mac.com on 23 Sep 2009 at 10:25

GoogleCodeExporter commented 9 years ago
I'm attaching 2 JAR files that exhibit this behavior and may help isolate the 
issue.

In coherence-hibernate.jar, not all intermediate directories have corresponding 
entries in the ZIP file.

In jboss-ejb3x.jar, all intermediate directory entries exist, but no valid 
permissions are present.

Original comment by quinntay...@mac.com on 23 Sep 2009 at 10:41

GoogleCodeExporter commented 9 years ago
For the first part, it would be more inconsistent to have XADIsDirectory = NO
for files, because then they'd also have to have XADIsLink = NO,
XADIsResource = NO, XADIsEncryptedKey = NO, XADIsCharacterDeviceKey = NO, &c,
&c. Not happening.

For the second, extraction would not work at all if directories didn't have 
XADIsDirectory set correctly, so the Zip parser does use several different 
tricks to figure 
out if an entry is a directory. Running XADTest2 or XADTest3 seems to correctly 
show 
directories as directories, so I don't see any problems there.

Original comment by paracel...@gmail.com on 24 Sep 2009 at 1:11

GoogleCodeExporter commented 9 years ago
I wasn't aware that the boolean value for XADIsDirectory was irrelevant (that 
the presence/absence of the key 
was sufficient) or of the existence of the other keys named. It makes sense to 
keep the dictionaries simple, so 
I'll concede that point. Since I know I can count on files never having the 
key, it does simplify things as a 
client.

For the second part, I'll own up that my misunderstanding of how
XADIsDirectory is used (and my own parsing of POSIX permissions) caused the
confusion. I removed the code that modified that key and it works now.
</sheepish grin>

That said, the Coherence JAR does present an interesting case in which not all 
intermediate directories are 
present. (It was generated by Ant 1.6.2, for what it's worth.) Like I said, so 
far I can work around this issue, so 
it's not a big deal. I haven't yet tested whether extracting a directory for 
which an entry doesn't exist works. 
I'll comment on this issue when I get that working.

Original comment by quinntay...@mac.com on 24 Sep 2009 at 3:53

GoogleCodeExporter commented 9 years ago
Well, technically I haven't specified whether or not a dictionary can include 
booleans 
with the value NO. I think none of the parsers do this, but I am not entirely 
sure. It 
might be a good idea to make that a hard rule to simplify reading, but 
currently it is 
not guaranteed.
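
Since neither "key absent" nor "key present with value NO" is guaranteed, a client can read the flag in a way that accepts both. The Python sketch below models an entry as a plain dict; only the key name "XADIsDirectory" comes from this thread, and `is_directory` is a hypothetical helper. (In Objective-C the equivalent is typically sending `boolValue` to the result of `objectForKey:`, which yields NO for a missing key because messaging nil returns zero.)

```python
def is_directory(entry):
    """True only when the key is present and its value is truthy."""
    # A missing key and an explicit false value are treated identically.
    return bool(entry.get("XADIsDirectory", False))


file_entry = {"XADFileName": "Main.class"}          # key absent
explicit_no = {"XADIsDirectory": False}             # key present, value NO
directory_entry = {"XADIsDirectory": True}

print(is_directory(file_entry), is_directory(explicit_no), is_directory(directory_entry))
```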

Also, including directories in the results is optional. Some archive formats 
include 
them, others do not (and zip does either one, at random), so you should not 
expect 
them to show up, or show up in the correct order. Even if they were included 
there 
are other issues with directories, such that you need to special-case them 
anyway. 
For instance, if you try to set the last-modified date of a directory and then 
unpack 
files to it, that date will be changed, so you have to set that after you're 
done 
unpacking files, even though the directory's data usually arrives first.
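
The timestamp problem described above is usually handled by recording directory dates during extraction and applying them only at the end. The following Python sketch shows the pattern; the `(path, is_directory, mtime, data)` tuple format and the `extract` function are assumptions made for this example, not how XADMaster represents entries, and the sketch does no path sanitization.

```python
import os
import tempfile


def extract(entries, dest):
    """Unpack (path, is_directory, mtime, data) tuples into dest."""
    pending_dir_times = []
    for path, is_dir, mtime, data in entries:
        target = os.path.join(dest, path)
        if is_dir:
            os.makedirs(target, exist_ok=True)
            # Do not stamp the directory yet: writing files into it
            # later would update its modification time again.
            pending_dir_times.append((target, mtime))
        else:
            # Also covers archives that omit intermediate directories.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as handle:
                handle.write(data)
            os.utime(target, (mtime, mtime))
    # All files are written, so directory dates can now be set safely.
    for target, mtime in pending_dir_times:
        os.utime(target, (mtime, mtime))


with tempfile.TemporaryDirectory() as dest:
    extract(
        [
            ("lib", True, 1_000_000_000, b""),
            ("lib/a.txt", False, 1_000_000_500, b"hello"),
        ],
        dest,
    )
    print(int(os.path.getmtime(os.path.join(dest, "lib"))))
```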

Original comment by paracel...@gmail.com on 24 Sep 2009 at 10:54