Original comment by paracel...@gmail.com
on 20 Jul 2009 at 4:25
I'm taking a look at implementing FUSE support. So far it looks feasible with
regard to wrapping XADArchive, though it could probably be more heavily
integrated into the rest of TU.
Original comment by jeremyag...@gmail.com
on 13 Dec 2010 at 6:01
If you're looking into it, make sure to look at XADArchiveParser, not
XADArchive. XADArchive is a legacy API; XADArchiveParser is more flexible (and
lower-level).
Original comment by paracel...@gmail.com
on 13 Dec 2010 at 6:05
The lack of documentation is painful. From my understanding, XADArchive wraps
XADArchiveParser? I'm having trouble figuring out how XADArchiveParser handles
hierarchies. A filesystem implementation would need to be able to represent the
directory structure of the archive and extract single files from different
paths. Am I correct that XADArchiveParser maintains a flat record of all
entries in the archive, and that directories are represented only in the
filenames of the records?
Original comment by jeremyag...@gmail.com
on 13 Dec 2010 at 6:49
Yeah, all I've managed to write is a bit of a stub overview at
http://code.google.com/p/theunarchiver/wiki/XadMasterApiDocumentation.
Anyway, yes, XADArchive wraps XADArchiveParser. XADArchive used to wrap libxad
in older versions, but since 2.0 most of the functionality is implemented in
XADArchiveParser, and XADArchive is just a wrapper around it.
The representation of directories depends on the archive format.
XADArchiveParser passes the structure of the archive file itself through as
directly as possible. Some formats have separate records for directories, some
don't. Some put those directory records before the files they contain, others
put them after.
Also, filenames are XADPath objects, which are basically arrays of XADStrings,
so they are already chopped into components, which might make things a little
easier.
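For example, here's a rough, untested sketch of folding the entry dictionaries
into a directory tree. It splits the string form of the path on "/" just to
keep it short (I'm assuming a -string accessor like XADString's; check
XADPath.h for the exact component accessors), and the tree structure itself is
only an illustration:

    #import <Foundation/Foundation.h>
    #import <XADMaster/XADArchiveParser.h>

    // Builds a nested NSMutableDictionary tree ("name" -> subtree) from the
    // entry dictionaries delivered by the parser. The leaf keeps the original
    // entry dictionary under "." so the file can be opened later.
    static void AddEntryToTree(NSMutableDictionary *root, NSDictionary *entry)
    {
        NSString *path = [[entry objectForKey:XADFileNameKey] string];
        NSMutableDictionary *node = root;

        for (NSString *component in [path componentsSeparatedByString:@"/"])
        {
            if ([component length] == 0) continue; // skip empty components

            NSMutableDictionary *child = [node objectForKey:component];
            if (!child)
            {
                child = [NSMutableDictionary dictionary];
                [node setObject:child forKey:component];
            }
            node = child;
        }

        [node setObject:entry forKey:@"."];
    }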
Original comment by paracel...@gmail.com
on 13 Dec 2010 at 7:13
Is there a reason that [XADArchiveParser allFilenames] only seems to return the
path of the source archive? I'm working with a simple zip (sample provided),
and I've tried allocating one explicitly as well as accessing it from within an
XADUnarchiver. I've just about given up on understanding the internal mechanics
of XADUnarchiver; my attempts to trace the program flow make it look like
there's an infinite loop between extractEntryWithDictionary and
_updateFileAttributesAtPath. As I see it, the dictionaries are file records and
may contain just one file or a whole hierarchy. I can't seem to figure out
where they're coming from, or a good way to access and traverse them externally.
Original comment by jeremyag...@gmail.com
on 13 Dec 2010 at 9:39
Because that is what it is for. It will return all source filenames (which will
be more than one for multi-part archives).
Here's how XADArchiveParser works:
* An XADArchiveParser object is initialized from a CSHandle, which is an
abstract file handle. These can come from files, memory, or entries inside
archives. Usually you do this with a convenience method that opens a file for
you.
* You set a delegate for the XADArchiveParser.
* You call [XADArchiveParser parse].
* XADArchiveParser starts reading through the archive, and for each entry it
finds, it builds a dictionary of information and delivers this to the delegate.
* The delegate either stores this for future use, or it uses [XADArchiveParser
handleForDictionary:] to get a CSHandle for reading data from the file.
When parsing is done, you can use any saved dictionaries to access file data at
a later time. See XADTest2 and XADTest3 for simple examples of how this works.
XADUnarchiver is a helper class that makes it easier to unarchive the contents
into actual files on disk.
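In code, the whole flow looks roughly like this (an untested sketch; double-check
names like archiveParserForPath:, handleForEntryWithDictionary:wantChecksum: and
remainingFileContents against the current XADArchiveParser.h and CSHandle.h,
since the exact names may differ between versions):

    #import <Foundation/Foundation.h>
    #import <XADMaster/XADArchiveParser.h>

    @interface EntryCollector : NSObject
    {
        NSMutableArray *entries;
    }
    - (NSArray *)entries;
    @end

    @implementation EntryCollector

    - (id)init
    {
        if ((self = [super init])) entries = [[NSMutableArray alloc] init];
        return self;
    }

    - (void)dealloc { [entries release]; [super dealloc]; }

    - (NSArray *)entries { return entries; }

    // Delegate callback: the parser delivers one dictionary per entry it finds.
    - (void)archiveParser:(XADArchiveParser *)parser foundEntryWithDictionary:(NSDictionary *)dict
    {
        [entries addObject:dict];
    }

    @end

    int main(int argc, char **argv)
    {
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

        XADArchiveParser *parser = [XADArchiveParser archiveParserForPath:@"test.zip"];
        EntryCollector *collector = [[EntryCollector alloc] init];

        [parser setDelegate:collector];
        [parser parse]; // one foundEntryWithDictionary: call per entry

        // Any saved dictionary can later be turned into a CSHandle for reading.
        for (NSDictionary *dict in [collector entries])
        {
            if ([[dict objectForKey:XADIsDirectoryKey] boolValue]) continue;

            CSHandle *handle = [parser handleForEntryWithDictionary:dict wantChecksum:NO];
            NSData *data = [handle remainingFileContents];
            NSLog(@"%@: %lu bytes", [dict objectForKey:XADFileNameKey],
                  (unsigned long)[data length]);
        }

        [collector release];
        [pool drain];
        return 0;
    }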
Original comment by paracel...@gmail.com
on 13 Dec 2010 at 9:54
Ah, I understand now. Since handleForDictionary and related methods were labeled
for internal use, I was unsure whether they were meant to be used normally. I
guess I was looking at XADArchiveParser as more of a container (a la XADArchive)
while it's really just a controller. Thanks.
Original comment by jeremyag...@gmail.com
on 13 Dec 2010 at 11:05
Hey Dag. I don't know if you knew, but there is an XADMaster-based
MacFUSE/OSXFUSE program called TranspRAR.
http://forums.plexapp.com/index.php/topic/17211-transprar-rar-workaround-for-plex-9/
https://github.com/alleus/TranspRAR
I just uploaded my own version, where I rewrote most of the interface with
XADMaster.
https://github.com/btrask/TranspRAR
It works pretty well, but I was having problems with a NULL dereference with
RAR30s, so I added some simple hacks to work around that. Now it doesn't seem to
crash, but it's giving a lot of garbage data, both for RAR30s (as expected,
given my hacks) and for other RARs/ZIPs. I think these problems stem from the
way I'm using CSHandles, but I'm not sure if it's "my fault" or if there are
some real problems.
It seems to happen at random, but it's very frequent when trying to open lots
of files at once in random order. TranspRAR is single-threaded, but maybe
there's a re-entrancy issue with reading from many different handles from the
same archive out of order? I'm not sure.
I've tested with the latest XADMaster source from hg.
If you could look into this, I'd really appreciate it. If you need sample files
that exhibit the problems, I can provide them.
Thanks,
Ben
Original comment by bentr...@comcast.net
on 27 Jul 2012 at 7:06
XADMaster uses lots of "nonCopiedSubHandle" calls internally, because it can be
extremely expensive to make actual copies of some kinds of CSHandles. This
means that a read from a sub-handle will move the file pointer in its parent
handle, and that will confuse all other sub-handles.
In practice, this means that you can only use one handle at a time. This might
be a pain for a FUSE client, but the only way around it would be to properly
implement copy on all possible handles (probably not hard, just a bit tedious)
and then implement some kind of option to make XADMaster actually use real
copies. However, that still leaves you with the problem that copies are
expensive for some handles: for instance, a solid 7z archive with a huge LZMA
window could possibly incur a cost of even a hundred megabytes per opened
file.
Original comment by paracel...@gmail.com
on 27 Jul 2012 at 7:15
What do you think would be the best solution/workaround? Would it be reasonable
to just create a second parser whenever it needs to open a file? Can entry
dictionaries be shared between parsers for the same underlying archive without
re-parsing?
Original comment by bentr...@comcast.net
on 27 Jul 2012 at 7:42
Sharing dictionaries ALMOST works. It should work for most formats but I bet
there are at least a few which do some extra internal bookkeeping.
One thing which might be worth trying is to keep books on where each handle
expects the archive handle to be. [parser handle] gives you the handle, and you
can check its position after reading, and restore it to that position before
the next read, if another file has been read in the meanwhile. That might work,
although it can be expensive for files like .tar.gz where seeks are expensive.
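Very roughly, that bookkeeping could look something like this (untested sketch;
initialization and error handling are left out, and the exact offset/seek method
names on CSHandle should be checked against CSHandle.h):

    // One reader object per open file in the FUSE layer. All readers share the
    // parser's underlying archive handle, so each one remembers where it left
    // that handle and restores the position if someone else has read since.
    @interface EntryReader : NSObject
    {
        XADArchiveParser *parser;   // shared parser for the archive
        CSHandle *entryHandle;      // handle from handleForEntryWithDictionary:
        off_t savedArchiveOffset;   // archive-handle position after our last read
        BOOL hasSavedOffset;
    }
    - (NSData *)readDataOfLength:(int)length;
    @end

    @implementation EntryReader

    static EntryReader *lastReader = nil; // whichever reader touched the archive last

    - (NSData *)readDataOfLength:(int)length
    {
        CSHandle *archiveHandle = [parser handle];

        // Another file has been read since our last read: put the shared
        // archive handle back where we left it before continuing.
        if (lastReader != self && hasSavedOffset)
            [archiveHandle seekToFileOffset:savedArchiveOffset];

        NSData *data = [entryHandle readDataOfLength:length];

        savedArchiveOffset = [archiveHandle offsetInFile];
        hasSavedOffset = YES;
        lastReader = self;

        return data;
    }

    @end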
Original comment by paracel...@gmail.com
on 27 Jul 2012 at 8:16
I've got it creating a new sub-handle for every read now. It seems rock solid
and I don't think there's too much performance impact (correct me if I'm wrong).
Thanks for your help and excellent software.
Original comment by bentr...@comcast.net
on 27 Jul 2012 at 9:22
Well, every time you create a new handle, you have to start unpacking from the
start of the file again. If you do something like interleaved reading from two
big files, it will start slowing down more and more. You'll have to try to
guess how common that case is and whether it is worth optimizing for.
(Make sure to keep the same handle open for subsequent reads from the same
file, though, as long as no reads from other files happen in the meanwhile.)
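In a FUSE read callback, that might amount to something like this (untested
sketch; the ReadEntry helper and the caching scheme are only an illustration,
and how readDataOfLength: behaves near end-of-file needs checking against
CSHandle.h):

    // Reuse the most recently opened entry handle as long as reads on the same
    // file stay sequential; otherwise open a fresh handle and skip forward.
    static NSDictionary *cachedEntry = nil;
    static CSHandle *cachedHandle = nil;
    static off_t cachedOffset = 0;

    static NSData *ReadEntry(XADArchiveParser *parser, NSDictionary *entry,
                             off_t offset, int length)
    {
        if (entry != cachedEntry || offset != cachedOffset)
        {
            // Different file or non-sequential read: start over with a new handle.
            [cachedHandle release];
            [cachedEntry release];
            cachedHandle = [[parser handleForEntryWithDictionary:entry wantChecksum:NO] retain];
            cachedEntry = [entry retain];

            // Skip forward to the requested offset by reading and discarding.
            // (Check how readDataOfLength: behaves past end-of-file; there may
            // be an "at most" variant that is safer here.)
            off_t toskip = offset;
            while (toskip > 0)
            {
                int chunk = toskip > 65536 ? 65536 : (int)toskip;
                [cachedHandle readDataOfLength:chunk];
                toskip -= chunk;
            }
        }

        NSData *data = [cachedHandle readDataOfLength:length];
        cachedOffset = offset + (off_t)[data length];
        return data;
    }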
Original comment by paracel...@gmail.com
on 27 Jul 2012 at 9:52
If you want writable filesystems, Issue 687 must come before this.
Original comment by alexchan...@gmail.com
on 21 Jul 2013 at 10:41
Original issue reported on code.google.com by
paracel...@gmail.com
on 3 Jul 2009 at 2:12