markcox / theunarchiver

Automatically exported from code.google.com/p/theunarchiver

FUSE client #161

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Write a FUSE client to mount archives as volumes.

Original issue reported on code.google.com by paracel...@gmail.com on 3 Jul 2009 at 2:12

GoogleCodeExporter commented 9 years ago

Original comment by paracel...@gmail.com on 20 Jul 2009 at 4:25

GoogleCodeExporter commented 9 years ago
I'm taking a look at implementing FUSE support. So far it looks feasible by 
wrapping XADArchive, though it could probably be integrated more tightly with 
the rest of TU.

Original comment by jeremyag...@gmail.com on 13 Dec 2010 at 6:01

GoogleCodeExporter commented 9 years ago
If you're looking into it, make sure to look at XADArchiveParser, not 
XADArchive. XADArchive is a legacy API, XADArchiveParser is more flexible (and 
lower-level).

Original comment by paracel...@gmail.com on 13 Dec 2010 at 6:05

GoogleCodeExporter commented 9 years ago
The lack of documentation is painful. From my understanding, XADArchive wraps 
XADArchiveParser? I'm having trouble figuring out how XADArchiveParser handles 
hierarchies. A filesystem implementation would need to be able to represent the 
directory structure of the archive and extract single files from different 
paths. Am I correct in that the XADArchiveParser maintains a flat record of all 
links in the archive, and that directories are represented only in the 
filenames of the records?

Original comment by jeremyag...@gmail.com on 13 Dec 2010 at 6:49

GoogleCodeExporter commented 9 years ago
Yeah, all I've managed to write is a bit of a stub overview at 
http://code.google.com/p/theunarchiver/wiki/XadMasterApiDocumentation .

Anyway, yes, XADArchive wraps XADArchiveParser. XADArchive used to wrap libxad 
in older versions, but from 2.0 XADArchiveParser is where most functionality is 
implemented, and XADArchive is a wrapper.

The representation of directories depends on the archive format. 
XADArchiveParser passes the structure of the archive file itself through as 
directly as possible. Some formats have separate records for directories, some 
don't. Some put those records before the files contained in them, others put 
them afterwards.

Also, filenames are XADPath objects, which are basically arrays of XADStrings, 
so they are already chopped into components, which might make things a little 
easier.

Original comment by paracel...@gmail.com on 13 Dec 2010 at 7:13
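
For a filesystem view, the point above means the directory tree has to be built from whatever records arrive, in whatever order. A minimal sketch in Python, assuming each entry is a plain dict with a non-empty `path` list (standing in for the components of an XADPath) and an `is_dir` flag; `build_tree`, `lookup` and the dict layout are made up for illustration and are not XADMaster API:

```python
def new_node(is_dir):
    return {"is_dir": is_dir, "children": {}, "entry": None}


def build_tree(entries):
    """Build a nested tree from flat records whose paths are already split."""
    root = new_node(True)
    for entry in entries:
        node = root
        *parents, leaf = entry["path"]  # assumes a non-empty path
        for name in parents:
            # Synthesize any parent directory that has no record (yet).
            node = node["children"].setdefault(name, new_node(True))
        child = node["children"].setdefault(leaf, new_node(entry["is_dir"]))
        child["entry"] = entry  # attach the real record, even if it arrives late
    return root


def lookup(root, path):
    """Return the node for a list of path components, or None."""
    node = root
    for name in path:
        node = node["children"].get(name)
        if node is None:
            return None
    return node


entries = [
    {"path": ["docs", "readme.txt"], "is_dir": False},  # "docs" has no record yet
    {"path": ["docs"], "is_dir": True},                 # directory record arrives late
    {"path": ["src", "lib", "a.c"], "is_dir": False},   # "src", "src/lib" never get one
]
tree = build_tree(entries)
print(sorted(tree["children"]))
print(sorted(lookup(tree, ["src"])["children"]))
```

Directories that only exist implicitly end up with `entry` set to `None`, so a filesystem layer would have to invent their attributes.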

GoogleCodeExporter commented 9 years ago
Is there a reason that [XADArchiveParser allFilenames] only seems to return the 
path of the source archive? I'm working with a simple zip (sample provided) and 
I've tried allocating one explicitly as well as accessing it from within an 
XADUnarchiver. I've just about given up on understanding the internal mechanics of 
XADUnarchiver; my attempts to trace the program flow make it look like there's 
an infinite loop between extractEntryWithDictionary and 
_updateFileAttributesAtPath. As I see it, the dictionaries are file records and 
may contain just one file or a whole hierarchy. I can't seem to figure out 
where they're coming from or a good way to access and traverse them externally.

Original comment by jeremyag...@gmail.com on 13 Dec 2010 at 9:39

GoogleCodeExporter commented 9 years ago
Because that is what it is for. It will return all source filenames (which will 
be more than one for multi-part archives).

Here's how XADArchiveParser works:

* A XADArchiveParser object is initialized from a CSHandle, which is an 
abstract file handle. These can come from files, memory, or entries inside 
archives. Usually you do this with a convenience method that opens a file for 
you.
* You set a delegate for the XADArchiveParser.
* You call [XADArchiveParser parse].
* XADArchiveParser starts reading through the archive, and for each entry it 
finds, it builds a dictionary of information and delivers this to the delegate.
* The delegate either stores this for future use, or it uses [XADArchiveParser 
handleForDictionary:] to get a CSHandle for reading data from the file.

When parsing is done, you can use any saved dictionaries to access file data at 
a later time. See XADTest2 and XADTest3 for simple examples of how this works.

XADUnarchiver is a helper class that makes it easier to unarchive the contents 
into actual files.

Original comment by paracel...@gmail.com on 13 Dec 2010 at 9:54
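
The flow described above (set a delegate, parse, receive one dictionary per entry, ask for a handle later) can be sketched roughly as follows. This is a toy model in Python, not the XADMaster API: `ToyParser`, `Collector`, `found_entry` and `handle_for_entry` are hypothetical names, and Python's `zipfile` module stands in for the format-specific parsing code.

```python
import io
import zipfile


class ToyParser:
    """Stand-in for the parser role described above; not the XADMaster API."""

    def __init__(self, fileobj):
        self._zip = zipfile.ZipFile(fileobj)
        self.delegate = None

    def parse(self):
        # Walk the archive and hand one dictionary per entry to the delegate.
        for info in self._zip.infolist():
            entry = {
                "name": info.filename,
                "size": info.file_size,
                "is_dir": info.is_dir(),
            }
            self.delegate.found_entry(self, entry)

    def handle_for_entry(self, entry):
        # Return a readable handle for a previously delivered dictionary.
        return self._zip.open(entry["name"])


class Collector:
    """Delegate that only stores the dictionaries for later use."""

    def __init__(self):
        self.entries = []

    def found_entry(self, parser, entry):
        self.entries.append(entry)


# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("a.txt", "alpha")
    z.writestr("dir/b.txt", "beta")
buffer.seek(0)

parser = ToyParser(buffer)
collector = Collector()
parser.delegate = collector
parser.parse()

# Parsing is done; a saved dictionary still opens its data.
with parser.handle_for_entry(collector.entries[1]) as handle:
    data = handle.read()
print([e["name"] for e in collector.entries], data)
```

The parser here holds no list of entries itself; whatever the delegate does not keep is gone once `parse()` returns, which matches the "controller, not container" reading of the description.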

GoogleCodeExporter commented 9 years ago
Ah I understand now. Since handleForDictionary and related methods were labeled 
for internal use I was unsure if they were to be used normally. I guess I was 
looking at XADArchiveParser as more of a container (a la XADArchive) while it's 
really just a controller. Thanks.

Original comment by jeremyag...@gmail.com on 13 Dec 2010 at 11:05

GoogleCodeExporter commented 9 years ago
Hey Dag. I don't know if you knew, but there is a XADMaster-based 
MacFUSE/OSXFUSE program called TranspRAR.

http://forums.plexapp.com/index.php/topic/17211-transprar-rar-workaround-for-plex-9/
https://github.com/alleus/TranspRAR

I just uploaded my own version, where I rewrote most of the interface with 
XADMaster.

https://github.com/btrask/TranspRAR

It works pretty well but I was having problems with a NULL de-reference with 
RAR30s so I added some simple hacks to work around that. Now it doesn't seem to 
crash but it's giving a lot of garbage data, both for RAR30s (as expected given 
my hacks) and other RARs/ZIPs. I think these problems stem from the way I'm 
using CSHandles, but I'm not sure if it's "my fault" or if there are some real 
problems.

It seems to happen at random, but it's very frequent when trying to open lots 
of files at once in random order. TranspRAR is single-threaded, but maybe 
there's a re-entrancy issue with reading from many different handles from the 
same archive out of order? I'm not sure.

I've tested with the latest XADMaster source from hg.

If you could look into this, I'd really appreciate it. If you need sample files 
that exhibit the problems, I can provide them.

Thanks,
Ben

Original comment by bentr...@comcast.net on 27 Jul 2012 at 7:06

GoogleCodeExporter commented 9 years ago
XADMaster uses lots of "nonCopiedSubHandle" calls internally, because it can be 
extremely expensive to make actual copies of some kinds of CSHandles. This 
means that a read from a sub-handle will move the file pointer in its parent 
handle, and that will confuse all other sub-handles.

In practice, this means that you can only use one handle at a time. This might 
be a pain for a FUSE client, but the only way around it would be to properly 
implement copy on all possible handles (probably not hard, just a bit tedious) 
and then implement some kind of option to make XADMaster actually use real 
copies. However, that still leaves you with the problem that copies are 
expensive for some handles: For instance, a solid 7z archive with a huge LZMA 
window could possibly incur a cost of even a hundred megabytes per opened 
file.

Original comment by paracel...@gmail.com on 27 Jul 2012 at 7:15
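
The garbage-data symptom follows directly from the shared file pointer. A toy model in Python, assuming a sub-handle that positions its parent once and then trusts it to stay put; `SubHandle` is a made-up class for illustration, not CSHandle:

```python
import io


class SubHandle:
    """Reads a byte range of a parent handle WITHOUT owning its own position.

    It assumes the parent is still wherever this sub-handle left it.
    """

    def __init__(self, parent, start, length):
        self.parent = parent
        self.start = start
        self.remaining = length
        self.started = False

    def read(self, n):
        if not self.started:
            self.parent.seek(self.start)  # position the parent once, on first read
            self.started = True
        n = min(n, self.remaining)
        data = self.parent.read(n)  # continues from wherever the parent is NOW
        self.remaining -= len(data)
        return data


archive = io.BytesIO(b"AAAAAAAABBBBBBBB")  # two 8-byte "files" back to back

# One handle at a time: correct.
a = SubHandle(archive, 0, 8)
sequential = a.read(4) + a.read(4)

# Interleaved handles: the second read from `a` starts where `b` left the parent.
a = SubHandle(archive, 0, 8)
b = SubHandle(archive, 8, 8)
first = a.read(4)    # parent is now at offset 4
b.read(4)            # parent jumps to 8, ends at 12
second = a.read(4)   # reads bytes 12..16, which belong to b
interleaved = first + second
print(sequential, interleaved)
```

Nothing here is multi-threaded; interleaving reads on a single thread is enough to mix the two files.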

GoogleCodeExporter commented 9 years ago
What do you think would be the best solution/workaround? Would it be reasonable 
to just create a second parser whenever it needs to open a file? Can entry 
dictionaries be shared between parsers for the same underlying archive without 
re-parsing?

Original comment by bentr...@comcast.net on 27 Jul 2012 at 7:42

GoogleCodeExporter commented 9 years ago
Sharing dictionaries ALMOST works. It should work for most formats but I bet 
there are at least a few which do some extra internal bookkeeping.

One thing which might be worth trying is to keep track of where each handle 
expects the archive handle to be. [parser handle] gives you the handle, and you 
can check its position after reading, and restore it to that position before 
the next read, if another file has been read in the meantime. That might work, 
although it can be expensive for files like .tar.gz where seeks are expensive.

Original comment by paracel...@gmail.com on 27 Jul 2012 at 8:16
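
The bookkeeping idea can be sketched like this: each handle remembers where it needs the shared parent to be and restores that position before reading. This is a Python toy model, with `TrackedSubHandle` and `CountingBytesIO` invented for illustration; in a real archive the seek may mean re-decompressing, which is the cost mentioned above.

```python
import io


class CountingBytesIO(io.BytesIO):
    """In-memory parent handle that counts how often it is repositioned."""

    def __init__(self, data):
        super().__init__(data)
        self.seeks = 0

    def seek(self, pos, whence=0):
        self.seeks += 1
        return super().seek(pos, whence)


class TrackedSubHandle:
    """Sub-handle that records where it expects the shared parent to be."""

    def __init__(self, parent, start, length):
        self.parent = parent
        self.expected = start  # where this handle needs the parent to be
        self.remaining = length

    def read(self, n):
        if self.parent.tell() != self.expected:
            # Another handle moved the parent in the meantime; restore it.
            self.parent.seek(self.expected)
        data = self.parent.read(min(n, self.remaining))
        self.remaining -= len(data)
        self.expected = self.parent.tell()  # remember the position after reading
        return data


archive = CountingBytesIO(b"AAAAAAAABBBBBBBB")
a = TrackedSubHandle(archive, 0, 8)
b = TrackedSubHandle(archive, 8, 8)

# Fully interleaved reads now come out right, at the price of a seek per switch.
a_data = a.read(4)
b_data = b.read(4)
a_data += a.read(4)
b_data += b.read(4)
print(a_data, b_data, archive.seeks)
```

Sequential reads from one file never trigger the restore, so the extra cost is only paid when reads from different files alternate.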

GoogleCodeExporter commented 9 years ago
I've got it creating a new sub-handle for every read now. It seems rock solid 
and I don't think there's too much performance impact (correct me if I'm wrong).

Thanks for your help and excellent software.

Original comment by bentr...@comcast.net on 27 Jul 2012 at 9:22

GoogleCodeExporter commented 9 years ago
Well, every time you create a new handle, you have to start unpacking from the 
start of the file again. If you do something like interleaved reading from two 
big files, it will start slowing down more and more. You'll have to try and 
guess how common that case is and if it is worth optimizing for.

(Make sure to keep the same handle open for subsequent reads from the same 
file, though, as long as no reads from other files happen in the meantime.)

Original comment by paracel...@gmail.com on 27 Jul 2012 at 9:52
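
To get a feel for the slowdown, here is a rough cost model in Python. It assumes a forward-only stream where reaching offset N means unpacking N bytes first, and a reader that keeps at most one live handle; `ForwardHandle`, `Reader` and the byte counts are illustrative assumptions, not measurements of XADMaster.

```python
class ForwardHandle:
    """Forward-only stream: reaching offset N means unpacking N bytes first."""

    def __init__(self, data, stats):
        self.data = data
        self.pos = 0
        self.stats = stats

    def read_at(self, offset, n):
        if offset < self.pos:
            raise ValueError("cannot seek backwards in a forward-only stream")
        end = min(offset + n, len(self.data))
        self.stats["unpacked"] += end - self.pos  # skipped plus returned bytes
        self.pos = end
        return self.data[offset:end]


class Reader:
    """Keeps at most ONE live handle, optionally reusing it for the same file."""

    def __init__(self, files, reuse):
        self.files = files
        self.reuse = reuse
        self.stats = {"unpacked": 0, "opened": 0}
        self.current = None  # (name, handle)

    def read(self, name, offset, n):
        handle = None
        if self.reuse and self.current is not None:
            cur_name, cur_handle = self.current
            if cur_name == name and cur_handle.pos <= offset:
                handle = cur_handle  # same file, moving forward: keep going
        if handle is None:
            handle = ForwardHandle(self.files[name], self.stats)  # start over
            self.stats["opened"] += 1
            self.current = (name, handle)
        return handle.read_at(offset, n)


files = {"a": bytes(1000), "b": bytes(1000)}


def run(reuse, names):
    reader = Reader(files, reuse)
    for offset in range(0, 1000, 100):
        for name in names:
            reader.read(name, offset, 100)
    return reader.stats


sequential = run(True, ["a"])         # one file, handle kept open
fresh = run(False, ["a"])             # new handle for every read
interleaved = run(True, ["a", "b"])   # two files read alternately
print(sequential, fresh, interleaved)
```

In this model a kept-open handle unpacks each byte once, while a fresh handle per read, or strict alternation between two files, makes the unpacked total grow roughly with the square of the file size.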

GoogleCodeExporter commented 9 years ago
If you want writable filesystems, Issue 687 must come before this.

Original comment by alexchan...@gmail.com on 21 Jul 2013 at 10:41