GoogleCodeExporter opened 9 years ago
Hello. I agree that the idea seems very tempting.
I have had similar thoughts before: having the contents of each RAR archive
cached in some way. There is already a cache for each file inside the archive,
but nothing that describes the archive itself. However, there were some reasons
why I left this idea behind. There are in fact some options you should already
be using to speed things up: --exclude and --seek-length. Using these options
speeds up loading a folder and the contents of archives (especially large
volumes) dramatically. Read the rar2fs.1 man page for details.
Since the original target for rar2fs was in fact small embedded Linux systems
without an HDD, and writing a lot of information to the underlying file system
on flash did not seem like a good idea, the cache needs to be memory resident.
Memory is another limiting resource in most typical embedded systems.
Having the cache stored on disk is not going to improve things that much. The
file still needs to be opened/closed for reading and the contents need to be
parsed. Also, the md5 checksum must be calculated each time to be able to
compare with the on-disk copy. It would of course avoid the need for the
--seek-length option. On the other hand, keeping a cached copy of each
archive's file structure in memory should be a lot faster, but at the cost of
resources.
I am not saying it is a bad idea. Not at all. But there must be a balance
between speed and resource needs. The md5 approach is good in that it will
allow modifications to the archive, so that cached information can be
invalidated. There is always the option to make this functionality exactly
that, optional. Small embedded systems would then simply be advised not to use
it. Remember though that today's cache makes loading of archives (almost)
equally fast over time. A fully cached approach will require some initial
"cost" when the cache is populated, especially if you drop the --seek-length
option.
If you are eager to get something going quickly, please feel free to implement
this on your own in a branch off the trunk, and we can merge it back once you
feel it is ready.
Original comment by hans.bec...@gmail.com
on 30 May 2011 at 10:52
Hey,
Thanks for the reply.
I know caching as it is works well. The thing is, though, that after a reboot
or power-down this cache is gone.
I myself am not very eager for this to be implemented any time soon, but I
thought it might be handy. The options you mentioned did help a lot indeed.
I'm afraid I'm not a C coder, so there is nothing I can do to help build it,
sorry.
Cheers,
Joris
P.S.: I've submitted a FreeBSD port for rar2fs (and libunrar 4.0.7) :)
Original comment by wiebel@gmail.com
on 30 May 2011 at 1:15
Thinking about it, I do not really see the md5 approach as feasible. After all,
you need to compare the cached md5 checksum with something, and that something
would have to be computed at every access.
I think it is then better to actually spend some memory (optionally) and add an
additional cache for the archive file paths, similar to what is done today for
files inside the archive. Each entry in the cache would link to the already
extracted archive file structure, which could then be fed to the parser instead
of the file handle. Today rar2fs does not subscribe to file system events, so
modifications to RAR archives will pass unnoticed. I think that is a reasonable
limitation, and it would not change by introducing a second cache. With this
second cache (or first, really), some disk/network IO will be avoided since the
file header no longer needs to be read.
Original comment by hans.bec...@gmail.com
on 30 May 2011 at 1:18
Original comment by hans.bec...@gmail.com
on 1 Jun 2011 at 8:48
Original issue reported on code.google.com by
wiebel@gmail.com
on 30 May 2011 at 9:26