The idea has popped up many times, and every time it is dropped after some deeper analysis.
It would be very hard to make such a non-volatile cache work flawlessly, and the increased logic and design complexity it would require does not motivate such a huge undertaking.
There are other things you can do to speed up the process significantly after a mount.
Use a low value (as low as possible without losing data) for --seek-length and use the new warmup mount option. The cache warm-up will basically start automatically in the background at mount time and populate the entire directory cache. Note that there is not just one cache but two implemented by rar2fs: one for directory listings and one for actual file properties. These two variants serve very different purposes, but they complement each other.
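As a rough sketch, assuming the archive and mount point paths below (they are placeholders) and that the warmup option is passed as a regular FUSE-style -o mount option, a mount using both suggestions could look something like this; check rar2fs --help for the exact spelling on your version:

```
# Mount a directory of RAR archives with the smallest seek length and
# background cache warm-up enabled (paths are examples only).
rar2fs --seek-length=1 -o warmup /path/to/archives /mnt/rar2fs
```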
RocksDB would probably be useful to implement this.
@karibertils thanks for the pointer, but as things stand right now there will be no effort made to save data/cache in some persistent storage. It is not the storage as such that is the problem; it is reading it back that is error-prone and rather non-deterministic.
@braderhart Unless there is something more that you wish to add I would like to close this issue.
@braderhart The warmup mount option was added in v1.29.0, so it is already available.
AFAIK the use case you are referring to in the majority of cases also implies archive volumes that contain only a single file. In that case, use the --seek-length=1 option to speed up meta-data lookup.
But there is still something that I fail to understand here. Extraction speed has nothing to do with the cache, really. Extraction speed itself is not much worse than reading a native file on a local file system, with the exception of the possible overhead caused by FUSE itself or, if the file is compressed, the overhead of unpacking. For streaming data you would probably not even notice the difference. A recursive mount point lookup (e.g. ls -R) might be a bit painful for huge collections, but if you know which path to search it would not cause much overhead, if any.
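To illustrate the difference (the mount point and paths are only examples):

```
# A recursive listing walks the entire tree and touches every archive's headers:
ls -R /mnt/rar2fs
# A lookup of a known path only needs the headers of that one archive:
ls -l /mnt/rar2fs/known/release/
```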
That sounds really bad? Can you tell whether the size of the individual RAR archive volumes affects the time it takes? You could create a volumed test archive with very small volume sizes.
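For example, assuming the rar command-line tool is available, something like this should produce a small multi-volume test archive (the 100k volume size and paths are arbitrary):

```
# Split a few sample files into ~100 KB volumes to see whether volume size
# affects the lookup time after mounting with rar2fs.
rar a -v100k testset.rar sample-files/
```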
As I have stated before, only the headers are read in order to access file names etc. inside an archive. For volumes we in fact always need to read two volume files, due to technical reasons. But the amount of data needed from each volume file should be on the order of bytes, not even kilobytes.
Could it be that rsync does not handle this very well and tries to download the entire file before rar2fs gets a chance to open it? Then things would depend entirely on your network speed. I would bet that if you put that archive on some local file system and mounted it using rar2fs, it would not take even close to a second to populate the directory entries when using --seek-length=1.
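A quick way to verify this (paths are hypothetical) is to copy the volumes to a local disk, mount them, and time the first listing:

```
# Mount a local copy with the minimal seek length and measure how long
# the initial directory population takes.
rar2fs --seek-length=1 /local/copy/of/archive /mnt/rar2fs-test
time ls -l /mnt/rar2fs-test
```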
Any updates here?
Closing this due to inactivity.