Closed tenzap closed 7 months ago
Trying to reproduce with a recent mount-zip 1.0.13 and libzip 1.10.1.
$ mount-zip --version
mount-zip version: 1.0.13
libzip version: 1.10.1
FUSE library version: 2.9.9
fusermount3 version: 3.14.0
using FUSE kernel interface version 7.19
I downloaded the file android-ndk-r26b-linux.zip.
$ ls -lh
total 639M
-rw-r--r-- 1 francois francois 639M Apr 18 18:32 android-ndk-r26b-linux.zip
For comparison, unzipping this ZIP with unzip takes 27 seconds on my computer. The unzipped archive contains 526 directories and 7925 files for a total of 16 GB.
$ time unzip -d out android-ndk-r26b-linux.zip
...
real 0m26.618s
user 0m21.912s
sys 0m4.273s
$ tree -a --du -h out
...
16G used in 526 directories, 7925 files
$ rm -r out
Mounting the archive with mount-zip only takes 0.11 seconds.
$ time mount-zip android-ndk-r26b-linux.zip mnt
real 0m0.114s
user 0m0.062s
sys 0m0.053s
$ tree -a --du -h mnt
...
16G used in 526 directories, 7925 files
Copying all the files from the mounted ZIP using cp -R takes 16 seconds. This recursive copy effectively opens, decompresses and copies every single file from the mounted ZIP, which exercises the whole FUSE + mount-zip + libzip stack. Surprisingly, this is faster than extracting the archive with unzip.
$ time cp -R mnt out
real 0m16.286s
user 0m0.219s
sys 0m3.473s
$ tree -a --du -h out
...
16G used in 526 directories, 7925 files
$ rm -r out
$ umount mnt
So, I don't know why you observed some very slow access times. One of the hypotheses is that your access pattern repetitively decompresses a big file in the archive. If this is the case, it might be beneficial to use the --precache option with mount-zip. This preemptively decompresses every single file at mount time, and that takes about 16 seconds.
$ time mount-zip --precache android-ndk-r26b-linux.zip mnt
...
real 0m15.889s
user 0m11.559s
sys 0m2.928s
After that, copying all the files from the mounted ZIP only takes 10 seconds. And every access pattern should be equally fast, since there is no decompression involved in the process anymore.
$ time cp -R mnt out
real 0m10.142s
user 0m0.196s
sys 0m3.743s
$ tree -a --du -h out
...
16G used in 526 directories, 7925 files
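To illustrate why pre-decompressing helps with arbitrary access patterns, here is a minimal Python sketch of the general idea. It is not mount-zip's implementation: the function names precache and read_cached are made up for this example, and a real mounter would work through libzip and FUSE rather than Python's zipfile module. Every entry is decompressed once into a single cache file, and later reads at any offset become plain pread calls with no decompression.

```python
import io
import os
import shutil
import tempfile
import zipfile


def precache(archive, cache_file):
    """Decompress every file entry of `archive` once into `cache_file`.

    Returns {entry name: (offset in cache, uncompressed size)}.
    Hypothetical helper, for illustration only.
    """
    index = {}
    offset = 0
    for info in archive.infolist():
        if info.is_dir():
            continue
        with archive.open(info) as src:
            shutil.copyfileobj(src, cache_file)  # decompress exactly once
        index[info.filename] = (offset, info.file_size)
        offset += info.file_size
    cache_file.flush()
    return index


def read_cached(cache_file, index, name, start, length):
    """Read `length` bytes at `start` of entry `name` without decompressing."""
    offset, size = index[name]
    start = min(start, size)
    length = min(length, size - start)
    return os.pread(cache_file.fileno(), length, offset + start)


if __name__ == "__main__":
    # Build a small ZIP in memory so the example is self-contained.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"hello world" * 1000)
        zf.writestr("dir/b.txt", b"0123456789" * 1000)

    with zipfile.ZipFile(buf) as zf, tempfile.TemporaryFile() as cache:
        index = precache(zf, cache)
        # Random access in the middle of an entry, served from the cache.
        print(read_cached(cache, index, "dir/b.txt", 5, 10))
```

Note that in this sketch the cache holds the full uncompressed size of the archive, which is the trade-off to keep in mind when disk space or memory is tight.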
You use an "old" version of mount-zip (1.0.7), which does not feature the --precache option. This option was added in version 1.0.8.
I'm going to close this bug as "resolved", since I guess that the --precache option might solve your issue. Please reopen if you're still seeing slow access patterns with a recent version of mount-zip and while using the --precache option.
I can't use a more recent version, because libzip is not recent enough in Debian. (see also #20)
What does precache do? Does it store the data somewhere and as a consequence take some disk space? The reason I don't uncompress the zip is that there isn't enough disk space to do so, so if precache does that, I can't use it.
My access pattern is building a large application where the toolchain (clang, cmake, binutils, sysroot, libs, includes...) is accessed through fuse-zip/mount-zip.
I just tried with the master branch (i.e. 1.0.13 with support for older libzip) + libzip 1.7.3, and without --precache it is still much slower than fuse-zip.
Since fuse-zip & mount-zip both use libzip 1.7.3, slowness seems to come from mount-zip, and not from the use of an old version of libzip.
precache is not an alternative because I don't want to consume disk space and it looks like precache uncompresses the zip to disk.
See the discussion on bug #20.
It seems that mount-zip fails to create a cache file with the O_TMPFILE flag in the tmp dir when the underlying filesystem is overlayfs. This impedes the caching mechanism and results in poor performance when faced with non-sequential access to the contained files. See also the documentation.
This also explains why the other ZIP mounter fuse-zip does not exhibit this performance degradation, since it caches all the uncompressed data in memory.
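For anyone who wants to check whether their tmp dir is affected, here is a minimal Python sketch that probes whether a directory accepts O_TMPFILE. This is an assumption-laden illustration, not mount-zip's actual check: the function name supports_o_tmpfile is made up, and os.O_TMPFILE is only defined on Linux.

```python
import os
import tempfile


def supports_o_tmpfile(directory):
    """Return True if an unnamed temporary file can be created in `directory`.

    Hypothetical helper, for illustration only.
    """
    flag = getattr(os, "O_TMPFILE", None)  # not defined on non-Linux platforms
    if flag is None:
        return False
    try:
        # With O_TMPFILE the path names a directory, not the file itself.
        fd = os.open(directory, flag | os.O_RDWR, 0o600)
    except OSError:
        # Raised when the filesystem or kernel rejects the flag,
        # or when the directory does not exist or is not writable.
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    tmp_dir = tempfile.gettempdir()
    print(f"{tmp_dir}: O_TMPFILE supported = {supports_o_tmpfile(tmp_dir)}")
```

Running it against the tmp dir used by the CI job should tell whether the cache file can be created the way described above.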
I can think of several solutions or workarounds:
- [...] tmp dir, such as ext2, ext3, ext4 or tmpfs.
- [...] mount-zip to avoid using O_TMPFILE.
- [...] mount-zip to use an anonymous in-memory file created by memfd_create, possibly as a backup solution if the cache file cannot be created in the tmp dir.
Performance is now satisfactory with latest master in my setup, whether I use --precache or not. Thank you.
I also updated the Launchpad's PPA and it includes all your recent changes.
Can you please check again with the latest changes at commit 87c5d16f3eade4ee8ec3e381aefb029e697d684c?
It is not that easy to check every little change. :/
However, a quick check on my system, where it used to be slow, looks fine (without --precache).
And with precache:
mount-zip: The filesystem of '/tmp' does not support O_TMPFILE
mount-zip: Created cache file '/tmp/AWWJ3o'
precache could be slower than without it, not sure though.
How is one supposed to invoke mount-zip to have "Using memory cache"? Without --cache, or with --cache=? Maybe this should be documented somewhere.
Thanks for the verification. It looks good.
How is one supposed to invoke mount-zip to have "Using memory cache"? Without --cache, or with --cache=?
Yes, that's right. You can use --cache= at the moment in order to experiment with the memory cache.
However, this feels a bit like an undocumented hack. I'm thinking about adding a separate and properly documented command-line option for that. Maybe something like --memcache.
I added the --memcache option. Feel free to experiment with it.
I was trying to replace fuse-zip by mount-zip in my gitlab-ci job because it would have improved performance. However I noticed it is extremely slow, which makes it useless for now in my scenario.
EDIT: I have the same problem locally and it consumes 100% CPU. It is not stuck or frozen, it just progresses very slowly.
fuse-zip -r -o allow_other "$zipfile" "$mount_dir"
mount-zip -o allow_other "$zipfile" "$mount_dir"
The zipfile is Android's NDK r26b for linux.
To gain space on the gitlab runner I don't unzip it and mount the zip directly.
It is then used to build Qt for Android (i.e. it accesses the files in the zip a lot; the zip contains cmake, the compiler (clang), header files, libs...). After the configuration stage, approx. 10k files are built with ninja.
With fuse-zip, a complete "configure+build+install" cycle takes approx. 75 min (with ccache also speeding things up). When using mount-zip, it didn't even get past the configuration step after 150 min. I stopped the job at that point and reverted to fuse-zip.
FYI, I was once using lsof to check if a file is open in that gitlab job (running Debian) and had to change to fuser because lsof was extremely slow compared to fuser. A single call to lsof took 2-3 sec (with network/DNS search disabled) while it was almost instant for fuser. I mention this in case mount-zip uses something similar to what lsof does. However, locally I have the same problem.