google / mount-zip

FUSE file system for ZIP archives
GNU General Public License v3.0

Cannot create a cache file with O_TMPFILE on overlayfs #23

Closed tenzap closed 7 months ago

tenzap commented 7 months ago

I was trying to replace fuse-zip with mount-zip in my GitLab CI job, expecting better performance. However, mount-zip turned out to be extremely slow, which makes it unusable in my scenario for now.

EDIT: I have the same problem locally, and it consumes 100% CPU. It is not stuck or frozen, it just progresses very slowly.

The zipfile is Android's NDK r26b for linux.

To gain space on the gitlab runner I don't unzip it and mount the zip directly.

It is then used to build Qt for Android (i.e. the build accesses many of the files in the zip, which contains cmake, the compiler (clang), header files, libs...). After the configuration stage, approx. 10k files are built with ninja.

With fuse-zip, a complete "configure+build+install" cycle takes approx. 75 min (given that ccache also speeds things up). With mount-zip, it had not even passed the configuration step after 150 min. I stopped the job at that point and reverted to fuse-zip.

FYI, I was once using lsof to check whether a file is open in that GitLab job (running Debian) and had to switch to fuser because lsof was extremely slow in comparison. A single call to lsof took 2-3 sec (with network/DNS search disabled), while fuser was almost instant. I mention this in case mount-zip uses something similar to what lsof does. However, I have the same problem locally.

fdegros commented 7 months ago

Trying to reproduce with a recent mount-zip 1.0.13 and libzip 1.10.1.

$ mount-zip --version
mount-zip version: 1.0.13
libzip version: 1.10.1
FUSE library version: 2.9.9
fusermount3 version: 3.14.0
using FUSE kernel interface version 7.19

I downloaded the file android-ndk-r26b-linux.zip.

$ ls -lh
total 639M
-rw-r--r-- 1 francois francois 639M Apr 18 18:32 android-ndk-r26b-linux.zip

For comparison, unzipping this ZIP with unzip takes 27 seconds on my computer. The unzipped archive contains 526 directories and 7925 files for a total of 16 GB.

$ time unzip -d out android-ndk-r26b-linux.zip
...
real    0m26.618s
user    0m21.912s
sys     0m4.273s

$ tree -a --du -h out
...
  16G used in 526 directories, 7925 files

$ rm -r out

Mounting the archive with mount-zip only takes 0.11 seconds.

$ time mount-zip android-ndk-r26b-linux.zip mnt

real    0m0.114s
user    0m0.062s
sys     0m0.053s

$ tree -a --du -h mnt
...
  16G used in 526 directories, 7925 files

Copying all the files from the mounted ZIP using cp -R takes 16 seconds. This recursive copy effectively opens, decompresses and copies every single file from the mounted ZIP. This exercises the whole FUSE + mount-zip + libzip stack. This is surprisingly faster than extracting the archive with unzip.

$ time cp -R mnt out

real    0m16.286s
user    0m0.219s
sys     0m3.473s

$ tree -a --du -h out
...
  16G used in 526 directories, 7925 files

$ rm -r out

$ umount mnt

So, I don't know why you observed some very slow access times. One of the hypotheses is that your access pattern repetitively decompresses a big file in the archive. If this is the case, it might be beneficial to use the --precache option with mount-zip. This preemptively decompresses every single file at mount time, and that takes about 16 seconds.

$ time mount-zip --precache android-ndk-r26b-linux.zip mnt
...
real    0m15.889s
user    0m11.559s
sys     0m2.928s

After that, copying all the files from the mounted ZIP only takes 10 seconds. And every access pattern should be equally fast, since there is no decompression involved in the process anymore.

$ time cp -R mnt out

real    0m10.142s
user    0m0.196s
sys     0m3.743s

$ tree -a --du -h out
...
  16G used in 526 directories, 7925 files
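
To illustrate the hypothesis above (an access pattern that keeps decompressing a big file again), here is a minimal Python sketch of why a read at an arbitrary offset of a compressed entry is expensive when nothing is cached. It is not mount-zip or libzip code. The function `read_at` is a hypothetical name, and a zlib-wrapped DEFLATE stream stands in for a ZIP entry. Since such a stream has no index, everything before the requested offset has to be decompressed and discarded.

```python
import zlib


def read_at(compressed, offset, size, chunk=256):
    """Return (data, work): `size` bytes found at `offset` of the original
    data, and the number of bytes that had to be decompressed to get them.

    Hypothetical helper for illustration only.
    """
    d = zlib.decompressobj()
    out = bytearray()
    produced = 0  # decompressed bytes seen so far
    for pos in range(0, len(compressed), chunk):
        piece = d.decompress(compressed[pos:pos + chunk])
        # Keep only the part of this piece that overlaps the requested range.
        start = max(offset - produced, 0)
        end = min(offset + size - produced, len(piece))
        if start < end:
            out += piece[start:end]
        produced += len(piece)
        if produced >= offset + size:
            break  # nothing after the requested range is needed
    return bytes(out), produced


if __name__ == "__main__":
    original = bytes(range(256)) * 4096  # 1 MiB of sample data
    stream = zlib.compress(original)
    for off in (0, 1_000_000):
        data, work = read_at(stream, off, 16)
        print(f"offset {off}: decompressed {work} bytes to return {len(data)}")
```

Each call starts from the beginning of the stream, so repeated reads far into a large entry pay the decompression cost every time. A cache of the decompressed data, such as the one filled by --precache, avoids that.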

fdegros commented 7 months ago

You use an "old" version of mount-zip (1.0.7), which does not feature the --precache option. This option was added in version 1.0.8.

I'm going to close this bug as "resolved", since I guess that the --precache option might solve your issue. Please reopen if you're still seeing slow access patterns with a recent version of mount-zip and while using the --precache option.

tenzap commented 7 months ago

I can't use a more recent version, because libzip is not recent enough in Debian. (see also #20)

What does precache do? Does it store the data somewhere and as a consequence take some disk space? The reason I don't uncompress the zip is that there isn't enough disk space to do so, so if precache does that, I can't use it.

tenzap commented 7 months ago

My access pattern is building a large application where the toolchain (clang, cmake, binutils, sysroot, libs, includes...) is accessed through fuse-zip/mount-zip

tenzap commented 7 months ago

I just tried with the master branch (i.e. 1.0.13 with support for older libzip) + libzip 1.7.3, and without --precache it is still much slower than fuse-zip.

Since fuse-zip & mount-zip both use libzip 1.7.3, slowness seems to come from mount-zip, and not from the use of an old version of libzip.

precache is not an alternative because I don't want to consume disk space and it looks like precache uncompresses the zip to disk.
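
To gauge how much room a fully decompressed copy of the archive would need without extracting anything, the sizes recorded in the ZIP central directory can be summed. This is a rough sketch using Python's standard zipfile module. The function name `uncompressed_footprint` is hypothetical, and treating the result as the size of a full cache is an assumption, not something read from the mount-zip source.

```python
import zipfile


def uncompressed_footprint(zip_source):
    """Return (file_count, uncompressed_bytes, compressed_bytes).

    `zip_source` is a path or a file-like object. Only the central directory
    is read; no entry is decompressed.
    """
    with zipfile.ZipFile(zip_source) as zf:
        files = [info for info in zf.infolist() if not info.is_dir()]
    return (
        len(files),
        sum(info.file_size for info in files),
        sum(info.compress_size for info in files),
    )


if __name__ == "__main__":
    import sys

    count, raw, packed = uncompressed_footprint(sys.argv[1])
    print(f"{count} files, {raw / 2**30:.1f} GiB uncompressed, "
          f"{packed / 2**30:.1f} GiB compressed")
```

Run against the NDK archive, the totals should be comparable to the file count and size that tree reports for the unzipped directory.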

fdegros commented 7 months ago

See the discussion on bug #20.

It seems that mount-zip fails to create a cache file with the O_TMPFILE flag in the tmp dir when the underlying filesystem is overlayfs. This impedes the caching mechanism and results in poor performance when faced with non-sequential access to the contained files. See also the documentation.
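
To check whether the tmp dir is hosted on overlayfs, the filesystem type can be looked up in the mount table. Here is a small Python sketch under a few assumptions: the helper names `fs_type_for` and `current_fs_type` are hypothetical, the input uses the `/proc/mounts` layout (device, mount point, type, ...), and mount points containing escaped spaces are not handled.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts. The longest matching mount
    point wins; for equal mount points, the later entry wins.
    """
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        if inside and len(point) >= len(best_point):
            best_point, best_type = point, fstype
    return best_type


def current_fs_type(path):
    """Convenience wrapper that reads the live mount table (Linux only)."""
    with open("/proc/mounts") as f:
        return fs_type_for(path, f.read())
```

For example, `current_fs_type("/tmp")` returning `overlay` inside a container would match the situation described here.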

This also explains why the other ZIP mounter fuse-zip does not exhibit this performance degradation, since it caches all the uncompressed data in memory.

I can think of several solutions or workarounds:

  1. Use a suitable filesystem to host the tmp dir, such as ext2, ext3, ext4 or tmpfs.
  2. Modify the cache file creation code in mount-zip to avoid using O_TMPFILE.
  3. Modify the cache file creation code in mount-zip to use an anonymous in-memory file created by memfd_create, possibly as a backup solution if the cache file cannot be created in the tmp dir.
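
Options 2 and 3 can be combined into a fallback chain. The following is a minimal Python sketch of that idea, not the actual mount-zip implementation (which is written in C++). The function name `open_cache_fd` and the order of the fallbacks are assumptions made for illustration. It needs Linux and Python 3.8 or later for `os.memfd_create`.

```python
import os
import tempfile


def open_cache_fd(tmp_dir):
    """Return (fd, how): a read/write descriptor for an anonymous cache file
    and a label saying which mechanism produced it."""
    # 1. Preferred: an unnamed file inside tmp_dir. This fails with OSError
    #    on filesystems that do not support O_TMPFILE.
    try:
        fd = os.open(tmp_dir, os.O_TMPFILE | os.O_RDWR, 0o600)
        return fd, "O_TMPFILE"
    except OSError:
        pass

    # 2. Fallback: a named temporary file, unlinked right away so that its
    #    space is released as soon as the descriptor is closed.
    try:
        fd, path = tempfile.mkstemp(dir=tmp_dir)
        os.unlink(path)
        return fd, "mkstemp+unlink"
    except OSError:
        pass

    # 3. Last resort: an anonymous file that lives in memory only.
    return os.memfd_create("cache"), "memfd_create"


if __name__ == "__main__":
    fd, how = open_cache_fd(tempfile.gettempdir())
    print("cache file created with", how)
    os.close(fd)
```

Whichever branch succeeds, the caller gets a plain file descriptor, so the rest of the caching code does not need to know where the data is stored. The trade-off of the last branch is that the cached data counts against RAM rather than disk.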

tenzap commented 7 months ago

Performance is now satisfactory with the latest master in my setup, whether I use --precache or not. Thank you. I also updated the Launchpad PPA, and it includes all your recent changes.

fdegros commented 7 months ago

Can you please check again with the latest changes at commit 87c5d16f3eade4ee8ec3e381aefb029e697d684c?

tenzap commented 7 months ago

It is not that easy to check every little change. :/

tenzap commented 7 months ago

However, a quick check on my system, where it used to be slow, looks fine (without --precache).

And with precache:

mount-zip: The filesystem of '/tmp' does not support O_TMPFILE
mount-zip: Created cache file '/tmp/AWWJ3o'

With --precache it could be slower than without it; I'm not sure though.

tenzap commented 7 months ago

How is one supposed to invoke mount-zip to have "Using memory cache"? without --cache, or with --cache=? Maybe this should be documented somewhere.

fdegros commented 7 months ago

Thanks for the verification. It looks good.

How is one supposed to invoke mount-zip to have "Using memory cache"? without --cache, or with --cache=?

Yes, that's right. You can use --cache= at the moment in order to experiment with the memory cache.

However, this feels a bit like an undocumented hack. I'm thinking about adding a separate and properly documented command-line option for that. Maybe something like --memcache.

fdegros commented 7 months ago

I added the --memcache option. Feel free to experiment with it.