It's not dpkg. It's the copies=2 that's the problem… It means that it saves every byte TWICE, in two different places. That takes extra time…
Not much you can do, other than create a new filesystem with copies=1 (which kind'a defeats the purpose :).
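For reference, the copies property can be inspected and changed per dataset (the dataset name below is just an example; a lowered value only applies to data written afterwards, which is why existing data would need a new filesystem or a rewrite):
# check the current setting on the dataset holding the dpkg database
> zfs get copies rpool/ROOT/ubuntu
# only affects blocks written after the change
> sudo zfs set copies=1 rpool/ROOT/ubuntu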
On Wednesday September 30 2015 07:15:13 Turbo Fredriksson wrote:
It's not dpkg. It's the copies=2 that's the problem… It means that it saves every byte TWICE, in two different places. That takes extra time…
Yes, but that shouldn't affect the time required for reading the database, should it? (One could even imagine a really clever copies= implementation being faster for reads, by picking the closer of the 2 copies.)
R.
It could be that prefetch is hurting us in apt's database. #3765 might help if that is the case.
On Wednesday September 30 2015 07:59:12 Richard Yao wrote:
It could be that prefetch is hurting us in apt's database. #3765 might help if that is the case.
How safe is that patch, applied over the current stable version for Ubuntu (0.6.5.1-1)?
Another potentially relevant observation that I forgot to mention: it's especially the first database traversal (or whatever the "reading database ... XXX files currently installed" operation does) that is really slow, "first" meaning after not having re/installed/upgraded anything for a while. Subsequent calls are much faster, possibly even reasonably fast given my hardware (hard to assess that).
That makes the issue tricky to debug ...
@RJVB it's not ready yet, since prerequisite (needed) changes aren't in ZOL's code yet, so:
not safe
(the buildbot status gives a hint of how "safe" pull requests usually are; WIP == work in progress)
On Wednesday September 30 2015 09:36:04 kernelOfTruth aka. kOT, Gentoo user wrote:
@RJVB it's not ready yet, since prerequisite (needed) changes aren't in ZOL's code yet, so:
not safe
(the buildbot status gives a hint of how "safe" pull requests usually are; WIP == work in progress)
Heh, I know what WIP stands for, and saw that the buildbots were running into issues ... but I also saw CentOS, and I tend to equate that with AncientOS ;)
R.
@RJVB I suspect this was caused by a recent performance regression introduced in 0.6.5 and 0.6.5.1. It's been addressed in 0.6.5.2, which is now available. Can you install 0.6.5.2 and see if it addresses the issue?
On Wednesday September 30 2015 11:39:22 Brian Behlendorf wrote:
@RJVB I suspect this was caused by a recent performance regression introduced in 0.6.5 and 0.6.5.1. It's been addressed in 0.6.5.2, which is now available. Can you install 0.6.5.2 and see if it addresses the issue?
I didn't repeat it here, but I've always had the slowness in this application. It was one reason I installed my gf's machine on a btrfs root.
I'll install 0.6.5.2 as soon as it appears in apt, and update this ticket accordingly.
R.
@RJVB Did 0.6.5.2 solve the problem? If so, could you close this please?
On Sunday October 04 2015 04:53:52 Turbo Fredriksson wrote:
@RJVB Did 0.6.5.2 solve the problem? If so, could you close this please?
It's very hard to determine whether the issue is really resolved; as I said before, caching means that apt/dpkg commands that do not involve (un)installing packages complete (much) faster when repeating them after the initial run.
That said, subsequent sudo apt-get update runs complete about 2x faster on the machine running btrfs, which is in line with my experience that the CPU on that machine is about twice as fast as the one in the netbook with the ZFS root. Moving /var/lib/dpkg to a dataset with compression disabled seems to have decreased dpkg-related runtimes somewhat (so much for the idea that writing to disk is so slow that the gain from the reduced file size compensates for the CPU time required for compression).
You wouldn't happen to know if the apt database is accessed with a fixed record size, would you?
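For reference, the recordsize of the dataset holding the database can be checked and tuned like this (the dataset name is just an example from my layout; a changed recordsize only applies to files written after the change):
> zfs get recordsize,compression rpool/var-lib-dpkg
> sudo zfs set recordsize=16K rpool/var-lib-dpkg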
R.
I did a bit of testing recently with the apt/dpkg suite and there are 2 interesting things: first, all access to the database files (pkgcache.bin, etc.) is via mmap(), and second, there's a lot of fsync() activity. I did not pursue it any further because typical apt/dpkg commands on the particular test system ran at almost identical speeds regardless of whether /var/cache/apt was on ext4 or on zfs. I did not try moving /var/lib/apt to zfs.
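Should anyone want to reproduce that observation, something like the following should show it during a harmless apt operation (this is just an illustration, not the exact commands used in the test above):
> sudo strace -fy -e trace=openat,mmap,msync,fsync,fdatasync -o /tmp/apt.trace apt-get check
> grep pkgcache.bin /tmp/apt.trace          # the database file shows up in the openat()/mmap() calls
> grep -cE 'f(data)?sync' /tmp/apt.trace    # count the fsync()/fdatasync() calls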
On Sunday October 04 2015 04:53:52 Turbo Fredriksson wrote:
@RJVB Did 0.6.5.2 solve the problem? If so, could you close this please?
Actually, I don't think it made a difference. I just did an apt-get install of a simple -dev package, and the "Reading database" step after downloading the archive advanced at about 5 seconds per 5% (and told me afterwards that I have 473814 files and directories installed).
R.
Just a heads-up that this is still an issue.
On Thursday February 25 2016 00:38:43 K1773R wrote:
Just a heads-up that this is still an issue.
For me it is indeed. Were you asking, or were you saying that it affects you too?
R.
Yes, it affects me too.
If the slowness is during the "Reading package lists..." phase, the issue is the creation of the /var/cache/apt/pkgcache.bin file. We've deployed a lot of Ubuntu Trusty systems and apt-get definitely runs very slowly while the temp file is mmapped and being populated:
# grep pkg /proc/21258/maps
7f6218cfc000-7f621a4fc000 rw-s 00000000 00:11 243124 /var/cache/apt/pkgcache.bin.X77Hr5
The slowness only seems to occur when running the stock 3.13 kernel. When we update these systems to use the "Wily enablement stack", which has a 4.2 kernel, the problem goes away.
I don't have any Debian-based testing systems on which to try to track down the problem so I've not pursued it any further.
On Thursday February 25 2016 05:03:27 Tim Chase wrote:
The slowness only seems to occur when running the stock 3.13 kernel. When we update these systems to use the "Wily enablement stack", which has a 4.2 kernel, the problem goes away.
I'm running one of the latest 3.14 kernels, so it's not just the 3.13 kernel. I'm currently not able to move beyond 3.14 (= not willing to give up certain software that doesn't support later kernels yet).
R.
I have the same issue inside an lxc container on a zfs pool. Reading the dpkg database and unpacking packages takes ages. iotop shows z_null_int at 99% load all the time, but dd write is fast. It could be the same issue as #6171. container:
dd if=/dev/zero of=/root/test.txt bs=2048 count=1000000
1000000+0 records in
1000000+0 records out
2048000000 bytes (2.0 GB, 1.9 GiB) copied, 20.2756 s, 101 MB/s
host:
  pool: vmpool
 state: ONLINE
  scan: scrub repaired 12K in 0h5m with 0 errors on Sun Jan 14 00:29:31 2018
config:

        NAME                        STATE     READ WRITE CKSUM
        vmpool                      ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000c500a3534c19  ONLINE       0     0     0
            wwn-0x5000c500a3524678  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            wwn-0x5000c500a3526027  ONLINE       0     0     0
            wwn-0x5000c500a35218bd  ONLINE       0     0     0

errors: No known data errors
host
dpkg -l | grep zfs
ii libzfs2linux 0.7.4-pve2~bpo9 amd64 OpenZFS filesystem library for Linux
ii zfs-initramfs 0.7.4-pve2~bpo9 all OpenZFS root filesystem capabilities for Linux - initramfs
ii zfsutils-linux 0.7.4-pve2~bpo9 amd64 command-line tools to manage OpenZFS filesystems
container
iozone -n 128M -g 1G -r 16 -O -a C 1
Iozone: Performance Test of File I/O
Version $Revision: 3.429 $
Compiled for 64 bit mode.
Build: linux-AMD64
Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
Vangel Bojaxhi, Ben England, Vikentsi Lapa.
Run began: Wed Jan 31 11:14:18 2018
Using minimum file size of 131072 kilobytes.
Using maximum file size of 1048576 kilobytes.
Record Size 16 kB
OPS Mode. Output is in operations per second.
Auto Mode
Command line used: iozone -n 128M -g 1G -r 16 -O -a C 1
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                         random   random     bkwd   record   stride
       kB   reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
   131072       16    35504    43888    57747    64994    35141    38663    86044    50897    45671    40054    37652    51210    64318
   262144       16    35945    38430    86103    88087    77852    40302    59836    51114    16688    37669    36340    47860    65142
   524288       16    35390    35174    44695    61146    16254    31436    60745    52621    14921    35688    33951    47900    66149
  1048576       16    34166     6162    46198    62465    14998    25206    57312    54533    14722    35196    31557    48225    66803
iozone test complete.
On Wednesday January 31 2018 10:35:41 aTanCS wrote:
It could be the same issue as #6171.
That'd be good news, because that issue is apparently fixed. However, I have been seeing this slowness since well before the regression mentioned in the fix commit message.
This issue continues to plague 0.7.12 (and Linux 4.14). I have been getting a hunch that the /var/lib/dpkg/info directory is a, if not the, culprit. That dir contains a very large number of tiny files, 2 or 3 per installed package, and apparently those files aren't always removed when packages are uninstalled. I've seen other situations where a key/value database-in-a-flat-directory led to dismal performance on ZFS (and on FAT16 O:^)).
There must be ways to improve this at the ZFS level (when you can add as many disks and/or as much RAM as you want), but when you're just (ab?)using ZFS as a safe filesystem on a kickabout Linux notebook those are just not an option.
The best fix is also not really an option: getting the dpkg system to use a proper, fast key/value database (LMDB would probably do).
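To put a number on that hunch, something like this gives an idea of how many files the directory holds and how small they are (paths are the standard Debian/Ubuntu ones):
> find /var/lib/dpkg/info -type f | wc -l
> du -sh /var/lib/dpkg/info
> find /var/lib/dpkg/info -type f -size -4k | wc -l    # roughly how many of them are smaller than 4 KiB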
Here's a workaround that I'm testing, with initial encouraging results:
# create a sparse disk image file; /var/cache is on a dataset with copies=2 compression=lz4 but sync=disabled for speed (at a tiny risk; mostly benefits file deletion)
> sudo truncate -s 512M /var/cache/apt_dpkg.img
# create an XFS filesystem because it's good with small files; no need for checksums at this level (I think)
# ext4 probably works just as well
> sudo mkfs.xfs -b size=512 -m crc=0 -L dpkgStore /var/cache/apt_dpkg.img
# mount to a temp. place and populate the disk image:
> sudo mount -v -o loop /var/cache/apt_dpkg.img /mnt
> (cd /var/lib/dpkg && sudo rsync -aAXH . /mnt)
> sudo umount -v /mnt
# add a line like this to /etc/fstab:
# /var/cache/apt_dpkg.img /var/lib/dpkg xfs rw,noatime,nodiratime,loop 0 0
# make backup, create new mountpoint, mount
> sudo mv /var/lib/dpkg{,.bak} ; sudo mkdir /var/lib/dpkg
> sudo mount -v /var/lib/dpkg
I'm not losing on disk space either:
> du -hcs /var/lib/dpkg.bak /var/cache/apt_dpkg.img /var/lib/dpkg; \
du -hcs --apparent-size /var/lib/dpkg.bak /var/cache/apt_dpkg.img /var/lib/dpkg
384M /var/lib/dpkg.bak
95M /var/cache/apt_dpkg.img
122M /var/lib/dpkg
599M total
115M /var/lib/dpkg.bak
512M /var/cache/apt_dpkg.img
116M /var/lib/dpkg
743M total
We'll see how this evolves over time. For now I went with the easier and safer solution; the alternative would be to set up a sparse zvol with the same properties as the dataset holding the apt and dpkg directories, and create the XFS fs in there. I'm not used to working with ZVOLs, hence the easier and safer route (because I'm not putting my pool at risk); it does mean I'll be making very regular syncs of /var/lib/dpkg to /var/lib/dpkg.bak. I also like the fact that it's trivial (provided you keep backups) to recreate the sparse disk image when the existing one starts to degrade.
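The regular backup sync mentioned above could be as simple as the following, e.g. from a cron job (just a sketch of what I have in mind, not something battle-tested):
# only sync when the loop-mounted image is actually in place
> mountpoint -q /var/lib/dpkg && sudo rsync -aAXH --delete /var/lib/dpkg/ /var/lib/dpkg.bak/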
Edit, some results from the tcsh time built-in:
> time sudo fgrep BLABLA -R /var/lib/dpkg/info/
0.548 user_cpu 0.599 kernel_cpu 0:01.07 total_time 105.6%CPU {7368M 0F 2320R 220I 0O 4w 256c}
> time sudo fgrep BLABLA -R /var/lib/dpkg.bak/info/
0.749 user_cpu 0.856 kernel_cpu 0:01.52 total_time 104.6%CPU {7380M 0F 2329R 188654I 0O 39w 409c}
The big differences here are in the number of input operations (I) and in the voluntary and involuntary context switches (w and c), all of which are lower with the sparse disk image approach.
Just throwing it out there, but why wouldn't gentoo have the same problems? The portage tree is made up of 163498 files consuming roughly 1.5 GB of space.
Doesn't dpkg use fsync as well? Have you tried it with it turned off in ZFS?
On Mon, 7 Jan 2019 at 11:15, bunder2015 notifications@github.com wrote:
Just throwing it out there, but why wouldn't gentoo have the same problems? The portage tree is made up of 163498 files consuming roughly 1.5 GB of space.
why wouldn't gentoo have the same problems? The portage tree is made up of 163498 files consuming roughly 1.5 GB of space.
No idea, but it must have something to do with how that tree is accessed. Maybe it does what MacPorts does with its port tree: index file(s).
Come to think of it, I sometimes use their port tree repo (github:macports/macports-ports) as a source for benchmarking filesystem operations.
Doesn't dpkg use fsync as well? Have you tried it with it turned off in ZFS?
The slowest part is where it says "Reading database", so I doubt that it can be due to fsync'ing. But as mentioned above the dataset used here already has sync=disabled. Is there another way to "turn fsync off in ZFS"?
sync=disabled should be OK. Does it take the same amount of time no matter how many times the directory is read (so the cache is warm, assuming you've got metadata caching turned on)?
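(Whether data and metadata end up in the ARC is governed by the primarycache property, which is easy to check for the dataset in question; the dataset name below is just an example.)
> zfs get primarycache,secondarycache rpool/var-lib-dpkg
# 'all' caches data and metadata, 'metadata' caches only metadata, 'none' disables ARC caching for the dataset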
sync=disabled should be OK.
It has been for years :)
Does it take the same amount of time no matter how many times the directory is read (so the cache is warm, assuming you've got metadata caching turned on)?
No, for me the command is really slow only when it hasn't been executed for a while (since the previous day, for instance).
And that's typical of things like dpkg: you don't use it very often most of the time (but when you do, there's a good chance dpkg competes for cache with the packages it installs, if you see what I mean).
Hmm, I think it's rather up to (a) ZFS dev(s) to show that ZFS is NOT the cause of this issue, despite the fact that it occurs just about only with ZFS?
Mailing lists aren't really suitable for discussions that drag on for years....
This came up on IRC and I've been asked to file an issue about it. It will be mostly a set of observations as I can evidently not easily compare the same distribution on the same computer using various filesystems.
I'm running Kubuntu 14.04 LTS off a ZFS root. I use an EXT3 /boot partition. I originally installed the OS to a FS supported by the installer that comes with the Kubuntu live images, and then rsynced the result onto my ZFS rootfs.
Since then I have tried to tune the pool by restricting copies=2 to the directories that really deserve it, and by moving directories with more or less "volatile" content onto datasets that have sync disabled, or even checksums disabled as well.
One class of operations that has always been much slower since moving to ZFS is installing, removing or upgrading packages with apt-get or dpkg. I think this is related to the database(s) involved; the step where it says "reading database" takes much longer than I'd expect. (Expectation is based here on overall performance differences with another, faster laptop running the same distribution off a btrfs partition.) But for comparison, upgrading or installing a new kernel with accompanying headers is noticeably faster when I disable sync; progress messages appear in much faster succession as soon as sync is disabled. As a result I put /var/cache/apt on a dataset that has sync and compression disabled (but checksums enabled, and copies=2), which made database access somewhat faster, but still very slow.
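For completeness, the dataset setup I mean looks roughly like this (pool/dataset names are just examples of my layout, not a recommendation):
> sudo zfs create -o sync=disabled -o compression=off -o checksum=on -o copies=2 \
      -o mountpoint=/var/cache/apt rpool/var-cache-apt
> zfs get sync,compression,checksum,copies rpool/var-cache-apt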
My hunch is that dpkg uses database algorithms or settings that are poorly suited to storing the db file on zfs.