Closed ismell closed 2 years ago
@zmedico fyi
Any chance #11910 is related?
See https://bugs.chromium.org/p/chromium/issues/detail?id=1198679 for more of the nitty-gritty debugging.
@ismell it's not clear to me from the comment above, but does the patch you posted above avoid the issue in your testing?
One plausible way you might try and reproduce this is to write to the file while performing the lseek()
. This case is handled by the code but it's one of the less likely code paths so it's possible there might be an issue here.
No the patch above was just to confirm that the pointer was at the start of the file. I haven't yet been able to find an easy way to reproduce the problem. There should be no writers to the input file at this point. I'll keep working on trying to find an easy way to reproduce it. Is there anything else I should be tracing that will help?
One thing you could try to rule out the dirty data case would be to add an fsync(fd)
before the the lseek()
in the portage code. That would help us narrow it down a bit, if you no longer see failures we'll have a good idea where to look. Getting a good strace
of exactly the IO pattern being issued to cause the problem would also be helpful.
Thanks for the tip.
So I wrote a script that calls emerge depthcharge
in a loop and verifies the build files are not NULL.
$ test-zfs
>>> 10:51:18 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:51:38 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:51:42 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:51:42 Jobs: 1 of 1 complete Load avg: 2.13, 1.59, 1.44
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 1, Success: 2, Failure: 1
>>> 10:51:44 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:04 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:08 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:08 Jobs: 1 of 1 complete Load avg: 2.09, 1.62, 1.46
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 2, Success: 3, Failure: 3
>>> 10:52:10 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:30 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:34 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:34 Jobs: 1 of 1 complete Load avg: 1.85, 1.60, 1.45
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 3, Success: 4, Failure: 5
>>> 10:52:36 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:56 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:53:00 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:53:00 Jobs: 1 of 1 complete Load avg: 1.56, 1.55, 1.44
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 4, Success: 5, Failure: 7 <-- Everything's not awesome :(
diff --git a/src/portage_util_file_copy_reflink_linux.c b/src/portage_util_file_copy_reflink_linux.c
index 352342c06..004b8618a 100644
--- a/src/portage_util_file_copy_reflink_linux.c
+++ b/src/portage_util_file_copy_reflink_linux.c
@@ -140,8 +140,10 @@ do_lseek_data(int fd_out, int fd_in, off_t *off_out) {
/* Use lseek SEEK_DATA/SEEK_HOLE for sparse file support,
* as suggested in the copy_file_range man page.
*/
- off_t offset_data, offset_hole;
+ off_t offset_data, offset_hole, current;
+ fsync(fd_in);
+ current = lseek(fd_in, 0, SEEK_CUR);
offset_data = lseek(fd_in, *off_out, SEEK_DATA);
if (offset_data < 0) {
if (errno == ENXIO) {
$ test-zfs
>>> 10:55:17 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:55:37 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:55:41 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:55:41 Jobs: 1 of 1 complete Load avg: 3.36, 2.60, 1.87
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 1, Success: 3, Failure: 0
>>> 10:55:43 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:06 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:10 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:10 Jobs: 1 of 1 complete Load avg: 2.57, 2.48, 1.85
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 2, Success: 6, Failure: 0
>>> 10:56:12 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:32 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:36 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:36 Jobs: 1 of 1 complete Load avg: 3.10, 2.59, 1.91
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 3, Success: 9, Failure: 0
>>> 10:56:39 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:58 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:02 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:02 Jobs: 1 of 1 complete Load avg: 4.32, 2.96, 2.05
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 4, Success: 12, Failure: 0
>>> 10:57:05 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:57:24 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:28 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:28 Jobs: 1 of 1 complete Load avg: 3.42, 2.86, 2.05
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 5, Success: 15, Failure: 0 <--- Everything is awesome!
It looks like adding fsync
fixes the problem. straceing
emerge results in a huge log. I'll try and find the relevant parts.
@ismell if you're comfortable rebuilding ZFS for the testing I could provide a debug patch which should give us a little more insight.
I'm sure I can figure it out :)
In that case, would you mind testing commit f73e0a63e9a33649509d27dad116afe9a6a22b59 from the https://github.com/behlendorf/zfs/tree/zfs-holey branch. It implements a possible fix by closing a small race. It's only been lightly tested since I wasn't able to reproduce the issue.
Sorry for the long cycle time. I was waiting until I was forced to reboot my computer.
So I tried f73e0a6~1
to verify I could reproduce the issue without your patch. I was not able to reproduce it....
$ zfs version
zfs-2.1.99-117_g44287ffa0
zfs-kmod-2.1.99-117_g44287ffa0
$ ./test-zfs
Iteration: 100, Success: 300, Failure: 0
Done
Success: 300
Failure: 0
This normally reproduced within a few iterations.
So maybe it was fixed already?
So I tried f73e0a6~1 to verify I could reproduce the issue without your patch. I was not able to reproduce it....
To be clear you're saying that even without the path you're unable to reproduce the issue now? There weren't any other recent changes in this area which I would have expected to fix this.
So I went back to the original version:
$ zfs version
zfs-2.0.3-1~bpo10+1
zfs-kmod-2.0.3-1~bpo10+1
It looks like I can't reproduce the issue anymore...
>>> 11:31:58 Emerging (1 of 1) sys-boot/depthcharge-9999::chromiumos for /build/guybrush/
>>> 11:32:20 Installing (1 of 1) sys-boot/depthcharge-9999::chromiumos to /build/guybrush/
>>> 11:32:25 Completed (1 of 1) sys-boot/depthcharge-9999::chromiumos to /build/guybrush/
>>> 11:32:25 Jobs: 1 of 1 complete Load avg: 12.2, 10.9, 8.3
>>> Using system located in ROOT tree /build/guybrush/
>>> No outdated packages were found on your system.
Iteration: 7, Success: 21, Failure: 0
I'm wondering if rebooting helped clear what ever bad state I was in...
I've started hitting a similar issue which manifests as go
segfaulting immediately on launch (initially reported at https://github.com/golang/go/issues/48636 - I'd assumed the Go compiler was somehow spitting out a completely malformed binary).
More precisely:
emerge
(install) Go on my Gentoo system with ZFS as rootfs;I eventually somehow figured I should try downgrading coreutils from 9.0 -> 8.32(-r1 in Gentoo) which made the issue disappear: rebuilding Go worked! coreutils 9.0 changed some default behaviours around reflinking, which ZFS doesn't have, and sparse files, but I tried forcing them off in the cp
invocation in the ebuilds with no luck anyway.
I then decided to bisect and arrived at:
a6eaee501f6ec0c152abe88640203a64c390993e is the first bad commit
commit a6eaee501f6ec0c152abe88640203a64c390993e
Author: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu Jun 25 16:31:44 2020 -0700
cp: use SEEK_DATA/SEEK_HOLE if available
If it works, prefer lseek with SEEK_DATA and SEEK_HOLE to FIEMAP,
as lseek is simpler and more portable (will be in next POSIX).
Problem reported in 2011 by Jeff Liu (Bug#8061).
* NEWS: Mention this.
* src/copy.c (lseek_copy) [SEEK_HOLE]: New function.
(enum scantype): New constants ERROR_SCANTYPE, LSEEK_SCANTYPE.
(union scan_inference): New type.
(infer_scantype): Last arg is now union scan_inference *,
not struct extent_scan *. All callers changed.
Prefer SEEK_HOLE to FIEMAP if both work, since
SEEK_HOLE is simpler and more portable.
(copy_reg): Do the fdadvise after initial scan, in case the scan
fails. Report an error if the initial scan fails.
(copy_reg) [SEEK_HOLE]: Use lseek_copy if scantype says so.
NEWS | 3 +
src/copy.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 201 insertions(+), 11 deletions(-)
bisect run success
I've straced Portage (just for the lseek syscall) while installing Go and it is filled with:
[...]
11327 lseek(3, 5310, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 9987, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 8334, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 18386, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 33472, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 15270, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 9132, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 15471, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 7389, SEEK_DATA) = -1 ENXIO (No such device or address)
11327 lseek(3, 7876, SEEK_DATA) = -1 ENXIO (No such device or address)
[...]
TL;DR: I seem to be hitting this bug as a result of changes in coreutils which exposes the problem. Please let me know if you feel this is a different issue and I should open a new bug.
Portage 3.0.23 (python 3.10.0-candidate-2, default/linux/amd64/17.1/hardened, gcc-11.2.0, glibc-2.33-r7, 5.14.7-adry x86_64) ================================================================= System uname: Linux-5.14.7-adry-x86_64-11th_Gen_Intel-R-_Core-TM-_i7-11800H_@_2.30GHz-with-glibc2.33 KiB Mem: 16105692 total, 1991680 free KiB Swap: 0 total, 0 free Timestamp of repository gentoo: Sun, 26 Sep 2021 20:51:27 +0000 Head commit of repository gentoo: fe49ef8e5b7b42f9fcb1f7bf36fbf4c571b8b120 Timestamp of repository kde: Thu, 23 Sep 2021 11:06:23 +0000 Head commit of repository kde: 0bd2f887a7bdfe01ae0c753908bf1132f253263a sh dash 0.5.11.5 ld GNU ld (Gentoo 2.37_p1 p0) 2.37 ccache version 4.4.1 [disabled] app-shells/bash: 5.1_p8::gentoo dev-java/java-config: 2.3.1::gentoo dev-lang/perl: 5.34.0-r2::gentoo dev-lang/python: 3.8.12::gentoo, 3.9.7::gentoo, 3.10.0_rc2::gentoo dev-lang/rust-bin: 1.55.0::gentoo dev-util/ccache: 4.4.1::gentoo dev-util/cmake: 3.21.3::gentoo sys-apps/baselayout: 2.7-r3::gentoo sys-apps/openrc: 0.44.5::gentoo sys-apps/sandbox: 2.25::gentoo sys-devel/autoconf: 2.13-r1::gentoo, 2.71-r1::gentoo sys-devel/automake: 1.16.4::gentoo sys-devel/binutils: 2.37_p1::gentoo sys-devel/gcc: 10.3.0-r2::gentoo, 11.2.0::gentoo sys-devel/gcc-config: 2.4::gentoo sys-devel/libtool: 2.4.6-r6::gentoo sys-devel/make: 4.3::gentoo sys-kernel/linux-headers: 5.14::gentoo (virtual/os-headers) sys-libs/glibc: 2.33-r7::gentoo Repositories: gentoo location: /var/db/repos/gentoo sync-type: git sync-uri: git://github.com/gentoo-mirror/gentoo.git priority: -1000 sync-git-clone-extra-opts: -b stable -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now sync-git-verify-commit-signature: yes kde location: /var/db/repos/kde sync-type: git sync-uri: https://github.com/gentoo-mirror/kde.git masters: gentoo local location: /var/db/repos/local masters: gentoo ACCEPT_KEYWORDS="amd64 ~amd64" ACCEPT_LICENSE="*" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches" DISTDIR="/var/cache/distfiles" EMERGE_DEFAULT_OPTS="--keep-going=y --complete-graph --deep --changed-deps-report=y" ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR" FCFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches" FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox mount-sandbox multilib-strict network-sandbox news parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr" FFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches" GENTOO_MIRRORS="http://mirror.bytemark.co.uk/gentoo/ http://www.mirrorservice.org/sites/distfiles.gentoo.org/ http://mirrors.soeasyto.com/distfiles.gentoo.org/ http://mirrors.gethosted.online/gentoo" LANG="en_GB.utf8" LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--defsym=__gentoo_check_ldflags__=0" LINGUAS="en en_GB" MAKEOPTS="-j8" PKGDIR="/var/cache/binpkgs" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git" PORTAGE_TMPDIR="/var/tmp" USE="PIC X a52 aac acl acpi activities aes alsa amd64 avx avx2 avx512bw avx512cd avx512dq avx512f avx512vbmi avx512vl bluetooth branding bzip2 cairo caps cdda cdr clang crypt dbus declarative dist-kernel dri dts dvd dvdr elogind emacs emboss encode exif f16c flac fma3 fortran gif gles2 gmp gpm graphite gtk gui hardened hunspell iconv icu ipv6 iwd jit jpeg kde kipi kwallet lcms libglvnd libnotify libtirpc mad mmx mmxext mng mp3 mp4 mpeg multilib ncurses networkmanager nptl ogg opengl openmp pam pango pclmul pcre pdf pic pie plasma png policykit popcnt ppds pulseaudio qml qt5 rdrand readline screencast sdl seccomp semantic-desktop sha smartcard spell split-usr sse sse2 sse3 sse4_1 sse4_2 ssl ssp ssse3 startup-notification svg syslog system-av1 system-binutils system-boost system-bootstrap system-cairo system-clang system-digest system-ffmpeg system-harfbuzz system-heimdal system-icu system-jpeg system-jsoncpp system-leveldb system-libevent system-libs system-libvpx system-libyaml system-lz4 system-mitkrb5 system-sqlite system-ssl system-tbb system-uulib system-webp system-zlib threads tiff truetype udev udisks unicode upower usb vaapi verify-sig vorbis wayland widgets x264 xattr xcb xml xtpax xv xvid zlib" ABI_X86="32 64" ADA_TARGET="gnat_2019" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512vbmi f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en en-GB" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_9" PYTHON_TARGETS="python3_9 python3_10 python3_8" RUBY_TARGETS="ruby26" USERLAND="GNU" VIDEO_CARDS="intel i965 iris nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RUSTFLAGS
EDIT: Interestingly, upstream made a kind-of-related remark in a commit after the release:
@behlendorf I've tried applying the patch you suggested to ZFS 2.1.1 but unfortunately, while it applies, it ends up failing at build time. Is there any chance you're able to please rebase it?
The failure looks like:
/bin/sh ../../libtool --tag=CC --silent --mode=compile x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -include ../../zfs_config.h -I../../include -I../../include -I../../module/icp/include -I../../lib/libspl/include -I../../lib/libspl/include/os/linux -I../../include/os/linux/zfs -D_GNU_SOURCE -D_REENTRANT -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -DHAVE_LARGE_STACKS=1 -DLIBEXECDIR=\"/usr/libexec\" -DRUNSTATEDIR=\"/var/run\" -DSBINDIR=\"/sbin\" -DSYSCONFDIR=\"/etc\" -DPKGDATADIR=\"/usr/share/zfs\" -UDEBUG -DNDEBUG -DTEXT_DOMAIN=\"zfs-linux-user\" -DDEBUG -UNDEBUG -DZFS_DEBUG -std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes -fno-strict-aliasing -fno-omit-frame-pointer -Wno-format-zero-length -Wno-unused-but-set-variable -Wframe-larger-than=4096 -DLIB_ZPOOL_BUILD -O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches -c -o dnode.lo ../../module/zfs/dnode.c
../../module/zfs/dnode.c: In function ‘dnode_is_dirty’:
../../module/zfs/dnode.c:1661:29: error: incompatible types when assigning to type ‘multilist_t *’ {aka ‘struct multilist *’} from type ‘multilist_t’ {aka ‘struct multilist’}
1661 | dirtylist = dn->dn_objset->os_dirty_dnodes[i];
| ^~
make[3]: *** [Makefile:1326: dnode.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
@thesamesam thanks for continuing the investigation of this issue.
I've rebased my https://github.com/behlendorf/zfs/tree/zfs-holey branch and resolved the build failure (commit 1722451cf7e0588632f116c488009e68dd1f8dcf). If you can give it a try that would be great.
@behlendorf Thanks!
(Note: I hadn't been able to reproduce this on other ZFS'd machines with pretty similar setups, so I/O speed might be a factor, which I guess is obvious given it's racey).
Applied to 2.1.1 and built fine (rebuilt initramfs and so on, and rebooted) but no luck. Same corrupted /usr/bin/go
.
I'm happy to enable more debugging as necessary, especially given I know this is tricky given you can't reproduce it on your end.
Here's a partial strace during this cp line from the ebuild (I've just picked out the bits around copying the corrupted binaries, but I can get you more if it's useful):
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/api", {st_mode=S_IFDIR|0755, st_size=23, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 newfstatat(AT_FDCWD, "bin", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0x7ffdc15b9730, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0x7ffdc15b9850, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005 mkdir("/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0755) = 0
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 openat(AT_FDCWD, "bin", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
8005 newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_EMPTY_PATH) = 0
8005 getdents64(3, 0x5619626c61c0 /* 4 entries */, 131072) = 104
8005 getdents64(3, 0x5619626c61c0 /* 0 entries */, 131072) = 0
8005 close(3) = 0
8005 newfstatat(AT_FDCWD, "bin/gofmt", {st_mode=S_IFREG|0755, st_size=3318813, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 openat(AT_FDCWD, "bin/gofmt", O_RDONLY|O_NOFOLLOW) = 3
8005 newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=3318813, ...}, AT_EMPTY_PATH) = 0
8005 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin/gofmt", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
8005 newfstatat(4, "", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
8005 ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
8005 lseek(3, 0, SEEK_DATA) = 0
8005 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
8005 lseek(3, 0, SEEK_HOLE) = 131072
8005 lseek(3, 0, SEEK_SET) = 0
8005 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\300\33F\0\0\0\0\0"..., 131072) = 131072
8005 write(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\300\33F\0\0\0\0\0"..., 131072) = 131072
8005 lseek(3, 131072, SEEK_DATA) = -1 ENXIO (No such device or address)
8005 ftruncate(4, 3318813) = 0
8005 close(4) = 0
8005 close(3) = 0
8005 newfstatat(AT_FDCWD, "bin/go", {st_mode=S_IFREG|0755, st_size=14013626, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 openat(AT_FDCWD, "bin/go", O_RDONLY|O_NOFOLLOW) = 3
8005 newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=14013626, ...}, AT_EMPTY_PATH) = 0
8005 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin/go", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
8005 newfstatat(4, "", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
8005 ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
8005 lseek(3, 0, SEEK_DATA) = 0
8005 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
8005 lseek(3, 0, SEEK_HOLE) = 131072
8005 lseek(3, 0, SEEK_SET) = 0
8005 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 131072) = 131072
8005 write(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 131072) = 131072
8005 lseek(3, 131072, SEEK_DATA) = -1 ENXIO (No such device or address)
8005 lseek(3, 131072, SEEK_DATA) = -1 ENXIO (No such device or address)
8005 ftruncate(4, 14013626) = 0
8005 close(4) = 0
8005 close(3) = 0
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 newfstatat(AT_FDCWD, "doc", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0x7ffdc15b96f0, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0x7ffdc15b9810, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005 mkdir("/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0755) = 0
8005 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005 openat(AT_FDCWD, "doc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
8005 newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_EMPTY_PATH) = 0
8005 getdents64(3, 0x5619626c62b0 /* 6 entries */, 131072) = 176
8005 getdents64(3, 0x5619626c62b0 /* 0 entries */, 131072) = 0
8005 close(3) = 0
8005 newfstatat(AT_FDCWD, "doc/go_mem.html", {st_mode=S_IFREG|0644, st_size=13565, ...}, AT_SYMLINK_NOFOLLOW) = 0
EDIT: I did out of curiosity try echo 1 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
but it didn't make any difference.
out of curiosity try echo 1 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync but it didn't make any difference.
That's very surprising. When set ZFS should always report the correct holes at the cost of performance. Were you able to reproduce this behavior with both the unmodified and patched code?
Yeah, I just tried again unpatched and it happens regardless of the value of the toggle (0 or 1).
Happy to try enabling more debugging if can think of a way to toggle it on/off easily during the merge.
Looking at the strace output what appears to be happening is:
bin/go
file is opened and determined to be 14,013,626 bytes is sizeThe story is similar for bin/gofmt
. Let me take another look and see if I can provide you a small debugging patch to determine what's going wrong. If you still have the strace logs it would be helpful to know how the bin/go
file being copied here was originally written (mmap or write syscalls) and when.
Ah, it gets renamed during the build process:
# grep -rsin '/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out'
go_nuts_zfs.log:4923824:4931 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0xc0000a92e8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923825:4931 unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923826:4931 unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_REMOVEDIR <unfinished ...>
go_nuts_zfs.log:4923832:4931 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0775 <unfinished ...>
go_nuts_zfs.log:4936086:1309 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log:4936699:1309 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CLOEXEC) = 3
go_nuts_zfs.log:4936733:1309 fchmodat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0755) = 0
go_nuts_zfs.log:4936735:1309 renameat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go") = 0
Here's the output for 10 lines after every instance of the path being mentioned:
# grep -rsin '/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out' -A 10
go_nuts_zfs.log:4923824:4931 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0xc0000a92e8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923825:4931 unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923826:4931 unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_REMOVEDIR <unfinished ...>
go_nuts_zfs.log-4923827-716 <... epoll_pwait resumed>[], 128, 5, NULL, 1033473606543) = 0
go_nuts_zfs.log-4923828-4932 <... nanosleep resumed>NULL) = 0
go_nuts_zfs.log-4923829-4931 <... unlinkat resumed>) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log-4923830-716 epoll_pwait(4, <unfinished ...>
go_nuts_zfs.log-4923831-4932 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
go_nuts_zfs.log:4923832:4931 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0775 <unfinished ...>
go_nuts_zfs.log-4923833-716 <... epoll_pwait resumed>[], 128, 0, NULL, 140732691746960) = 0
go_nuts_zfs.log-4923834-716 epoll_pwait(4, <unfinished ...>
go_nuts_zfs.log-4923835-4931 <... openat resumed>) = 3
go_nuts_zfs.log-4923836-4931 epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3840094264, u64=140698378776632}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4923837-4932 <... nanosleep resumed>NULL) = 0
go_nuts_zfs.log-4923838-4931 mmap(0xc000400000, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0 <unfinished ...>
go_nuts_zfs.log-4923839-4932 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
go_nuts_zfs.log-4923840-4931 <... mmap resumed>) = 0xc000400000
go_nuts_zfs.log-4923841-4931 madvise(0xc000200000, 2097152, MADV_HUGEPAGE) = 0
go_nuts_zfs.log-4923842-4932 <... nanosleep resumed>NULL) = 0
--
go_nuts_zfs.log:4936086:1309 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log-4936087-1309 epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936088-1309 pread64(3, "\177ELF", 4, 0) = 4
go_nuts_zfs.log-4936089-1309 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 31744) = 31744
go_nuts_zfs.log-4936090-1309 read(3, "H\205\322t$H\211\\$8H\213A H\211\336H\211\323H\211\361\350\4\323\0\0H\213\214$"..., 31744) = 31744
go_nuts_zfs.log-4936091-1309 read(3, "\0G\17\266\24\32I\203\372D\17\203\34\1\0\0L\215\35\311\24\222\0C\17\267\fS\353\"I\215"..., 31744) = 31744
go_nuts_zfs.log-4936092-1309 read(3, "L\211D$(L\211L$0L\211T$8L\211\\$@\350G\276\4\0H\213D$\10\213\\"..., 31744) = 31744
go_nuts_zfs.log-4936093-1309 read(3, "\0\273\21\0\0\0\350\225a\1\0L\213\204$\320\1\0\0M\213@(L\211\204$\320\1\0\0L"..., 31744) = 31744
go_nuts_zfs.log-4936094-1309 read(3, "i\1\0\0\0\0\0\203=\362X\231\0\0u\tH\211\220Xi\1\0\353\fH\215\270Xi\1\0"..., 31744) = 31744
go_nuts_zfs.log-4936095-1309 read(3, "\350{\344\375\377H\213l$\30H\203\304 \303\350LJ\3\0\351G\377\377\377\314\314\314\314\314\314\314"..., 31744) = 31744
go_nuts_zfs.log-4936096-1309 read(3, "\1H\213\5p\221Z\0\3503\271\1\0\350\16\306\1\0\211D$\24H\215\rS\221Z\0\204\1H"..., 31744) = 31744
--
go_nuts_zfs.log:4936699:1309 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CLOEXEC) = 3
go_nuts_zfs.log-4936700-1309 epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936701-1309 pwrite64(3, "Zcbx5r99nvfz0xObj6_4/q7UMKiu3A_b"..., 83, 3984) = 83
go_nuts_zfs.log-4936702-1309 pread64(3, "\177ELF", 4, 0) = 4
go_nuts_zfs.log-4936703-1309 close(3) = 0
go_nuts_zfs.log-4936704-1309 write(2, "", 0) = 0
go_nuts_zfs.log-4936705-1309 futex(0xc00089a550, FUTEX_WAKE_PRIVATE, 1) = 1
go_nuts_zfs.log-4936706-31945 <... futex resumed>) = 0
go_nuts_zfs.log-4936707-1309 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin", <unfinished ...>
go_nuts_zfs.log-4936708-31945 futex(0xc00089ad50, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
go_nuts_zfs.log-4936709-1612 <... epoll_pwait resumed>[], 128, 1, NULL, 1033883350377) = 0
--
go_nuts_zfs.log:4936733:1309 fchmodat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0755) = 0
go_nuts_zfs.log-4936734-1309 newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go", 0xc00048b7c8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4936735:1309 renameat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go") = 0
go_nuts_zfs.log-4936736-1309 unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/", 0) = -1 EISDIR (Is a directory)
go_nuts_zfs.log-4936737-1309 unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/", AT_REMOVEDIR) = -1 ENOTEMPTY (Directory not empty)
go_nuts_zfs.log-4936738-1309 openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log-4936739-1309 epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936740-1309 unlinkat(3, "b303", 0) = -1 EISDIR (Is a directory)
go_nuts_zfs.log-4936741-1309 newfstatat(3, "b303", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
go_nuts_zfs.log-4936742-1309 openat(3, "b303", O_RDONLY|O_CLOEXEC) = 7
go_nuts_zfs.log-4936743-1309 epoll_ctl(4, EPOLL_CTL_ADD, 7, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936744-1309 getdents64(7, 0xc000aa6000 /* 6 entries */, 8192) = 176
go_nuts_zfs.log-4936745-1309 getdents64(7, 0xc000aa6000 /* 0 entries */, 8192) = 0
You're welcome to the full 550MB of strace spew if you want it rather than me trying to clumsily filter, but I appreciate that's no holiday for you either!
@thesamesam would you mind trying this updated patch with your test setup, 63382271a36a69e074911eecd33bea500f9ac20a. It would be great if you could try it with zfs_dmu_offset_next_sync=0
first, if you still observe a failure then please try zfs_dmu_offset_next_sync=1
.
@behlendorf No difference before/after with setting the toggle, but I'll try booting with it in case that somehow makes a difference.
I assume this won't make any difference but I've been applying these on top of 2.1.1. I can try master?
EDIT: Yeah, no difference with the option set from boot.
EDIT: ... and I tried with zfs_dmu_offset_next_sync=2
(noticed in the patch) set after booting which made the 'dangerous copy' a lot slower, but still ended up with corrupted go
. Tried with master 0/1/2 as well but no luck.
One hypothesis is that the problem occurs when some other process creates a file via open+copy_file_range+ftruncate+close, and then soon afterwards GNU cp attempts to lseek(thatfile, 0, SEEK_DATA). Perhaps GNU cp sees the effect of the ftruncate without seeing the effect of the copy_file_range, due to a race in the kernel somewhere. For more details, please see my recent comment in GNU coreutils Bug#51433.
I see a very similar issue and partial solution discussed with FreeBSD folks: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256205
If we can't see our way to a proper fix quickly, one possibility would be to temporarily modify OpenZFS so that it reports no holes in any files. I suggested this in GNU coreutils Bug#51433#26. Yet another possibility would be to modify ZFS lseek SEEK_DATA
to fail with errno == EOPNOTSUPP
; this latter approach might be better for GNU cp
as cp
would find holes in the data via other (slower) means. Of course it'd be better to fix the ZFS bug, whatever it is; but at least these workarounds would avoid data loss.
I've got a test patch to add a tunable to disable SEEK_HOLE/DATA I'm going to try running through its paces now...obviously not ideal, but better than "distros patching coreutils to disable using SEEK_HOLE citing this bug".
Does anyone have a pointer to a good reproducer for this that doesn't involve installing Gentoo?
e: Or a good reproducer with Gentoo - I've got it trying to emerge go and see if go segfaults in a loop at the moment on an SSD...
So, just returning EOPNOTSUPP causes one or two test failures.
I'll try just using the generic fallback case, but that still needs a workaround equivalent for FBSD. Will look into that once I can reproduce it and see it not reproduce...
[...] Does anyone have a pointer to a good reproducer for this that doesn't involve installing Gentoo?
e: Or a good reproducer with Gentoo - I've got it trying to emerge go and see if go segfaults in a loop at the moment on an SSD...
This only happens for me reliably on one machine with NVME. If you're using Gentoo, please make sure you forcefully install coreutils 9.0 by running e.g. emerge -v1O ">=sys-apps/coreutils-9.0"
.
I'm happy to try out whatever, although I do wish I could easily share a way for others to reproduce this :(
So, just returning EOPNOTSUPP causes one or two test failures.
If it doesn't work even when lseek
fails with EOPNOTSUPP
, that suggests that the problem affects read
as well as lseek
, which would be an even more-serious bug in OpenZFS.
The following code in coreutils/src/copy.c should mean that an EOPNOTSUPP
failure from lseek
should cause cp
to look for zeros merely by reading and discarding zeros, without using SEEK_HOLE
or SEEK_DATA
. If you have the time, please investigate whether this bit of code is returning ZERO_SCANTYPE
like it should, even when lseek
returns -1 with errno == EOPNOTSUPP
. If it does return ZERO_SCANTYPE
, that suggests the more-serious bug (which I hope is not the case).
#ifdef SEEK_HOLE
scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
if (0 <= scan_inference->ext_start)
return LSEEK_SCANTYPE;
else if (errno != EINVAL && !is_ENOTSUP (errno))
return errno == ENXIO ? LSEEK_SCANTYPE : ERROR_SCANTYPE;
#endif
return ZERO_SCANTYPE;
When I quickly looked, the test failures I looked at had failed with not handling lseek returning EOPNOTSUPP, not corruption.
When I quickly looked, the test failures I looked at had failed with not handling lseek returning EOPNOTSUPP, not corruption.
Oh, were these test failures due to GNU cp
doing the wrong thing when lseek
with SEEK_HOLE
or SEEK_DATA
failed with EOPNOTSUPP
, or were they due to other programs getting confused by lseek
doing that? If the former, it's a GNU coreutils bug that we should fix; if the latter, that suggests we should work around the OpenZFS lseek
bug in a different way (e.g., by having OpenZFS always pretend that the file has no holes).
OpenZFS has a fallback codepath on Linux (of just calling the generic Linux llseek handler) if SEEK{DATA,HOLE} are not implemented, which I was going to just make it use, but I was concerned about whether it would handle it correctly versus just being called with no SEEK* support in the system at all.
The other reason I was concerned is that FBSD has no such fallback codepath that I could immediately see, since SEEK_* support has been there since the initial ZFS import on FBSD, AFAICT. So the failure issue on FBSD may be a different facet or a different problem entirely, but it may well wind up requiring a separate fix.
I believe it was the latter failure mode, yes, not coreutils. But I didn't read all the test failures yet.
Perhaps simpler than using the fallback Linux codepath, would be to pretend that the file has no holes, which OpenZFS already does if dmu_offset_next
returns EBUSY
. This should work on all platforms, not just Linux. Perhaps something like the following untested patch:
From 34e9d06fb8ba2140c1156b60b5d7ce25649cc411 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 31 Oct 2021 19:08:55 -0700
Subject: [PATCH] Work around lseek+SEEK_HOLE bug
* module/zfs/zfs_vnops.c (zfs_holey_common):
For now, pretend that files never have holes.
This works around OpenZFS bug #11900.
---
module/zfs/zfs_vnops.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/module/zfs/zfs_vnops.c b/module/zfs/zfs_vnops.c
index 9bd75c011..cbad5e641 100644
--- a/module/zfs/zfs_vnops.c
+++ b/module/zfs/zfs_vnops.c
@@ -100,7 +100,13 @@ zfs_holey_common(znode_t *zp, ulong_t cmd, loff_t *off)
else
hole = B_FALSE;
+ /* Work around OpenZFS bug 11900
+ <https://github.com/openzfs/zfs/issues/11900>. */
+# if 0
error = dmu_offset_next(ZTOZSB(zp)->z_os, zp->z_id, hole, &noff);
+# else
+ error = EBUSY;
+# endif
if (error == ESRCH)
return (SET_ERROR(ENXIO));
--
2.31.1
I've given a spin on that a whirl, and am testing it now locally while I let GH Actions push it through the basic test suite runs.
edited to note: reproduces great for my VM on a virtio-scsi disk backed by NVMe, problem goes away when I twiddle the first workaround from off to on. Second workaround TBD.
Line 121 of that spin doesn't look right to me. It says *off = file_sz;
but shouldn't that be if (hole) *off = file_sz;
?
It's true, which is why there's a followup commit changing it, though to my surprise, nothing failed so far from the test run without that fix.
to my surprise, nothing failed so far from the test run without that fix.
Yes, that puzzles me too. Perhaps the test run was misconfigured to use a patched coreutils (i.e., coreutils with a patch that disables coreutils's use of SEEK_DATA
)? That would mean the test would need to be redone.
Well, I'm talking about two types of testing here.
One is me running emerge in a loop with I promise it's broken[1] and not changing between runs coreutils. That one I have not tried with the aforementioned commit, just the fixed one.
The other is the test suite runs on Github. I'm more expecting them to tell me if there's something unconditionally assuming this should work in some other way than to outright fail unless I EOPNOTSUPP.
The latter are the ones I was a little surprised did not fail. As said, I haven't tried the former.
[1] - not to slander coreutils here I know it's not coreutils misbehaving
As a data point, in the process of trying to see when things went awry (assuming this wasn't always the case), I tried Linux all the way back to 4.9.100 and it still repros happily with OZFS git master.
Next experiment, with a kernel old enough, is jumping back in OZFS git history...
Some progress to report, I was able to identify a flaw in the lseek()
data/hole reporting. It turns out the code was not flushing any memory mapped file regions prior to detecting the next data/hole. I'm not sure if mmap(2)
was involved in any of the failures reported above, but if it was that could explain things.
@rincebrain if possible would you mind testing this 93e19c7b35b62c238839536fed0d0f23c0466136 WIP fix with the reproducer? The critical change is the modification in zfs_holey_common()
to start writeback on any dirty memory regions.
As fast as my testbed will rebuild, sure.
Preliminary results are that I can't reproduce the bug with Go using the new patch. Will keep running it though and try more.
Likewise preliminarily, it's built and installed Go 12 times without a mangled binary spotted, when usually it took 1-5 runs before it crashed and my record was 8...
Passed 22 runs without crashing.
After seeing you changed zfs_dmu_offset_next_sync
to default to 1, out of curiosity, I set it to 0, and it still failed to crash in 10 runs.
@behlendorf Looks like you found the issue, great! For programs that might need to work around this issue on older openzfs instances, do you have any suggestions for how best to distinguish working from non working? thanks
Thanks for testing this fix, I'll see about wrapping up a few loose ends and getting a PR opened today.
Yes, either setting for zfs_dmu_offset_next_sync
should be fine, I'm glad you tested both. I switched the default from 0
to 1
to test that holes were correctly reported even for dirty files. We can leave the default unchanged to preserve the existing behavior, or switch it if the performance hit isn't too bad. Did you happen to notice is there was a significant difference in how long the test took to run?
@pixelb once we have the fix merged to master we'll get it backported to previous releases. If you can do an msync
before checking for holes that would avoid the issue until it's fixed.
I'm not sure if
mmap(2)
was involved in any of the failures reported above, but if it was that could explain things.
Go's linker writes binaries using mmap'd files, so I think this would explain what @thesamesam was seeing.
I didn't time it, no. I can.
I reran it just to be sure, but here's what I got for the first line from time
from 10 runs each doing emerge go
on my VM with zfs_dmu_offset_next_sync=0 and 1, in that order:
266.24user 169.93system 2:08.31elapsed 339%CPU (0avgtext+0avgdata 905616maxresident)k
264.99user 172.41system 2:08.78elapsed 339%CPU (0avgtext+0avgdata 901584maxresident)k
265.31user 172.64system 2:09.59elapsed 337%CPU (0avgtext+0avgdata 896992maxresident)k
265.94user 174.37system 2:09.35elapsed 340%CPU (0avgtext+0avgdata 909688maxresident)k
264.74user 173.37system 2:10.33elapsed 336%CPU (0avgtext+0avgdata 911112maxresident)k
264.00user 174.28system 2:10.30elapsed 336%CPU (0avgtext+0avgdata 885208maxresident)k
265.28user 174.65system 2:09.34elapsed 340%CPU (0avgtext+0avgdata 920204maxresident)k
265.44user 176.09system 2:10.09elapsed 339%CPU (0avgtext+0avgdata 884280maxresident)k
265.80user 174.61system 2:09.75elapsed 339%CPU (0avgtext+0avgdata 896716maxresident)k
264.20user 172.86system 2:09.09elapsed 338%CPU (0avgtext+0avgdata 923608maxresident)k
265.63user 174.20system 2:09.88elapsed 338%CPU (0avgtext+0avgdata 904192maxresident)k
265.12user 172.38system 2:08.60elapsed 340%CPU (0avgtext+0avgdata 886440maxresident)k
265.28user 174.53system 2:09.83elapsed 338%CPU (0avgtext+0avgdata 916408maxresident)k
266.57user 175.39system 2:10.42elapsed 338%CPU (0avgtext+0avgdata 915536maxresident)k
266.55user 176.39system 2:10.41elapsed 339%CPU (0avgtext+0avgdata 903176maxresident)k
264.78user 171.82system 2:08.48elapsed 339%CPU (0avgtext+0avgdata 907724maxresident)k
266.99user 176.07system 2:09.90elapsed 341%CPU (0avgtext+0avgdata 915456maxresident)k
266.57user 173.26system 2:09.43elapsed 339%CPU (0avgtext+0avgdata 907820maxresident)k
264.94user 171.56system 2:09.42elapsed 337%CPU (0avgtext+0avgdata 900348maxresident)k
264.30user 175.13system 2:09.26elapsed 339%CPU (0avgtext+0avgdata 878324maxresident)k
So I guess very much not the bottleneck, at least on my setup?
(I'm also still pretty curious why this is only reproducible in some environments...)
(I'm also still pretty curious why this is only reproducible in some environments...)
Some folks mount /var/tmp/portage as tmpfs to speed up compilation. In those cases, the go linker would be mmaping files there, and tmpfs doesn't have this problem. Later, when merging the temporary directory into the root filesystem, portage detects that it's a cross-filesystem move, and makes a copy using ordinary techniques without mmap.
System information
Describe the problem you're observing
lseek
+SEEK_DATA
is randomly returning-ENXIO
.Code showing the problem: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/portage_tool/src/portage_util_file_copy_reflink_linux.c;l=145
I patched it with the following:
On a successful run this is what the strace looks like(depthcharge.elf):
When a failure happens this is what it looks like (netboot.elf):
man lseek
shows:Describe how to reproduce the problem
I don't have a simple way to reproduce this yet. It happens as part of a
portage
build. I have attached a full strace log. The problem shows up randomly, so running the same command multiple times will cause different files to fail to copy.Include any warning/errors/backtraces from the system logs
depthcharge-strace-patch1.log