rincebrain opened 2 years ago
Oh good, past me dug into this already. #12214
Well, I dug into why it crashes using `ZFS_DEBUG="watch"`, but it turns out the reason it crashes there is not the reason it dies so often in the CI. (The reason it crashes there is that it winds up with a buffer that's still only `PROT_READ`, not also `PROT_WRITE`, which is a neat trick, since it's marked compressed, so I think it should never be frozen... still looking into where the "right" place to adjust that is, since nominally this shouldn't ever arise...)
Time to go down a rabbit hole again.
Based on the valgrind output it looks to me like the flaw here is in `arc_read_done()`. In the event that the read fails, the abd pages may contain uninitialized data, but we unconditionally run the `BP_PROTECTED(bp)` block, which accesses those pages. It seems to me we should only be doing this when `io_error == 0`.
This only causes a crash in `ztest`, since in the kernel those pages will be mapped and contain something. In user space we'll be doing the I/O with `vdev_file_io_start()` / `vdev_file_io_done()`, which rely on `umem` / `posix_memalign()`, which won't initialize anything (and valgrind rightly complains about that). There's probably a reasonable argument to be made that we should be zero-filling the pages on a read error just to be safe.
I'm still not quite sure why setting `zfs_abd_scatter_min_size` to 4097 prevents this issue. Perhaps when allocating anything less than a page (for a linear abd), `posix_memalign()` internally does initialize the buffer.
Huh, so you think the initial take on it was correct?
ISTR that PR not stopping the crash, though, which was why I closed it...
It sure does look like a real problem. But based on your testing perhaps not the root cause of this segfault...
I'll go rerun the experiment just to be sure. As I said in #12214, I don't think the change is incorrect, so merging it would be fine if you agree it's a real issue that should be avoided; it just didn't appear to solve this when I most recently tested.
Having glanced quickly at the recent zloop failures, the common failure modes looked like a ztest interaction with indirect vdevs that might be specific to the vdev properties PR (here), and a weird write failure I don't immediately understand (here); neither of which seems to come from this.
Now I'm trying to recall if I got here because I wanted to use ZFS_DEBUG="watch" for something else and hit this, or hit a crash not using it and then got this backtrace using it...hm.
(...maybe I should script perusing the zloop test logs...)
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
ztest crashes an awful lot.
Most of the crashes, IME, look something like:
After a round of bisecting, I ended up at 87c25d56, which I would not have guessed, but here we are. And lo, if you extend `ztest` to set `zfs_abd_scatter_min_size` to 4097 on x86_64, it goes from crashing practically always to, so far, not crashing at all.
If we ask valgrind, it first complains a lot about uninitialized values being read in the crypto code, but if you zero those, the complaints become limited to eventually spitting out:
Obviously we could just...make `ztest` do that for now, but that seems problematic, and it's not presently clear to me whether the logical flaw is in the umem implementations of things or elsewhere. (Will continue looking, of course.)
Describe how to reproduce the problem
Above.
Include any warning/errors/backtraces from the system logs
Above.