openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.55k stars 1.74k forks source link

PAX: size overflow detected in function zil_itx_create #2505

Closed mrobbetts closed 8 years ago

mrobbetts commented 10 years ago

Since updating to 0.6.3 (also kernel 3.14.5-hardened on Gentoo) I am seeing some of these in my dmesg:

Jul 17 09:39:15 copper kernel: [126911.127543] PAX: size overflow detected in function zil_itx_create /var/tmp/portage/sys-fs/zfs-kmod-0.6.3/work/zfs-zfs-0.6.3/module/zfs/../../module/zfs/zil.c:1179 cicus.137_21 min, count: 2
Jul 17 09:39:15 copper kernel: [126911.127553] CPU: 0 PID: 11210 Comm: rsync Tainted: P           O 3.14.5-hardened-r2 #4
Jul 17 09:39:15 copper kernel: [126911.127557] Hardware name: HP ProLiant MicroServer, BIOS O41     07/29/2011
Jul 17 09:39:15 copper kernel: [126911.127560]  ffff88025f7c1180 ffffc90085a97c18 ffffffff817c065e ffff88041fc0db20
Jul 17 09:39:15 copper kernel: [126911.127565]  0000000000000048 ffffc90085a97c28 ffffffff8111e8b3 ffffc90085a97c48
Jul 17 09:39:15 copper kernel: [126911.127570]  ffffffffa031fdc6 ffff8802da6cf078 ffff88025f7c1180 ffffc90085a97c98
Jul 17 09:39:15 copper kernel: [126911.127574] Call Trace:
Jul 17 09:39:15 copper kernel: [126911.127585]  [<ffffffff817c065e>] dump_stack+0x46/0x5e
Jul 17 09:39:15 copper kernel: [126911.127592]  [<ffffffff8111e8b3>] report_size_overflow+0x24/0x2e
Jul 17 09:39:15 copper kernel: [126911.127617]  [<ffffffffa031fdc6>] zil_itx_create+0x48/0x9d [zfs]
Jul 17 09:39:15 copper kernel: [126911.127639]  [<ffffffffa030c2b7>] zfs_log_remove+0x80/0xca [zfs]
Jul 17 09:39:15 copper kernel: [126911.127658]  [<ffffffffa0315ffe>] zfs_remove+0x311/0x3b5 [zfs]
Jul 17 09:39:15 copper kernel: [126911.127663]  [<ffffffff817c7b2a>] ? _raw_spin_lock+0x9/0x11
Jul 17 09:39:15 copper kernel: [126911.127680]  [<ffffffffa0329192>] zpl_fallocate_common+0x485/0x5f9 [zfs]
Jul 17 09:39:15 copper kernel: [126911.127685]  [<ffffffff81126db4>] vfs_unlink+0x92/0xeb
Jul 17 09:39:15 copper kernel: [126911.127689]  [<ffffffff81126f5e>] do_unlinkat+0x151/0x265
Jul 17 09:39:15 copper kernel: [126911.127695]  [<ffffffff811292c7>] SyS_unlink+0x11/0x19
Jul 17 09:39:15 copper kernel: [126911.127699]  [<ffffffff817ce89e>] system_call_fastpath+0x16/0x1b

This is obviously a 'hardened' (64-bit) kernel with PaX enabled. I saw it most recently when I tried to eix-sync my Gentoo installation (with /usr/portage/ stored on a zfs file system). That errored out with:

...
receiving incremental file list
rsync: writefd_unbuffered failed to write 8 bytes to message fd [receiver]: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(1532) [receiver=3.0.9]

Is this a problem with ZoL, or my configuration?

behlendorf commented 10 years ago

@mrobbetts PAX may be reporting a legitimate issue here. However, the code it's complaining about looks reasonable to me and I don't immediately see an overflow. Perhaps @ryao can better interpret the PAX warning.

chrisrd commented 10 years ago

This looks like a reasonable explanation of the PAX stuff:

https://forums.grsecurity.net/viewtopic.php?f=7&t=3043

I also don't see how it could overflow, everything is a constant (sizeof() or offsetof()) apart from the strlen(name), and even if name somehow isn't null terminated there's no way there won't be a null byte somewhere within size_t bytes:

void
zfs_log_remove(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
        znode_t *dzp, char *name, uint64_t foid)
{
   size_t namesize = strlen(name) + 1;
   itx = zil_itx_create(txtype, sizeof (*lr) + namesize);
}

itx_t *
zil_itx_create(uint64_t txtype, size_t lrsize)
{       
   lrsize = P2ROUNDUP_TYPED(lrsize, sizeof (uint64_t), size_t);
   itx = kmem_alloc(offsetof(itx_t, itx_lr) + lrsize,
      KM_PUSHPAGE | KM_NODEBUG);
}

Perhaps the P2ROUNDUP_TYPED macro is confusing the overflow detector due to sign extensions or something?

#define P2ROUNDUP_TYPED(x, align, type) \
        (-(-(type)(x) & -(type)(align)))
ryao commented 10 years ago

@behlendorf I would need to reproduce it locally. I will not have time to do that until the weekend at the earliest.

chrisrd commented 10 years ago

Looks like a PAX and/or gcc (4.7.2) optimisation bug.

A test program exhibits the PAX error when compiled w/ -O, and has no error without -O:

$ gcc -fplugin=linux/tools/gcc/size_overflow_plugin/size_overflow_plugin.so test.c -o test -O 2> /dev/null && ./test
SIZE_OVERFLOW: size overflow detected in function main test.c:25 cicus.6_14 min, count: 2

$ gcc -fplugin=linux/tools/gcc/size_overflow_plugin/size_overflow_plugin.so test.c -o test 2> /dev/null && ./test
exit ok

The test program, based on the examples in https://forums.grsecurity.net/viewtopic.php?f=7&t=3043 :

#include <stdint.h>
#include <stdio.h>

extern void *malloc(size_t size) __attribute__((size_overflow(1)));

void * __attribute__((size_overflow(1))) coolmalloc(size_t size)
{
        return malloc(size);
}

void report_size_overflow(const char *file, unsigned int line, const char *func, const char *ssa_name)
{
        printf("SIZE_OVERFLOW: size overflow detected in function %s %s:%u %s", func, file, line, ssa_name);
        fflush(stdout);
        _exit(1);
}

#define P2ROUNDUP_TYPED(x, align, type) \
        (-(-(type)(x) & -(type)(align)))

int main(int argc, char *argv[])
{
  size_t namesize = strlen(argv[0]);
  namesize = P2ROUNDUP_TYPED(namesize, sizeof (uint64_t), size_t);
  coolmalloc(20 + namesize);
}

Briefly, to build the size_overflow_plugin.so gcc plugin:

git checkout v3.14.12
git checkout -b 3.14.12-pax
wget http://grsecurity.net/stable/grsecurity-3.0-3.14.12-201407170638.patch
patch -p1 < grsecurity-3.0-3.14.12-201407170638.patch
make oldconfig  # enable the PAX stuff, in particular CONFIG_PAX_SIZE_OVERFLOW
make  # can kill it once it's done with tools/gcc/size_overflow_plugin/
chrisrd commented 10 years ago

FYI, reported to the PAX/grsecurity development forum:

User, code, gcc or grsecurity error? https://forums.grsecurity.net/viewtopic.php?f=1&t=4016

...no response as yet.

chrisrd commented 10 years ago

The response from "PaX Team":

this is a sort of false positive vs. 'bad' code as it relies on integer overflow to produce the desired results (e.g., the unary minus operator makes no sense on an unsigned type since the resulting negative value cannot be represented in the unsigned type and it's only the C integer conversion rules that make this work but that doesn't make the code any more readable/nice/etc). in situations like this we recommend rewriting the 'smart' code to a form that doesn't rely on overflow.

I.e. it's a limitation (or maybe a deliberate design choice) of the PAX overflow detector rather than a problem in the ZoL code (unless you consider poor code clarity a bug).

mrobbetts commented 10 years ago

@chrisrd Thanks for looking into this for me.

Would you recommend simply disabling PaX for use with ZoL, then? I've done that in the mean time, and everything appears to be working correctly.

chrisrd commented 10 years ago

@mrobbetts I guess it depends on how strongly you feel about using a PaX hardened kernel. Disabling PaX would be easiest. Selectively turning off PaX for the ZoL compile, if possible, could be an option. @ryao likely has a better idea than myself.

Overall, I'm not sure what the best long term solution would be. As pointed out by "PaX Team" in the grsecurity link above, there's an argument the P2ROUNDUP_TYPED and related macros should be modified to not do "tricky stuff" (or at least not do tricky stuff that trips up grsecurity). On the other hand, the "tricky stuff" is using standards-defined C behaviour so there's an argument this is a bug in grsecurity's overflow detector and should be fixed there. On the third hand, "Pax Team" has provided an alternative implementation of P2ROUNDUP_TYPED (similar work would probably need to be done for the related macros) which likely won't affect performance to any measurable degree so the issue can be addressed in ZoL.

In the end it's going to boil down to who cares the most: personally I don't use a PaX hardened kernel so it's not an issue for me. Someone on the PaX side using ZoL might be encouraged to look into fixing it in PaX, or someone on the ZoL side who's interested in using PaX might fix it here (@ryao, hint, hint :-) ).

rennhak commented 9 years ago

What is the best way to solve this in the meantime? Fallback to 0.6.2?

rennhak commented 9 years ago

A build with 0.6.2 and Linux 3.16.5 and PaX/Grsec fails with

Makefile:973: recipe for target 'fs' failed
make[2]: *** [fs] Error 2
rennhak commented 9 years ago

Information for anyone stumbling over this. 0.6.2 does not work on 3.16.5 or later. For that you have to use 0.6.3 or later.

gcs-github commented 9 years ago

As a sysadmin, I'm of the opinion that we should be able to use PaX with ZFS. I administer machines using ZFS with machines using PaX, and I think it's unfortunate that we'd have to choose between the added reliability of ZFS and the added security of PaX.

How can I help?

chrisrd commented 9 years ago

@gcs-github If you have C experience, I'd recommend looking at implementing the alternative P2ROUNDUP_TYPED and associated macros as recommended by PaX Team upthread.

Or, possibly even better, see if the linux source has any "round up" macros/functions that can be used instead - I'd be pretty surprised if it doesn't!

Of some related interest, the issue of detecting integer overflows is being discussed on LKML:

http://thread.gmane.org/gmane.linux.kernel/1838832

gcs-github commented 9 years ago

@chrisrd Sure, I'll look into it! :)

That'll be the first time for me getting inside the code of ZFS (or even kernel-related software in general). Do you have any pointers to set up a proper dev environment and the proper test suite for it?

chrisrd commented 9 years ago

@gcs-github Sorry, not really. Prior to actually testing inside the kernel (which would be better done in a virtual machine) you should use ztest: see cmd/ztest in the ZFS repository. I'd be looking for existing stuff in the linux source before writing anything, but if it's necessary to actually write code, I'd be writing an external test harness to exercise the rounding and interaction with PaX before dropping it into ZFS.

N8Fear commented 9 years ago

Just as a pointer: there is the #define round_up(x, y) (and round_down) macro in include/linux/kernel.h. This macro is used in the grsecurity patch (just greped for it: it actually was there before, but wasn't replaced by something different), so I guess it's implementation should be pax/grsec-compatible.

alexxy commented 9 years ago

I get same issue

zfs is 0.6.3 with patches (gentoo version by ryao)

uname -a

Linux mdstor 3.17.6-hardened #2 SMP Thu Dec 11 14:15:29 MSK 2014 x86_64 Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz GenuineIntel GNU/Linux

LANG=C gcc -v

Using built-in specs. COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.3/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.8.3/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.3 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.3 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.3/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.3/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/include/g++-v4 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.8.3/python --enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo Hardened 4.8.3 p1.1, pie-0.5.9' --enable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --with-multilib-list=m64 --disable-altivec --disable-fixed-point --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libquadmath --enable-lto --without-cloog --disable-libsanitizer Thread model: posix gcc version 4.8.3 (Gentoo Hardened 4.8.3 p1.1, pie-0.5.9)

[ 59.975090] PAX: size overflow detected in function zil_itx_create /var/tmp/portage/sys-fs/zfs-kmod-0.6.3-r1/work/zfs-zfs-0.6.3/module/zfs/../../module/zfs/zil.c:1187 cicus.150_20 min, count: 2 [ 59.975095] CPU: 12 PID: 3252 Comm: touch Tainted: P O 3.17.6-hardened #2 [ 59.975096] Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0c 03/24/2014 [ 59.975098] ffff88105002f900 ffffc90031ebbac0 ffffffffa5ae1c35 0000000000000070 [ 59.975101] ffffc90031ebbad0 ffffffffa521d424 ffffc90031ebbaf0 ffffffffc08fe435 [ 59.975104] 0000000000000000 ffff88105002f900 ffffc90031ebbb60 ffffffffc08e68d8 [ 59.975106] Call Trace: [ 59.975117] [] dump_stack+0x45/0x5c [ 59.975123] [] report_size_overflow+0x24/0x30 [ 59.975136] [] zil_itx_create+0xa5/0xb0 [zfs] [ 59.975147] [] zfs_log_create+0xd8/0x3c0 [zfs] [ 59.975157] [] zfs_create+0x61a/0x6f0 [zfs] [ 59.975164] [] zpl_vap_init+0x578/0x930 [zfs] [ 59.975170] [] ? security_inode_permission+0x24/0x40 [ 59.975174] [] vfs_create+0xa2/0xd0 [ 59.975177] [] do_last.isra.37+0x1041/0x1200 [ 59.975180] [] ? security_file_alloc+0x1e/0x30 [ 59.975182] [] path_openat+0xb4/0x640 [ 59.975186] [] ? handle_mm_fault+0x74e/0xef0 [ 59.975189] [] ? mmap_region+0x193/0x6b0 [ 59.975192] [] do_filp_open+0x35/0x90 [ 59.975195] [] ? __alloc_fd+0x7b/0x130 [ 59.975198] [] do_sys_open+0x124/0x290 [ 59.975200] [] SyS_open+0x19/0x30 [ 59.975203] [] tracesys+0xcc/0xd1

shizmob commented 9 years ago

In case someone would like to test, I have an (untested!) potential fix in https://github.com/Shizmob/spl/commit/32f4ceca6f977121b639fb2f89111792177746a3. Any input would be appreciated.

chrisrd commented 9 years ago

HI @Shizmob , your P2ROUNDUP_TYPED() ignores the type argument: why is this safe (if it's safe, why do they have P2ROUNDUP_TYPED() vs P2ROUNDUP() in the first place)?

shizmob commented 9 years ago

I believe it's safe because, looking at the Linux implementation of round_up, it uses GCC's __typeof__ extension to receive the type to cast to: http://lxr.free-electrons.com/source/include/linux/kernel.h?v=3.2#L54. Presumably Solaris doesn't want to use this nonstandard extension, so they defined different APIs.

chrisrd commented 9 years ago

However your P2ROUNDUP_TYPED() will always use the type of the first argument, x, whereas the original uses an arbitrary(?) type passed in as the third argument: that's an interface change. Are you able to confirm the new version produces precisely the same result as the old usage in all relevant circumstances (even for unusual architectures and compilers)?

shizmob commented 9 years ago

I checked all usages of P2ROUNDUP_TYPED() in zfs and spl, and the type argument seems to match x's type in all cases except for one (modules/zfs/zil.c:1058), and I'm fairly sure the type difference does not matter there (4096 fits just fine in an int, the C specification mandates int to be at least two bytes wide).

As far as unusual compilers go, I'm not quite sure what you mean seeing as Linux only compiles with gcc and icc in the first place, to my knowledge. However, as noted above, a compiler/architecture(?) would have to be violating the C specification for this to break.

Also of note is that the change I made to spl has to be duplicated in libspl in the zfs repository.

chrisrd commented 9 years ago

Disclosure: I'm no guru on the intricate details of C, and I have no standing within the ZoL community, so I'm speaking as an interested bystander rather than from a position of knowledge and/or power! :-)

My concern is that the two implementations of P2ROUNDUP_TYPED() should guaranteed (e.g. by referring to C standards) to be equivalent in all circumstances (barring actual compiler bugs).

Whilst all current usages may be confirmed to be safe, we don't want to be hit by some subtle difference between the "normal" version of P2ROUNDUP_TYPED() and the ZoL version, e.g. when importing patches from other versions of OpenZFS or running on unusual architectures.

My concern is that different type arguments could potentially change the rounded value, and/or have differing results depending on the type of x and the type of the left hand side of the expression, due to sign extensions and whatnot. E.g.:

int foo = P2ROUNDUP_TYPED(x, align, unsigned long long);
unsigned long long bar = P2ROUNDUP_TYPED(x, align, int);
behlendorf commented 9 years ago

I agree this macro must behave consistently or it may accidentally result in some very hard to find issues. If there's any doubt whatsoever my suggestion would be to add a new test case to verify they behave the same.

perfinion commented 9 years ago

I just booted @Shizmob 's patch on my laptop and it appears to work fine. I did ebuild spl... unpack then edited sysmacro.h then ebuild.... merge and also edited the one in libspl in zfs-kmod.

before i applied shizmob's patch, I built the kernel with PAX_SIZE_OVERFLOW=y and it would consistently die early during boot. when I built with the patch, it booted fine and have had no issues yet when I ran some preliminary tests with a lot of disk activity. A scrub just finished with no errors either. I realize this isnt 100% proof but at least this direction seems to work.

I understand that changing macro might be undesirable since the linux/kernel.h one does not use type. Perhaps a better way would be to copy the linux one into spl and replace the typeof(x) with the type argument instead?

shizmob commented 9 years ago

@chrisrd - I certainly understand your point in making sure the macro will behave the same for all future patchsets. In that case I would go with what @perfinion proposed and just use the Linux implementation of rounding up, but modified for P2ROUNDUP and friend's needs. That should still behave exactly like the original and not trigger PaX's overflow detector. I'll update my fork later today so it can be tested.

shizmob commented 9 years ago

Based on @perfinion's feedback I've just copied and modified the Linux definitions now - it's in https://github.com/Shizmob/spl/commit/79a82a0fd97765687da6afab6c89945527743e1a. Feedback appreciated. Note again, that such a change would also have to be made in the zfs main repo's headers (/lib/libspl/include/sys/sysmacros.h).

perfinion commented 9 years ago

@Shizmob's new patch with the "type" argument used directly works great so far. i put the patches in epatch_user and installed. Also of note is that P2ROUNDUP_TYPED is defined twice in libspl in zfs-kmod. It is harmless because they are both exactly the same but it should be removed when applying this patch.

chrisrd commented 9 years ago

@Shizmob - The original P2ROUNDUP_TYPED() uses explicit casts on both it's x and align parameters. Your new version in Shizmob/spl@79a82a0 only casts the align parameter. With reference to the C type promotion rules, is this guaranteed to be sufficient to return the same result as the original in every possible circumstance (or every circumstance that is remotely plausible in the context of ZFS)?

Testing like @perfinion's is required (thanks!), e.g. to catch silly typos and thinkos, but there's no guarantee the testing has exercised every existing usage of P2ROUNDUP_TYPED(), let alone any future usage and oddball architectures. So, once again, I'm looking for an understanding of exactly how the C type promotion (and demotion) rules affect this macro.

chrisrd commented 9 years ago

@Shizmob - Also, your new version of the macro will probably trip the PaX overflow detector if the x or type parameters are unsigned and the x parameter is zero, due to (x) - 1. (This is also mentioned in the grsecurity link above.) Are there any guarantees in the ZFS code that x won't be zero? Or, if it is zero, is that an indication of an error or filesystem corruption etc. that we want the overflow detector to catch?

shizmob commented 9 years ago

Hmm, you're totally correct. I think this patch needs some more consideration, so I'm closing the PRs until I figure out something better.

perfinion commented 9 years ago

I have been running this patch for a while anyway on my laptop. I just ran into this overflow as well so it is clearly not enough. It is in a different function but also a size overflow issue. I was using a ZVOL as the disk in a VM and mkfs.ext4 hung during formatting and this error popped up.

Mar 15 02:30:54 meriadoc kernel: [29900.040964] PAX: size overflow detected in function dmu_read /var/tmp/portage/sys-fs/zfs-kmod-0.6.3-r1/work/zfs-zfs-0.6.3/module/zfs/../../module/zfs/dmu.c:780 cicus.366_143 max, count: 15 Mar 15 02:30:54 meriadoc kernel: [29900.040970] CPU: 3 PID: 1787 Comm: zvol/9 Tainted: P O 3.17.7-hardened-r1 #2 Mar 15 02:30:54 meriadoc kernel: [29900.040972] Hardware name: LENOVO 20AQ006HUS/20AQ006HUS, BIOS GJET80WW (2.30 ) 10/20/2014 Mar 15 02:30:54 meriadoc kernel: [29900.040974] ffffffff81e900c7 0000000000000000 ffffffffa035fd60 ffff8800cd88bb68 Mar 15 02:30:54 meriadoc kernel: [29900.040977] ffffffff81a52deb ffffffffa0370b6e ffff8800cd88bb98 ffffffff811f3576 Mar 15 02:30:54 meriadoc kernel: [29900.040980] 0000000000002000 000000000000fc00 0000000000000000 ffffc900239c3238 Mar 15 02:30:54 meriadoc kernel: [29900.040984] Call Trace: Mar 15 02:30:54 meriadoc kernel: [29900.041007] [] ? zvol_replay_vector+0x1fc0/0x1b940 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041011] [] dump_stack+0x4e/0x7a Mar 15 02:30:54 meriadoc kernel: [29900.041024] [] ? zvol_replay_vector+0x12dce/0x1b940 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041028] [] report_size_overflow+0x36/0x40 Mar 15 02:30:54 meriadoc kernel: [29900.041041] [] dmu_read+0x51d/0x950 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041045] [] ? avl_add+0x34/0x50 [zavl] Mar 15 02:30:54 meriadoc kernel: [29900.041061] [] zrl_is_locked+0x7c1/0x1cd0 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041077] [] zil_clean+0x9d2/0xc00 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041091] [] zil_commit+0x21/0x30 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041091] [] zil_commit+0x21/0x30 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041104] [] zrl_is_locked+0xcd1/0x1cd0 [zfs] Mar 15 02:30:54 meriadoc kernel: [29900.041108] [] ? __schedule+0x52b/0xbc0 Mar 15 02:30:54 meriadoc kernel: [29900.041114] [] taskq_cancel_id+0x2a6/0x5b0 [spl] Mar 15 02:30:54 meriadoc kernel: [29900.041118] [] ? wake_up_state+0x20/0x20 Mar 15 02:30:54 meriadoc kernel: [29900.041123] [] ? taskq_cancel_id+0x110/0x5b0 [spl] Mar 15 02:30:54 meriadoc kernel: [29900.041126] [] kthread+0xc4/0xe0 Mar 15 02:30:54 meriadoc kernel: [29900.041130] [] ? kthread_create_on_node+0x170/0x170 Mar 15 02:30:54 meriadoc kernel: [29900.041133] [] ret_from_fork+0x74/0xa0 Mar 15 02:30:54 meriadoc kernel: [29900.041136] [] ? kthread_create_on_node+0x170/0x170

chrisrd commented 9 years ago

So, somewhere here:

 int
dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
    void *buf, uint32_t flags)
{
...
              for (i = 0; i < numbufs; i++) {
                        int tocpy;
                        int bufoff;     
                        dmu_buf_t *db = dbp[i];

                        ASSERT(size > 0);

>>>>>
                        bufoff = offset - db->db_offset;
                        tocpy = (int)MIN(db->db_size - bufoff, size);

                        bcopy((char *)db->db_data + bufoff, buf, tocpy);
<<<<<
                        offset += tocpy;
                        size -= tocpy;  
                        buf = (char *)buf + tocpy;
                }

The PaX warning is probably fair enough, given tocpy and bufoff are ints and the only non-uint64_t variables involved, and there's nothing in this immediate code that guarantees the calculations won't overflow an int. (Although limits elsewhere almost certainly provide this guarantee.)

To avoid the warning I guess simply changing tocpy and bufoff to uint64_t might be sufficient. However then you're still left with the third argument to bcopy needing to be an int and potentially overflowing.

I think, once again, someone with an interest in making ZoL PaX compatible needs to contact them to find out their preferred method for dealing with this situation.

hydrosIII commented 9 years ago

Hi, Im tryiing to run zfs and pax together using grsecurity in debian Jessie.

An acceptable workaround to this issue maybe the following: Build zfs module using normal kernel, then reboot to switch to the new kernel, with grsecurity.

Or, set up a build server to construct kmod debian packages, and install them in the new kernel without building them directly using a PAX enabled kernel. Would this be acceptable?

perfinion commented 9 years ago

@hydrosIII no, i think the modules have to link to the exact kernel in question so you cant really build for one kernel and use in another. That being said, now there has been a new tag I think its time I properly revisit this and try and get it merged, this issue has been open for quite a while now.

behlendorf commented 9 years ago

@perfinion it would be great if you could make some time to propose a clean fix for this. I'd love to resolve this issue in the next tag. It has been with us for too long.

@hydrosIII what your suggesting can be done with weak-modules support on enterprise Linux kernels which provide a stable kABI. But for faster moving distributions it's not possible because we don't know in advance what symbols are available.

perfinion commented 9 years ago

I went through the source and found all the places that use the P2ROUNDUP macro to see what kind of issues it might run into.

Original macros:

#define P2ROUNDUP_TYPED(x, align, type) (-(-(type)(x) & -(type)(align)))

All usages are completely unsigned, so the -x doesnt entirely make sense

module/zfs/zil.c:1061: wsz = P2ROUNDUP_TYPED(lwb->lwb_nused, ZIL_MIN_BLKSZ, uint64_t);
    uint64_t = int, 4096ULL, uint64_t
module/zfs/zil.c:1105: dlen = P2ROUNDUP_TYPED(lrw->lr_length, sizeof(uint64_t), uint64_t);
    uint64_t = uint64_t, size_t, uint64_t
module/zfs/zil.c:1194: lrsize = P2ROUNDUP_TYPED(lrsize, sizeof(uint64_t), size_t);
    size_t = size_t, size_t, size_t
module/zfs/sa.c:503: blocksize = P2ROUNDUP_TYPED(size, SPA_MINBLOCKSIZE, uint32_t);
    uint32_t = uint32_t, ULL, uint32_t
module/zfs/zio_checksum.c:169: size = P2ROUNDUP_TYPED(zilc->zc_nused, ZIL_MIN_BLKSZ, uint64_t)
    uint64_t = uint64_t, 4096ULL, uint64_t
module/zfs/zio_checksum.c:225: size = P2ROUNDUP_TYPED(nused, ZIL_MIN_BLKSZ, uint64_t);
    uint64_t = uint64_t, 4096ULL, uint64_t

#define P2ROUNDUP(x, align) (-(-(x) & -(align)))

Lots of places use P2ROUNDUP() but a lot of these are using other #define constants so im not sure they should really be un-typed. It seems safer if all of them were just cast to use typeof(x)

Proposed new macro: Requirement is that x is never 0. otherwise the x-1 will underflow which the PaX plugin will again catch. Since these are all sizes to read or lengths they shouldnt really be 0 so pretty sure if there is a 0 then it is a real error. If x is allowed to be 0 then we need to add a little to the macro.

Linux has the following which takes care of the typing already:

#define __round_mask(x, y) ((__typeof__(x))((y)-1))
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)

Is there ever a time when we want to rely on the type of the mask? if not then just merging both into the same macro seems like a reasonable option.

#define P2ROUNDUP(x, align)    ((((x) - 1) | ((align) - 1)) + 1)
#define P2ROUNDUP_TYPED(x, align, type)    ((type)((x)-1) | (type)((y)-1))+1)
If we want P2ROUNDUP to just use the original type, then we could do:
#define P2ROUNDUP(x, align)    P2ROUNDUP_TYPED(x, align, __typeof__(x))

I am going to look through the other macros too before I prepare a proper pull request since a couple of the others look like they also have similar issues. Thoughts so far?

Also, this isnt a build issue, the gcc plugin is used at buildtime but these issues trigger during runtime when built with a PaX kernel

chrisrd commented 9 years ago

@perfinion keep in mind my upthread comment about my standing in the ZoL community, but to my mind changing these macros to make them PaX compatible requires a very good understanding of the sign extension and type promotion rules according to the C standards. I don't have that understanding myself, but your comment "All usages are completely unsigned, so the -x doesnt entirely make sense" makes me think you should be looking into this much further. Likewise, regarding "Is there ever a time when we want to rely on the type of the mask?", my preference would be to be referencing C standards about why the type of the mask doesn't matter.

perfinion commented 9 years ago

@chrisrd so for P2ROUNDUP_TYPED, there are no uses of it that are signed, so sign extension does not matter. The others it might matter i'll double check.

And by "All usages are completely unsigned, so the -x doesnt entirely make sense" I meant that taking the negative of an unsigned number doesnt entirely make logical sense. In C it ends up the equivalent of (~x+1) but if thats what is wanted, then better to just do that since it "makes sense" for both signed and unsigned.

And about "Is there ever a time when we want to rely on the type of the mask?" I did not mean in regard to the macros. I meant are there places in the code that use it relying on the type of the mask? Because from what I could see, there were several places that used the untyped macro but I think they should instead be using the typed one. An integer constant without any qualifiers is signed in C, and is used in many places where the other type is unsigned which seems like an error. casting to the type of x seems more correct. If it specifically needs the type of the mask then it is much clearer to use the _TYPED version

The issue is that currently there are many instances eg P2ROUNDUP(size, 8), that is technically P2ROUNDUP(size, (signed int)8), but having a negative size is mostly likely a bug.

chrisrd commented 9 years ago

@perfinion It seems my usage of "sign extension" was an error, it should have been something like "negation of unsigned quantities" - thanks for pointing that out. My point was that the negation of an unsigned quantity is a well specified operation in C and I was concerned you hadn't reached that level of understanding. I now understand your comment was more that, in the real world, negating an unsigned quantity doesn't make a lot of sense - which I agree with. (On the other hand, we're not the real world here, we're in "C" world: here be monsters aplenty!)

Overall, my concern is that any alternate form we come up with for these macros should be guaranteed, by the C specification, to behave identically to the original macros (apart from not triggering the PaX overflow detection of course!), otherwise we risk having subtly different behaviour to the other ZFS implementations which could be exceedingly difficult to track down.

I'm more than willing to accept that the current macros, and/or their usages, may actually be "wrong" in some way (e.g. your example of P2ROUNDUP(size, 8)). But if we come up with a better way of doing things which isn't guaranteed to be equivalent to the existing implementation, we need to get buyin from "upstream" (currently Illumos, AFAIUI) to avoid getting bitten by a C monster.

perfinion commented 9 years ago

okay @chrisrd, I finally figured out how to prove they are equivalent. dump this in test.c, there are no includes or anything.

// Set the type everywhere, try signed/unsigned and int/long/long long
#define thetype unsigned int

thetype old(thetype x, thetype align)
{
    // The original SPL macro
    return -(-(x) & -(align));
}

thetype negate(thetype x, thetype align)
{
    // -x == twos complement negation which is equivalent to ~x+1
    return ~(-(x) & -(align)) + 1;
}

thetype demorgan(thetype x, thetype align)
{
    // Demorgans law: ~(A & B) ==> ~A | ~B
    return ((~(-x)) | (~(-align))) + 1;
}

thetype new(thetype x, thetype align)
{
    // The new macro that does not trigger the overflow
    return ((((x) - 1) | ((align) - 1)) + 1);
}

int main(void)
{
    old((thetype)1234, (thetype)8);
    negate((thetype)1234, (thetype)8);
    demorgan((thetype)1234, (thetype)8);
    new((thetype)1234, (thetype)8);
    return 0;
}

now compile and dump the assembly. the stack options are because my hardened compiler defaults to enabling them and the assembly is more annoying to read with them enabled. Also play around with -O[0-3] and -Os. They all work the same but give slightly different outputs.

$ gcc -O1 -ggdb -fstack-check=no -fno-stack-protector test.c -o test
$ gdb --batch -ex "disas old" -ex "disas negate" -ex "disas demorgan" -ex "disas new" test
Dump of assembler code for function old:
   0x0000000000000760 <+0>: mov    %edi,%eax
   0x0000000000000762 <+2>: neg    %eax
   0x0000000000000764 <+4>: neg    %esi
   0x0000000000000766 <+6>: and    %esi,%eax
   0x0000000000000768 <+8>: neg    %eax
   0x000000000000076a <+10>:    retq   
End of assembler dump.
Dump of assembler code for function negate:
   0x000000000000076b <+0>: mov    %edi,%eax
   0x000000000000076d <+2>: neg    %eax
   0x000000000000076f <+4>:     neg    %esi
   0x0000000000000771 <+6>: and    %esi,%eax
   0x0000000000000773 <+8>: neg    %eax
   0x0000000000000775 <+10>:    retq   
End of assembler dump.
Dump of assembler code for function demorgan:
   0x0000000000000776 <+0>: lea    -0x1(%rdi),%eax
   0x0000000000000779 <+3>: sub    $0x1,%esi
   0x000000000000077c <+6>: or     %esi,%eax
   0x000000000000077e <+8>: add    $0x1,%eax
   0x0000000000000781 <+11>:    retq   
End of assembler dump.
Dump of assembler code for function new:
   0x0000000000000782 <+0>: lea    -0x1(%rdi),%eax
   0x0000000000000785 <+3>: sub    $0x1,%esi
   0x0000000000000788 <+6>: or     %esi,%eax
   0x000000000000078a <+8>: add    $0x1,%eax
   0x000000000000078d <+11>:    retq   
End of assembler dump.

So from old->negate, the twos complement is absolutely no change to the generated asm. negate->demorgan is just demorgans law which is fine too. Then from demorgan->new there is no difference in the generated asm so they are also the same. I think this should suffice that the new and old are equivalent. do try it on other compilers and see if we get the same. I tried clang too and it is exactly the same.

chrisrd commented 9 years ago

@perfinion Looks good to me! I still have one question though: with the new implementation, for x = 0, is there a difference in behaviour compared to a non-PaX system?

perfinion commented 8 years ago

both the new and old are the same for x = 0; old: 0 & anything == 0; and new: 0 -1 | anything + 1= -1 + 1 so they both end up just returning zero no matter what the align arg is. This will still trigger a pax size overflow for the new one but I think that is good because zero is almost definitely wrong.

ryao commented 8 years ago

@chrisrd The two are equivalent for all inputs. I sketched out a mathematical proof of equivalence after hearing @perfinion's explanation in IRC earlier today:

Here are axioms:

a. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement
b. ~(x & y) === ((~(x)) | (~(y))) under De Morgan's law
c. ~(~x) === x under the law of the excluded middle

Here is a proof:

0. (-(-(x) & -(align)))
1. (~(-(x) & -(align)) + 1) by a
2. (((~(-(x))) | (~(-(align)))) + 1) by b
3. (((~(~((x) - 1))) | (~(~((align) - 1)))) + 1) by a
4. (((((x) - 1)) | (((align) - 1))) + 1) by c

Then if we just clean up the unnecessarily parenthesis, we arrive at @perfinion's code.

https://github.com/zfsonlinux/zfs/pull/3949#issuecomment-151180695

chrisrd commented 8 years ago

@ryao Yes, I saw your https://github.com/zfsonlinux/zfs/pull/3949#issuecomment-151180695 and appreciate the formalisation of @perfinion's logic at https://github.com/zfsonlinux/zfs/issues/2505#issuecomment-149042687.

However the mathematical proof doesn't take into account C's type promotion rules: i.e. for the different sequence of operations (at the C compiler level), do the C type promotion rules guarantee the new macro ends up with the same result as the original?

On the other hand I've basically come around to the view that, whilst insisting on guarantees of equivalence per the C spec might be the technically correct thing to do, we're just as likely (which is not very likely at all) to see differences in behaviour in the original macro between platforms due to differences between compilers and/or optimisation settings etc. I.e. it's not worth worrying too much about the exquisite details of the type promotion rules as long as we're comfortably sure the behaviour of the two macros is the same for all cases we care about.

I also have a slight concern regarding triggering the PaX overflow for the x = 0 case which would be a change in behaviour from non-PaX ZoL, per my comment at https://github.com/zfsonlinux/zfs/issues/2505#issuecomment-68317341:

Are there any guarantees in the ZFS code that x won't be zero? Or, if it is zero, is that an indication of an error or filesystem corruption etc. that we want the overflow detector to catch?

...but, like @perfinion, I'm reasonably sure the x = 0 case is going to mean something has already gone terribly wrong and triggering the PaX overflow isn't going to make things worse and may well help avoid filesystem damage further down the track. If we really cared about the x = 0 behaviour differences then the grsecurity thread previously referenced has a solution that only adds a few CPU cycles.

I wonder what the chances are of getting the updated macro into OpenZFS?

ryao commented 8 years ago

@chrisrd Good point about the type coercion rules not being included in the proof. One way to formally decide whether the type coercion rules pose a problem would seem to be to look at each of the axioms in the proof and answer the question "does the axiom break under C's type coercion rules?". If the answer is no for all of them, then the proof holds for type coercion. If the answer is yes for any of them, then the proof breaks.

The only axiom where I suspect anything might be different is the one from two's complement where the answer depends on how the compiler does coercion with the constant. If the constant is automatically coerced to the type of the variable, we are fine because whatever coercion that follows would be the same as it would have been between x and align under the original. If the constant is an int, which is the default in C, then we have two cases. Either we coerce between an unsigned integral type and a signed integral type, and/or we coerce between types of different lengths. In the former case, the coercion should be unsigned. In the latter case, the coercion should go to the larger width. The former case should be fine while the latter case poses a problem if the macro is used in a context where it is being assigned to a smaller bit width where we would coerce from a smaller width to a larger width and generate a warning. This would become an error with -Werror, but assuming that does not occur, the final result would otherwise be the same.

It should be said that I am assuming that type coercion is the same regardless of how expr1 op expr2 occurs in the compiler's abstraction syntax tree. For all possible op, wehther we have expr1 be type A and expr2 be type B or expr1 b type B and expr2 be type A, I assume that the result under C's type coercion rules would be the same. I believe this is a fair assumption, although for true formalism, this needs to be checked. I should also admit that I am doing some handwaving here WRT to the other axioms. My reasoning is that the & and | operators are equivalent in terms of type coercion while the ~ operator should have no effect on type coercion. These are also things that need to be checked for a truly formal proof.

Those caveats aside, I believe that the correct answer on equivalence when type coercion is taken into consideration should be how coercion of the width of the constant is handled in a compiler. I am inclined to think that a compiler would adopt the width of the first coercion that occurs since the 1 can safely fit in any integral type because that is how I would see myself handling this if I wrote a compiler front end (I actually have, but it was not a C front end and it just a college project). This needs some investigation, but even if the C specification provides favorable behavior, it is still possible for there to be compiler bugs in implementing it.

As for getting the updated macro into OpenZFS, I think the chances are good following a discussion about upstreaming this that I had yesterday in #illumos on the freenode IRC network.

ryao commented 8 years ago

Given that the only risk of non-equivalence should be a compiler warning, we could just assume it is equivalent unless the compiler warns otherwise. I could try formalizing my informal reasoning on C type coercion. It would be an interesting exercise for learning how to apply things that I learned in school, but blocking the merge of the patch on complete formal verification is overkill, so it should be fine to merge without that.

@behlendorf Do you see anything else that we need to check?

chrisrd commented 8 years ago

@ryao In sort, I agree this is fine to go in.

However I'm not sure what you're thinking about there: the "non-equivalence" of what objects, and what would the compiler would be warning about?

I don't think looking at the type promotion (whether coerced or automatic) involved in each step of the mathematical proof is going to be much use to us: the C compiler won't be following the proof to transform from the first macro to the second as the proof does, the compiler will be following whatever wild and woolly path the compiler writer came up, almost certainly whilst 3 days sleepless and keeping going by sheer willpower and ingestion of copious mind altering substances (the end results are generally indistinguishable from those produced by a wizard of optimisation), to get to the end point of rounding up the value.

The best we can hope for, for either macro, is the end result is in accordance with how a sane mind would interpret the rules of C.

That's basically why I'm now more relaxed about demanding guarantees of equivalence between the macros (i.e. macro to end result, not macro to macro): it would be nice if someone were able to rigorously show what should happen per the C type promotion rules for all possible types (we already have the mathematical side covered), but that would still not guarantee the compiler will actually follow the rules itself. (Sure, if we found the compiler wasn't following the rules we could try get it fixed, but that doesn't help in the short to medium term.) I think the best we can do is ensure the new macro is sane and actually works as desired in the cases we care about now. The esoteric and currently non-existing cases I was previously expressing concern about (odd new types and/or strange new compilers being used) can be dealt with if and when they come about.

ryao commented 8 years ago

@chrisrd Is your concern the situation where GCC simultaneously miscompiles the new code and fails to issue a warning? I am not concerned about that possibility with the new macro anymore than I am concerned about that possibility with the old macro. While miscompilation is always a matter of concern, that deserves its own issue and a separate resolution in the form of a switch to a formally verified compiler such as the CompCert C compiler:

http://compcert.inria.fr/

That said, I consider the biggest risk to be that GCC will evaluate the expression formed by the macro to have a different integral type than the original in its Abstract Syntax Tree. If that occurs, the compiler will overzealously increase the bit width of the variable rather than coercing the bit width of the constant and/or will treat the constant as a signed type. Provided that the GCC follows the rules of C, the latter is not a problem while the former would only cause a problem should the width increase too much upon assignment. That should cause GCC to generate a compiler warning about an integral type being truncated upon assignment to an lval. This should occur in "semantic analysis" as shown in this diagram:

http://jcsites.juniata.edu/faculty/rhodes/lt/images/ccover4.gif

chrisrd commented 8 years ago

@ryao (apologies for the downtime...) my concern was that the new macro should produce the same end results as the old macro in all possible cases, according to the rules of C (including type promotion (TP) and expression evaluation ordering (EEO) etc.). What I was looking for was a proof along the lines of "ok, the original macro works like this, according to the rules of C (including TP and EEO), and the new macro works like this other, again according to the rules of C, and hence we see that the two produce the same results for all possible cases (assuming the compiler is strictly conforming)".

I'm somewhat more relaxed about wanting that rigorous approach now that I'm more familiar with the C type promotion rules (thanks to this issue) and the ways the macro is used in current code. I.e. I'm comfortable the new macro will work for current usage and reasonable future usage, and I think any future breakage (i.e. differences in output of the old and new macros) due to my hypothetical "odd new types and/or strange new compilers being used" are just as (un!)likely as any future breakage due to compiler bugs etc.