opnsense / dhcp6c

OPNsense WIDE-DHCPv6 client
Other
22 stars 30 forks source link

Issues with load all interfaces patches #31

Closed jrmithdobbs closed 3 years ago

jrmithdobbs commented 3 years ago

I'm running into some issues I'm not sure the cause of when trying to use the opnsense patches to load all interfaces in the config instead of requiring them on the command line. It causes sigbus/segfaults on OpenBSD and I'm not sure why. I'm getting 0xdfdfdfdfdfdfdfdf as the pointer being passed into:

ifinit (ifname=0xdfdfdfdfdfdfdfdf <Address 0xdfdfdfdfdfdfdfdf out of bounds>)

From the initial loop in main. Which is usually a sign of a free after use or similar issue on OpenBSD's malloc/free implementation. I've been staring at this longer than I'd like to admit and I'm not really seeing why this is happening. I'm also confused as to why cfparse() needs to be called on either side of that interface name loading.

The patch also makes specifying interfaces on the command line only work until the next SIGHUP which I submitted #30 to fix. With #30 the code runs and works on OpenBSD so long as interfaces are passed on the command line. Without #30 when passing interfaces on the command line a sigbus/segfault will also happen on SIGHUP as if an interface was not specified on the command line at startup.

marjohn56 commented 3 years ago

We don't read the interfaces from the command line, we read them from the config file.

jrmithdobbs commented 3 years ago

Yes but the patches leave the option to pass them on the command line but that breaks after a hup.

The patches to not read them off of the command line also seem to have some kind of initialization bug as they segfault if I try to utilize them on openbsd instead of FreeBSD where if I run the code specifying the interfaces on the command line or remove those commits it does not segfault. (Until after HUP because that has the same seemingly problematic initialization code)

jrmithdobbs commented 3 years ago

This is a really long comment but I started writing it while still digging with a debugger and I think it's important to leave for documentation. tl;dr the way the opnsense changes implemented the usage of all configured interfaces seems to rely on UB that can do weird things if compilers optimize things out in certain ways on certain platforms. I'm actually surprised this works as is on FreeBSD with clang because it doesn't at all on OpenBSD with gcc OR clang.

Maybe output will make it clear.

Without an interface specified:

bash-4.4# cat /etc/dhcp6c-empty-test-pds.conf 
interface vether0 { send rapid-commit, ia-pd 0; };
id-assoc pd 0 { prefix ::/48 infinity infinity; };
bash-4.4# ./dhcp6c -fDnp /var/run/dhcp6c-test.pid -c /etc/dhcp6c-empty-test-pds.conf 
Apr/17/2021 09:22:50: get_duid: extracted an existing DUID from /var/db/dhcp6c_duid: <REMOVED DUID>
Apr/17/2021 09:22:50: cfdebug_print: <3>[interface] (9)
Apr/17/2021 09:22:50: cfdebug_print: <5>[vether0] (7)
Apr/17/2021 09:22:50: cfdebug_print: <3>begin of closure [{] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>[send] (4)
Apr/17/2021 09:22:50: cfdebug_print: <3>[rapid-commit] (12)
Apr/17/2021 09:22:50: cfdebug_print: <3>[,] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>[ia-pd] (5)
Apr/17/2021 09:22:50: cfdebug_print: <3>[0] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>end of closure [}] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>[id-assoc] (8)
Apr/17/2021 09:22:50: cfdebug_print: <13>[pd] (2)
Apr/17/2021 09:22:50: cfdebug_print: <13>[0] (1)
Apr/17/2021 09:22:50: cfdebug_print: <13>begin of closure [{] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>[prefix] (6)
Apr/17/2021 09:22:50: cfdebug_print: <3>[::] (2)
Apr/17/2021 09:22:50: cfdebug_print: <3>[/] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>[48] (2)
Apr/17/2021 09:22:50: cfdebug_print: <3>[infinity] (8)
Apr/17/2021 09:22:50: cfdebug_print: <3>[infinity] (8)
Apr/17/2021 09:22:50: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>end of closure [}] (1)
Apr/17/2021 09:22:50: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:22:50: configure_pool: called
Apr/17/2021 09:22:50: clear_poolconf: called
Bus error (core dumped)

With an interface specified:

bash-4.4# ./dhcp6c -fDnp /var/run/dhcp6c-test.pid -c /etc/dhcp6c-empty-test-pds.conf vether0
Apr/17/2021 09:24:23: get_duid: extracted an existing DUID from /var/db/dhcp6c_duid: <REMOVED DUID>
Apr/17/2021 09:24:23: cfdebug_print: <3>[interface] (9)
Apr/17/2021 09:24:23: cfdebug_print: <5>[vether0] (7)
Apr/17/2021 09:24:23: cfdebug_print: <3>begin of closure [{] (1)     
Apr/17/2021 09:24:23: cfdebug_print: <3>[send] (4)                                                       
Apr/17/2021 09:24:23: cfdebug_print: <3>[rapid-commit] (12)                            
Apr/17/2021 09:24:23: cfdebug_print: <3>[,] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>[ia-pd] (5)
Apr/17/2021 09:24:23: cfdebug_print: <3>[0] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>end of closure [}] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>[id-assoc] (8)
Apr/17/2021 09:24:23: cfdebug_print: <13>[pd] (2)
Apr/17/2021 09:24:23: cfdebug_print: <13>[0] (1)
Apr/17/2021 09:24:23: cfdebug_print: <13>begin of closure [{] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>[prefix] (6)
Apr/17/2021 09:24:23: cfdebug_print: <3>[::] (2)
Apr/17/2021 09:24:23: cfdebug_print: <3>[/] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>[48] (2)
Apr/17/2021 09:24:23: cfdebug_print: <3>[infinity] (8)
Apr/17/2021 09:24:23: cfdebug_print: <3>[infinity] (8)
Apr/17/2021 09:24:23: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>end of closure [}] (1)
Apr/17/2021 09:24:23: cfdebug_print: <3>end of sentence [;] (1)
Apr/17/2021 09:24:23: configure_pool: called
Apr/17/2021 09:24:23: clear_poolconf: called
Apr/17/2021 09:24:23: dhcp6_reset_timer: reset a timer on vether0, state=INIT, timeo=0, retrans=446
Apr/17/2021 09:24:24: client6_send: Sending Solicit
Apr/17/2021 09:24:24: client6_send: a new XID (51af3b) is generated
Apr/17/2021 09:24:24: copy_option: set client ID (len 14)
Apr/17/2021 09:24:24: copy_option: set rapid commit (len 0)
Apr/17/2021 09:24:24: copy_option: set elapsed time (len 2)
Apr/17/2021 09:24:24: copyout_option: set IA_PD prefix
Apr/17/2021 09:24:24: copyout_option: set IA_PD
Apr/17/2021 09:24:24: client6_send: send solicit to ff02::1:2%vether0
Apr/17/2021 09:24:24: dhcp6_reset_timer: reset a timer on vether0, state=SOLICIT, timeo=0, retrans=1006
<keeps retransmitting until killed since interface not connected to anything>

The backtrace from the core in the first example:

#0  strlen () at /usr/src/lib/libc/arch/amd64/string/strlen.S:152
152     /usr/src/lib/libc/arch/amd64/string/strlen.S: No such file or directory.
        in /usr/src/lib/libc/arch/amd64/string/strlen.S
(gdb) bt
#0  strlen () at /usr/src/lib/libc/arch/amd64/string/strlen.S:152
#1  0x00001f7a8c7db7c2 in _libc_strdup (str=0xdfdfdfdfdfdfdfdf <Address 0xdfdfdfdfdfdfdfdf out of bounds>)
    at /usr/src/lib/libc/string/strdup.c:44
#2  0x00001f778e3105fe in ifinit (ifname=0xdfdfdfdfdfdfdfdf <Address 0xdfdfdfdfdfdfdfdf out of bounds>) at if.c:73
#3  0x00001f778e303880 in main (argc=0, argv=0x7f7ffffd7268) at dhcp6c.c:210

Line 210 there is actually dhcp6c.c#L207 as some ifdefs unrelated are needed to build on OpenBSD.

If I set a breakpoint at that line it looks like it is a failure on the first element, the loop is not looping. It is not being initialized properly it seems? But that guard value is usually set after a free. Very curious.

So I set a breakpoint in configure_interface in config.c which is kind of hard to follow with -O2 as ifp gets optimized out/etc. When stepping through the loop the first found name gets configured and setup as expected and everything assigned to ifames looks good. For some reason (-O2 makes it annoying to figure out why) it does a second iteration of the loop then ends up doing a goto to out for some reason.

So I tried building with -O0. It works and that is super concerning.

But then I notice the exact same behavior build with clang instead of gcc so this is unlikely to be a codegen/optimization bug. 🙃😞

After some digging I think I've figured it out.

I'm not sure exactly what the UB is but the implementation as exists in these patches seems to be depending on some in how it's assigning the ifnames pointer from config.c. It seems on some systems (OpenBSD) user land the compiler is able to realize that the first invocation of cfparse() doesn't ever have it's values used and so inserts an out that frees everything but doesn't return -1 or something crazy along these lines.

I'm assuming it's because what's being passed to configure_interface is a unit/local static variable from the generate cfparse.c and OpenBSD has nice/weird malloc/free implementations that are more amenable to static analysis by the compiler?

If I move the ifinit() calls into configure_interface() in config.c the problem goes away as everything is setup before configure_commit() is called by cfparse.y. This is safe, doesn't depend on undefined behavior and actually simplifies the code both in main and the SIGHUP handler. I merged the fixes into #30 where I fixed just the SIGHUP oversight initially.

marjohn56 commented 3 years ago

Yes but the patches leave the option to pass them on the command line but that breaks after a hup.

The patches to not read them off of the command line also seem to have some kind of initialization bug as they segfault if I try to utilize them on openbsd instead of FreeBSD where if I run the code specifying the interfaces on the command line or remove those commits it does not segfault. (Until after HUP because that has the same seemingly problematic initialization code)

Not really interested in whether it works in OpenBSD, this fork is specifically for use by Opnsense on FreeBSD, that's why it's in the Opnsense repo and not in the Upstream FreeBSD repo or the OpenBSD repo. We have made a lot of changes to dhcp6c and continue to do so, they are specifically intended for Opnsense. Anyone is free to use our changes as it's totally open source, but it's use on any other system other than FreeBSD/HBSD Opnsense is not in our remit.

jrmithdobbs commented 3 years ago

but it's use on any other system other than FreeBSD/HBSD Opnsense is not in our remit.

I was just asking for help troubleshooting your changes in your fork. Not blaming you for anything or saying you're liable for it? I'm sorry if I somehow communicated this in a way that wasn't clear. 😞

I just stumbled across this trying to test code changes for adding the environment variables stuff requested in #19 because I too would like more exposed in the environment variables ... I just happened to test building/running it on an openbsd system as part of my workflow. 😜

The way it's assigning back to ifnames can cause the compiler to introduce a use after free if the malloc implementation in freebsd ever gets improved to a point where the compiler notices that you're calling cfparse without accessing it's results except by way of ifnames because by spec it doesn't have to check that usage. It's a subtle bug.

marjohn56 commented 3 years ago

As I said earlier. we do not use the command line interface options. We stopped using that some time ago, Pushing commits to this fork to fix something we are not going to use is not that useful. What we do not want are any changes that affect the way dhcp6c works for OpnSense. A comment made in there worries me also, 'This also adds a DUID variable so that different instances of dhcp6c canpass to the same script.' - There can be only ONE instance of dhcp6c, you cannot run multiple instances. You can have multiple WANs that dhcp6c can pull addresses from, but there can only be one instance.

marjohn56 commented 3 years ago

I appreciate your input in adding the multiple PD info, but really you should be testing this only on a FreeBSD system and if it's going to be used by Opnsense, then use Opnsense to test it. The idea of not using the command line interfaces is that we can HUP dhcp6c and re-read the config, getting the interfaces from there, which made the code in interfaces.inc cleaner as we did not need to worry about the interfaces when firing up dhcp6c to start with. Quite a few of the PR's either never make it beyond the commit or are there simply as ideas to test. Adding conditionals for OpenBSD compilation is, I would suspect, likely to have a negative response from @fichtner when he comes to look at the commit; I could be wrong though. ;)

jrmithdobbs commented 3 years ago

I just noticed it was broken if you passed an interface when testing because if I HUP'ed it on the openbsd machine it segfaulted and I ended up tracking back the problem to happening always when hitting that code path with the compiler introduced free after use on openbsd. #19 is an issue/user feature request not a pr

I was testing on a different machine because I didn't want to, you know, actually drop my v6 reservations on the internet while testing. 🙃

I just haven't removed the openbsd build stuff from that branch, I'll fix it. I've got an updated version after some testing. I got distracted because trying to figure out why it was segfaulting on openbsd but not freebsd was a weird/fun problem to troubleshoot.

jrmithdobbs commented 3 years ago

For posterity to answer my own questions in the OP for anyone else looking later:

1) It is a use after free but it’s one introduced by the compiler because of OpenBSD’s malloc impl allowing the compiler to make the (valid) determination to do so because of how ifnames is being set. 2) cfparse is being called, and so the config is being parsed, twice because configure_commit only saves parts of the config that map to an interface already stored in the dhcp6_if global. 3) I definitely spent too much time staring at this.

The changes in #30 cleans up 1&2 so that they don’t have to be reasoned about.

Changes in #29 make it easier to test this in a working copy since it makes SIGINT equivalent to SIGTERM so cleanup methods actually get called and stderr/stdout actually get flushed when using -f.

The changes in #28 are the whole reason I went down this rabbit hole.