jpirko / libteam

team netdevice library
GNU Lesser General Public License v2.1
225 stars 57 forks source link

Fix teamd crash on lw_psr_port_removed() ops #17

Closed wipawel closed 8 years ago

wipawel commented 8 years ago

In case of e.g. arp_ping (or nsna_ping) link watch used when adding a port dynamically (i.e. through PortAdd DBus method for instance) teamd crashes on lw_psr_port_removed() ops call, when there was no previous corresponding lw_ap_port_added() call.

That happens when teamd_link_watch_arp_ping object port_priv's .init call did not happen due to some preceding error, whereas .fini is called immediately following the error flow of port_obj_create().

Calls:

port_priv_change_handler_func()
  port_obj_create()
    teamd_event_port_added()
    port_priv_init_all() <- some initial .init fails here, before
                            port_prov's init. lw_ap_port_added()
                            remains uncalled and psr_ppriv->ops
                            unset (NULL).
    port_obj_destroy()
      port_priv_fini_all() <- segfault, due to uninitialized ops.

First step fix would be to check for initialization of psr_ppriv->ops, indicating that corresponding .init call has happened before.

Signed-off-by: Pawel Wieczorkiewicz pwieczorkiewicz@suse.de

This situation happens easily with activebackup runner, when team device instance is started with no ports, and following PortAdd method calls fail for example due to inability to change MAC addresses according to hwaddr_policy specific to activebackup runner. The error here is a result of inability to change MAC address of a device whose link is already up. That happens because of the way PortAdd is handled i.e. call to add a port causes to immediately enslave the port (link is set up on it), then following netlink events caused that way are triggering registered callbacks where change MAC address routine resides among them e.g. ab_hwaddr_policy_same_all_port_added(). Afterwards when teamd_set_hwaddr() fails with EBUSY (that error code is mapped from corresponding result received via rtnl) the whole PortAdd method fails and port_obj_create() cleans half-initialized port object. BTW: Attempt to change MAC address after device is enslaved and has link up is worth fixing on its own.

Call stack:

(gdb) where
#0  0x000000000040c98d in lw_psr_port_removed (ctx=0x1ed93c0, tdport=,  priv=0x1ef8760, creator_priv=) at teamd_lw_psr.c:187
#1  0x000000000040b785 in port_priv_fini_all (port_obj=0x1ef7ee0, ctx=0x1ed93c0) at teamd_per_port.c:129
#2  port_obj_destroy (ctx=ctx@entry=0x1ed93c0, port_obj=port_obj@entry=0x1ef7ee0) at teamd_per_port.c:176
#3  0x000000000040ba58 in port_obj_create (team_port=0x1ee0af0, ifindex=,  p_port_obj=, ctx=0x1ed93c0) at teamd_per_port.c:206
#4  port_priv_change_handler_func (th=0x1eda790, priv=0x1ed93c0, type_mask=) at teamd_per_port.c:260
#5  0x00007fbb1e532cf5 in check_call_change_handlers (th=th@entry=0x1eda790,  call_type_mask=call_type_mask@entry=1) at libteam.c:233
#6  0x00007fbb1e534b1d in get_port_list (th=0x1eda790) at ports.c:198
#7  port_list_init (th=0x1eda790) at ports.c:216
#8  0x00007fbb1e533655 in team_refresh (th=0x1eda790) at libteam.c:704
#9  0x0000000000407a0a in teamd_init (ctx=0x1ed93c0) at teamd.c:1354
#10 teamd_start (p_ret=, ctx=0x1ed93c0) at teamd.c:1512
#11 main (argc=, argv=) at teamd.c:1749
wipawel commented 8 years ago

I have also submitted #18