zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

DHCP client does not notice missing link #7553

Closed therealprof closed 6 years ago

therealprof commented 6 years ago

The current DHCP implementation seems not to be able to detect link changes (on my Nucleo-F429ZI board). After pulling the cable the interface status remains bound:

shell> net iface

Interface 0x2000a140 (Ethernet) [0]
===================================
Link addr : 00:80:E1:3A:A5:26
MTU       : 1500
IPv4 unicast addresses (max 5):
        10.0.0.107 DHCP preferred
IPv4 multicast addresses (max 5):
        <none>
IPv4 gateway : 10.0.0.5
IPv4 netmask : 255.255.255.0
DHCPv4 lease time : 259200
DHCPv4 renew time : 0
DHCPv4 server     : 10.0.0.5
DHCPv4 requested  : 10.0.0.107
DHCPv4 state      : bound
DHCPv4 attempts   : 1

I would expect that the interface recognizes its link state and attempts a RENEW of the lease (if not already expired) as soon as the link is reestablished. This ensures that the network has not changed and the IP address is still available.

tbursztyka commented 6 years ago

It's more of an Ethernet subsystem issue. There is an Ethernet net mgmt event for carrier on/off, but drivers don't implement it yet, so it's never raised, and no code is listening for that event (DHCP, for instance).
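To make the missing pieces concrete, here is a minimal, self-contained sketch of the pattern described above: a driver raises a carrier event, and a DHCP-side listener reacts to it. This is not Zephyr's net mgmt API. Every name here (`link_add_listener`, `link_raise_event`, `dhcp_on_link_event`, `struct dhcp_client`) is hypothetical and exists only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event identifiers; not Zephyr's actual net mgmt event codes. */
enum link_event { LINK_CARRIER_OFF, LINK_CARRIER_ON };

typedef void (*link_listener_fn)(enum link_event ev, void *user_data);

#define MAX_LISTENERS 4

struct link_listener {
    link_listener_fn fn;
    void *user_data;
};

static struct link_listener listeners[MAX_LISTENERS];
static size_t listener_count;

/* Register a listener; returns false if fn is NULL or the table is full. */
bool link_add_listener(link_listener_fn fn, void *user_data)
{
    if (fn == NULL || listener_count >= MAX_LISTENERS) {
        return false;
    }
    listeners[listener_count].fn = fn;
    listeners[listener_count].user_data = user_data;
    listener_count++;
    return true;
}

/* What a driver would call when the PHY reports a carrier change. */
void link_raise_event(enum link_event ev)
{
    for (size_t i = 0; i < listener_count; i++) {
        listeners[i].fn(ev, listeners[i].user_data);
    }
}

/* Toy DHCP client state: only the fields this example needs. */
struct dhcp_client {
    bool carrier;
    unsigned int renew_requests;
};

/* DHCP-side listener: record carrier loss, request one renew per carrier return. */
void dhcp_on_link_event(enum link_event ev, void *user_data)
{
    struct dhcp_client *c = user_data;

    if (ev == LINK_CARRIER_OFF) {
        c->carrier = false;
    } else if (!c->carrier) {
        c->carrier = true;
        c->renew_requests++;
    }
}
```

Both halves are needed: without a driver calling `link_raise_event`, the listener never runs, and without a registered listener, the event has no effect on the DHCP state.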

locomuco commented 6 years ago

The following PR is also blocked by this: https://github.com/zephyrproject-rtos/zephyr/pull/561

@tbursztyka what would be the appropriate solution, a link change mgmt event?

tbursztyka commented 6 years ago

@locomuco see my comments: there is already an event for that on the Ethernet side (the device can be up with no carrier found). On other bearers the state is simply up or down, and those signals also exist.

jukkar commented 6 years ago

@therealprof @locomuco Proposal for a fix at #8027, please test if possible.

jukkar commented 6 years ago

Note that the fix is currently implemented only for the frdm-k64f board. I do not have a Nucleo board to fix the STM Ethernet driver and test this.

therealprof commented 6 years ago

@jukkar It works on my Nucleo-F429ZI board if I manually shut down the interface using the shell and bring it up again.

pfalcon commented 6 years ago

> I would expect that the interface recognizes its link state and attempts a RENEW of the lease (if not already expired) as soon as the link is reestablished.

I wonder if there are any references showing that this is correct and recommended behavior. Just from common sense, I wouldn't expect unplugging and replugging the cable to affect DHCP operation in any way.

pfalcon commented 6 years ago

The generic DHCP RFC, https://tools.ietf.org/html/rfc2131, as expected doesn't mention RENEWing on a medium status change; it only has:

> A client MAY choose to renew or extend its lease prior to T1. The server may choose not to extend the lease (as a policy decision by the network administrator)

From quick googling, I can't find confirmation that renewing on medium up is a universally implemented practice.

Overall, I guess there're 2 cases here:

  1. Interface is down when the DHCP client starts (or when the lease has expired). In this case, it does make sense to listen for the interface-up event and start the DHCP request immediately on it, because the alternative of relying on the DHCP retry timer (5 min suggested?) may lead to a poor user experience.
  2. RENEWing a non-expired lease on medium up seems to be a purely optional, policy feature.
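The two cases can be sketched as a single decision taken on link up. This is only an illustration under stated assumptions: `struct lease`, `dhcp_action_on_link_up` and the `renew_on_link_up` policy flag are hypothetical names, not part of Zephyr or of #8027, and times are taken to be uptime in seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical actions a DHCP client could take when the link comes up. */
enum dhcp_action {
    DHCP_ACTION_NONE,      /* keep using the current lease as-is */
    DHCP_ACTION_DISCOVER,  /* no usable lease: start from scratch */
    DHCP_ACTION_RENEW      /* usable lease: ask the server to confirm it */
};

/* Hypothetical lease record; times are uptime in seconds. */
struct lease {
    bool     valid;       /* true once a lease has been obtained */
    uint32_t obtained_s;  /* uptime at which the lease was obtained */
    uint32_t duration_s;  /* lease time granted by the server */
};

/* A lease is unusable if none was ever obtained or its time has run out. */
static bool lease_expired(const struct lease *l, uint32_t now_s)
{
    return !l->valid || (now_s - l->obtained_s) >= l->duration_s;
}

/*
 * Case 1: no lease or expired lease -> request an address immediately
 *         rather than waiting for a retry timer.
 * Case 2: lease still valid -> renewing is a policy choice.
 */
enum dhcp_action dhcp_action_on_link_up(const struct lease *l,
                                        uint32_t now_s,
                                        bool renew_on_link_up)
{
    if (lease_expired(l, now_s)) {
        return DHCP_ACTION_DISCOVER;
    }
    return renew_on_link_up ? DHCP_ACTION_RENEW : DHCP_ACTION_NONE;
}
```

With the 259200-second lease from the original report, a link-up event 1000 seconds into the lease yields `DHCP_ACTION_RENEW` only if the policy flag is set, and `DHCP_ACTION_DISCOVER` once the lease time has fully elapsed.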

I'm obviously not arguing against implementing this, just trying to understand what is being implemented, why, and what priority it may have.

therealprof commented 6 years ago

@pfalcon Not sure it is "correct", but I think it is implemented in any major operating system. I have definitely seen it in Linux, and I've just verified that macOS does it. I recall that Windows up to version 7 or so did not do it, but Microsoft implemented it at a later point.

From my POV it is important that a device is capable of doing this, since every link change potentially means that the device is now connected to a different network, so simply continuing to communicate with the previous settings is bad behaviour. At the very minimum, duplicate address detection should happen, but I really think DHCPv4, DHCPv6 and/or SLAAC are definitely in order.

therealprof commented 6 years ago

@pfalcon NB: I'm actually relying on that feature quite heavily at work with a lot of different networks at the same time (e.g. test network, production network and local LANs and management networks of various routers and switches).

jukkar commented 6 years ago

The renewing works just fine after #8027 so we can just use the PR. IMHO it makes sense to renew asap after cable is connected.

pfalcon commented 6 years ago

> since every link change potentially means that the device might be connected to a different network

That's a valid use case, thanks. But I wonder whether RENEW covers it well enough: for example, RENEW is sent to a particular DHCP server, so what if there's no reply? I don't see that https://tools.ietf.org/html/rfc2131 says it's an error; it only says:

> In both RENEWING and REBINDING states, if the client receives no response to its DHCPREQUEST message, the client SHOULD wait one-half of the remaining time until T2 (in RENEWING state) and one-half of the remaining lease time (in REBINDING state), down to a minimum of 60 seconds, before retransmitting the DHCPREQUEST message.
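For reference, the retransmission rule quoted above works out to roughly the following. This is a sketch of the RFC wording only, not of any existing client code; the function name, the enum, and the use of absolute times in seconds are assumptions made for the example.

```c
#include <stdint.h>

/* The two states the quoted RFC 2131 passage talks about. */
enum dhcp_state { DHCP_RENEWING, DHCP_REBINDING };

/* Lower bound on the retransmission delay, per the quoted text. */
#define DHCP_MIN_RETRANSMIT_S 60U

/*
 * Delay before retransmitting an unanswered DHCPREQUEST.
 * now_s, t2_s and lease_end_s are absolute times in seconds on the same clock.
 * RENEWING:  half of the time remaining until T2.
 * REBINDING: half of the time remaining until the lease ends.
 * Either result is clamped to at least 60 seconds.
 * If the deadline has already passed, the caller is expected to change
 * state; this helper then simply returns the minimum.
 */
uint32_t dhcp_retransmit_delay_s(enum dhcp_state state,
                                 uint32_t now_s,
                                 uint32_t t2_s,
                                 uint32_t lease_end_s)
{
    uint32_t deadline_s = (state == DHCP_RENEWING) ? t2_s : lease_end_s;
    uint32_t remaining_s = (deadline_s > now_s) ? (deadline_s - now_s) : 0U;
    uint32_t delay_s = remaining_s / 2U;

    return (delay_s < DHCP_MIN_RETRANSMIT_S) ? DHCP_MIN_RETRANSMIT_S : delay_s;
}
```

So a RENEW sent right after link up that gets no answer is retried on a schedule tied to T2 and the lease end, which can be a long wait early in a lease. That is the gap a link-up-specific retry policy would have to cover.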

So I wonder if we would need to add even more ad hoc handling of RENEW-after-link-up than #8027 does. Anyway, with that use case, it's definitely a step in the right direction. Leaving the details to you guys, thanks for the discussion.

therealprof commented 6 years ago

@pfalcon Maybe not quite the reference you were looking for but certainly related: https://www.rfc-editor.org/rfc/rfc4957.txt