systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0

Lease getting expired without retrying at T2 #33934

Open khhizr007 opened 3 months ago

khhizr007 commented 3 months ago

systemd version the issue has been seen with

254

Used distribution

Ubuntu 22.04

Linux kernel version used

6.5.0-1022-aws

CPU architectures issue was seen on

x86_64

Component

systemd-networkd

Expected behaviour you didn't see

If the lease cannot be renewed at T1, the client should attempt to rebind it at T2 so that the address lease is extended.

Unexpected behaviour you saw

Over the past couple of weeks we have noticed that our application running on AWS EC2 goes down abruptly, and we can no longer even SSH into the instance.

Investigating our system logs, we found that the issue stems from the expiration of the lease for the IP address that our instance obtains from the DHCP server, as the following error message shows:

systemd-networkd[328]: ens5: Could not set DHCPv4 address: Connection timed out

After a while, the logs also show the following message about the interface being deleted:

Deleting interface #3 ens5, 172.o0.1.xx6#123, interface stats: received=4556, sent=6992, dropped=0, active_time=1045188 secs

This error occurs when a couple of cron jobs are running in the background and the system is under stress; disk read activity is high at those times. Although this load is not the root concern in our system, but rather a symptom of something going awry in our application, I still think the lease behaviour needs to be addressed.

To get back to my point: even if the lease could not be renewed at T1, a rebinding attempt should take place at T2. This does not happen. Our application becomes unreachable and we have to reboot the instance manually.
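For reference, RFC 2131 (section 4.4.5) describes the timing expected here: by default T1 is 0.5 and T2 is 0.875 of the lease duration. Between T1 and T2 the client keeps retrying a unicast DHCPREQUEST to the leasing server (RENEWING), and between T2 and expiry it broadcasts DHCPREQUEST to any server (REBINDING). After each unanswered request it waits half of the remaining time, but at least 60 seconds. The Python sketch below models only those RFC defaults. It is not systemd-networkd's implementation, and it ignores server-supplied T1/T2 values (DHCP options 58 and 59) as well as timer randomization. The function names are hypothetical.

```python
"""Sketch of the RFC 2131 (section 4.4.5) DHCPv4 lease timers.

Models only the protocol defaults; this is NOT systemd-networkd's code.
Server-supplied T1/T2 (options 58/59) and timer randomization are ignored.
"""

from enum import Enum
from typing import List, Tuple

MIN_RETRANSMIT = 60.0  # RFC 2131: never wait less than 60 s between retries


class State(Enum):
    BOUND = "BOUND"          # lease valid, before T1
    RENEWING = "RENEWING"    # T1..T2: unicast DHCPREQUEST to the leasing server
    REBINDING = "REBINDING"  # T2..expiry: broadcast DHCPREQUEST to any server
    EXPIRED = "EXPIRED"      # lease lost; client must restart from INIT


def default_timers(lease: float) -> Tuple[float, float]:
    """Default T1 and T2, in seconds after the lease was obtained."""
    return 0.5 * lease, 0.875 * lease


def state_at(elapsed: float, lease: float) -> State:
    """Client state `elapsed` seconds after the lease was obtained."""
    t1, t2 = default_timers(lease)
    if elapsed < t1:
        return State.BOUND
    if elapsed < t2:
        return State.RENEWING
    if elapsed < lease:
        return State.REBINDING
    return State.EXPIRED


def retransmit_schedule(start: float, deadline: float) -> List[float]:
    """Send times for DHCPREQUEST from `start` until `deadline`.

    After each unanswered request the client waits half of the time left
    until the deadline (T2 when renewing, lease expiry when rebinding),
    but at least MIN_RETRANSMIT seconds.
    """
    times: List[float] = []
    t = start
    while t < deadline:
        times.append(t)
        t += max((deadline - t) / 2, MIN_RETRANSMIT)
    return times


if __name__ == "__main__":
    lease = 3600.0  # example: a one-hour lease
    t1, t2 = default_timers(lease)
    print("renew at:", retransmit_schedule(t1, t2))
    print("rebind at:", retransmit_schedule(t2, lease))
```

With a one-hour lease, this schedule puts renewal attempts at 1800, 2475, 2812.5, 2981.25, 3065.625 and 3125.625 s, and rebinding attempts at 3150, 3375, 3487.5 and 3547.5 s. A client following the RFC defaults would therefore make several attempts after T2 before the address is lost.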

We have already tried switching Ubuntu versions, but the problem persists. There are also past issues here that raised similar concerns about lease expiration.

Steps to reproduce the problem

Run high-IOPS operations in the background at the time the lease is due to be renewed with the DHCP server.

Additional program output to the terminal or log subsystem illustrating the issue

No response

yuwata commented 3 months ago
Deleting interface #3 ens5, 172.o0.1.xx6#123, interface stats: received=4556, sent=6992, dropped=0, active_time=1045188 secs

This is not from networkd. How do you get this?

khhizr007 commented 3 months ago

Sorry for the late reply @yuwata, this log is from the ntpd service. The other log that I shared is from systemd-networkd. I just wanted to give a clearer picture of the problem.

khhizr007 commented 2 months ago

Hi @yuwata, do you need any other logs or details for this? I would be more than happy to help.