nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0

cloud_init: fix errors when loaded before acquiring IP address #1992

Closed by francescolavra 7 months ago

francescolavra commented 7 months ago

Attempting to connect to a cloud_init server before an IP address has been acquired via DHCP results in a connection error, which causes the cloud_init klib to fail to initialize (and thus the VM to be stopped without executing the user program). This causes sporadic "program startup failed before exec: (result:connect failed (-4))" errors in the cloud_init e2e test when it runs as part of the Jenkins CI tests.

This change fixes the issue by checking for a suitable IP address configuration before attempting to connect to a server or resolve its host name. If the check fails, the connection is retried at a later time.

The code that retried a connection when the DNS query function returned ERR_INPROGRESS has been removed, and a call to cloud_download_connect() has been added to the DNS callback function instead. Otherwise, a long delay in DNS resolution could leave many DNS requests pending at the same time, which can cause a DNS query error (and thus a klib initialization failure) when no free request slots are available.

The code that checked for ERR_VAL (added to retry the connection if no IP address had been acquired) has been removed because it no longer works: the kernel now sets up a default DNS server during initialization, so ERR_VAL is never returned.

The "exec_wait_for_ip4_secs" manifest flag has been removed because it does not serve its intended purpose of delaying klib initialization; it can only delay execution of the user program.