rauc / rauc-hawkbit-updater

The RAUC hawkBit updater is a simple commandline tool/daemon that runs on your target and interfaces between RAUC and hawkBit's DDI API.
https://rauc-hawkbit-updater.readthedocs.io
GNU Lesser General Public License v2.1

Download loop when resume_downloads and missing bundle #158

Open ashlin4010 opened 1 year ago

ashlin4010 commented 1 year ago

If for some reason the bundle from hawkBit is no longer available, e.g. the server returns 404, rauc-hawkbit-updater will try to re-download the files forever. Forcing the rollout to stop via hawkBit does not stop this behavior; the only way is to stop and restart the rauc-hawkbit-updater service.

I believe that resume_downloads should only resume if there is any hope of a successful download. On HTTP status codes 4xx and 5xx, the download should be aborted and a failure reported back to hawkBit.

I believe that force-stopping an update should also stop resume_downloads. The addition of a limit may also be wise.

Our hope was that resume_downloads would allow us to reduce data usage in the event of a data outage mid-download. However, the risk of getting stuck in an endless download loop is too high, and such a loop could use an unlimited amount of data. There is also the risk of a self-inflicted DDoS attack: a fleet of systems continually trying to download a nonexistent file forever, without any way to stop them, would not be ideal.

I encountered this problem while setting up a test production environment: some reverse proxies were misconfigured and hawkBit was doing dumb things. However, an infrastructure outage may also have this effect. For example, a hawkBit container may fail while a load balancer keeps sending 503 error codes.

Bastian-Krause commented 1 year ago

> If for some reason the bundle from hawkBit is no longer available, e.g. the server returns 404, rauc-hawkbit-updater will try to re-download the files forever. Forcing the rollout to stop via hawkBit does not stop this behavior; the only way is to stop and restart the rauc-hawkbit-updater service.
>
> I believe that resume_downloads should only resume if there is any hope of a successful download. On HTTP status codes 4xx and 5xx, the download should be aborted and a failure reported back to hawkBit.

I agree. I guess this should be easy to implement: after checking for resumable_codes, set resumable to FALSE if the error domain equals RHU_HAWKBIT_CLIENT_HTTP_ERROR and 400 <= error->code < 600. PR welcome.
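
A minimal sketch of the check described above, written as standalone C so it compiles without GLib. Everything prefixed `sketch_` or `SKETCH_` is a hypothetical stand-in: the real code uses `GError` with GLib error domains, and the contents of the real resumable_codes list are not reproduced here. The error code is first looked up in a list of resumable codes, then the result is forced to false for any HTTP error with a status in the 4xx or 5xx range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the error domains (GLib quarks in the real code). */
typedef enum {
    SKETCH_DOMAIN_TRANSPORT, /* transfer-level failures (timeouts, resets, ...) */
    SKETCH_DOMAIN_HTTP       /* stands in for RHU_HAWKBIT_CLIENT_HTTP_ERROR */
} sketch_domain;

/* Hypothetical transfer-level error codes, for illustration only. */
enum {
    SKETCH_CODE_TIMEOUT = 1,
    SKETCH_CODE_PARTIAL_FILE = 2,
    SKETCH_CODE_CONNECT_FAILED = 3
};

/* Simplified stand-in for GError: a domain plus a numeric code. */
typedef struct {
    sketch_domain domain;
    int code; /* HTTP status if domain is SKETCH_DOMAIN_HTTP */
} sketch_error;

/* Assumed list of codes worth retrying; the real list may differ. */
static const int sketch_resumable_codes[] = {
    SKETCH_CODE_TIMEOUT,
    SKETCH_CODE_PARTIAL_FILE,
    SKETCH_CODE_CONNECT_FAILED
};

/* Return true only if resuming the download could plausibly succeed. */
bool sketch_is_resumable(const sketch_error *err)
{
    bool resumable = false;
    size_t n = sizeof sketch_resumable_codes / sizeof sketch_resumable_codes[0];

    assert(err != NULL);

    /* Step 1: look the code up in the list of resumable codes. */
    for (size_t i = 0; i < n; i++) {
        if (err->code == sketch_resumable_codes[i]) {
            resumable = true;
            break;
        }
    }

    /* Step 2: HTTP 4xx/5xx responses are never resumed. */
    if (err->domain == SKETCH_DOMAIN_HTTP && err->code >= 400 && err->code < 600)
        resumable = false;

    return resumable;
}
```

With this shape, a 404 or 503 response ends the download attempt so that a failure can be reported, while a transfer-level timeout still qualifies for a resume.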

> I believe that force-stopping an update should also stop resume_downloads.

That would require polling some REST endpoint during download. How is this propagated to the DDI API?

> The addition of a limit may also be wise.

This should already be covered by low_speed_time and low_speed_rate, right?

> Our hope was that resume_downloads would allow us to reduce data usage in the event of a data outage mid-download. However, the risk of getting stuck in an endless download loop is too high, and such a loop could use an unlimited amount of data. There is also the risk of a self-inflicted DDoS attack: a fleet of systems continually trying to download a nonexistent file forever, without any way to stop them, would not be ideal.
>
> I encountered this problem while setting up a test production environment: some reverse proxies were misconfigured and hawkBit was doing dumb things. However, an infrastructure outage may also have this effect. For example, a hawkBit container may fail while a load balancer keeps sending 503 error codes.

Right.