thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0

As a user I want partial downloads to be resumed after network failed #606

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 2 years ago

Partial downloads should resume after a network failure

In case only part of a file has been downloaded, resume a partial download (HTTP Range header).
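For illustration, here is a minimal sketch of the resume logic described above, written in Python rather than as the actual tedge downloader code. The helper names `resume_headers` and `write_mode_for` are hypothetical. It assumes the partial file on disk holds exactly the bytes received so far, and it relies on standard HTTP semantics: a server that honours the Range header answers 206 Partial Content, while a server that ignores it answers 200 with the full body.

```python
import os
from typing import Dict, Tuple


def resume_headers(partial_path: str) -> Tuple[int, Dict[str, str]]:
    """Return (offset, headers) for resuming a download into partial_path.

    Hypothetical helper: the offset is simply the size of the partial file.
    """
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    # Ask only for the bytes we do not have yet; send no Range header on a fresh start.
    headers = {"Range": f"bytes={offset}-"} if offset > 0 else {}
    return offset, headers


def write_mode_for(status: int, offset: int) -> str:
    """Decide how to open the local file given the server's status code."""
    if status == 206 and offset > 0:
        return "ab"  # server honoured the range: append the remaining bytes
    if status == 200:
        return "wb"  # server sent the full body: overwrite and start over
    raise ValueError(f"unexpected HTTP status {status}")
```

With a 1024-byte partial file, `resume_headers` would yield the header `Range: bytes=1024-`. Falling back to a full download on a 200 response is one possible design choice for servers without range support; other statuses, such as 416, are left to the caller in this sketch.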

reubenmiller commented 1 year ago

This definitely makes sense, especially when downloading larger binaries (>10 MB to 1 GB) on mobile networks, where connectivity can be unreliable and there are usually data volume limits.

Bravo555 commented 1 year ago

QA instructions

I should say that I may not be fully aware of all the downloader use cases, and the ticket is a bit vague, so I may have missed something in my understanding or testing of the expected behaviour. If you think something was missed, I encourage you to explore that direction.

  1. Prepare a software package that will be downloaded via an HTTP URL. We don't test installation, so it can be random bytes of any size. I used a Cumulocity software repository to add a test-very-large-software package with two versions: one with a file uploaded to the Cumulocity tenant, and a second with a URL pointing to my private nginx instance. Both go through the HTTP URL download codepath, so they behave the same. However, with a URL to a private server I control, I can see how the client behaves when I change the server's behaviour, so I think that version is better for testing. The version with the file uploaded to the tenant is used in the Robot Framework test, because I intend to delete the version whose URL points to my VPS once testing is complete. Make sure the server that the download comes from returns an Accept-Ranges: bytes header in response to the initial request.

  2. Use the Cumulocity Software control interface to install the software package prepared in the previous step on the device under test.

  3. Look at the logs of tedge-agent and of the currently running operation, and ensure that they correctly report that the download started.

  4. Introduce some kind of failure while the download is running. I don't have a complete list of possible failure modes, but I did test a few general types:

    • server error: while the download was running, I shut down the HTTP server, which closed the underlying TCP connection. Because the client knows it did not receive the entire response, it correctly triggered the backoff mechanism. Here the server knows it cannot continue and is explicit about it, so the client knows exactly what happened and can react accordingly.
    • no network route: the download stalls and doesn't hit any kind of timeout, in tedge or in the OS. Sometimes it starts back up once the route is available again, and sometimes it doesn't. The same thing happens with curl and wget, so this is probably fine?
    • network interface disabled on client or server: the same as with no route: the download stalls and doesn't hit any timeout. I'd anticipate that disabling a network interface would also close the socket, but it seems it remains open.
  5. Decide whether the observed behaviour is a bug or a feature: as stated above, some tools like curl or wget seemingly do not handle some of these failures, or perhaps my Linux knowledge is just lacking. In any case, "doing the same thing as curl or wget" would be satisfactory, I think.
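As a possible aid for the precondition in step 1, the sketch below checks whether a server advertises byte-range support. It is written in Python with only the standard library and is not part of tedge; the function names `supports_byte_ranges` and `check_url` are hypothetical, and it assumes the server answers HEAD requests with the same headers it would send for the initial GET.

```python
import urllib.request
from typing import Mapping


def supports_byte_ranges(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers contain Accept-Ranges: bytes."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "accept-ranges":
            units = [unit.strip().lower() for unit in value.split(",")]
            return "bytes" in units
    return False


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to url and inspect the Accept-Ranges header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return supports_byte_ranges(dict(response.headers.items()))
```

Calling `check_url` with the package URL before starting the test would show whether a resumed download can be expected to use ranges at all. A server that sends Accept-Ranges: none, or omits the header, makes this check return False.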

gligorisaev commented 1 year ago

I'm pleased to announce that the QA process for this feature/issue has been successfully completed and here are the results:

Overall, I am confident that this feature/issue is ready for the next steps.