Voxel07 closed this issue 2 years ago.
Thanks for reporting the issue. The first problem I see is that the DO agent keeps retrying the download even after writing to storage fails.
It looks like there is no space left on the storage device. In the log below, C0D0001C indicates 0x1C == error 28 (ENOSPC, "No space left on device").
2021-08-26T12:03:19.8169592Z 1181 1184 trace {TraceDownloadStatus} id: 6b0b97dd-e372-4142-bc83-2fd238398644, 1, codes: [200, 0x0, 0x0], 25034752 / 120944697
2021-08-26T12:03:20.7769686Z 1181 1197 error {Append} (hr:80070018) cbWritten != static_cast
2021-08-26T12:03:22.8874063Z 1181 1197 error {Append} (hr:C0D0001C) HRESULT_FROM_XPLAT_SYSERR(errno) [/usr/src/debug/deliveryoptimization-agent/1.0+gitAUTOINC+3f00d1e0f8-r0/git/client-lite/src/util/do_file.cpp, 51]
2021-08-26T12:03:22.8875678Z 1181 1197 error {OnData} (hr:C0D0001C) DO failure: (null) (hr:0xC0D0001C) [/usr/src/debug/deliveryoptimization-agent/1.0+gitAUTOINC+3f00d1e0f8-r0/git/client-lite/src/util/do_file.cpp, 51], {HRESULT_FROM_XPLAT_SYSERR(errno)} [/usr/src/debug/deliveryoptimization-agent/1.0+gitAUTOINC+3f00d1e0f8-r0/git/client-lite/src/download/download.cpp, 629]
2021-08-26T12:03:22.8875971Z 1181 1197 error {operator()} (hr:C0D0001C) hr [/usr/src/debug/deliveryoptimization-agent/1.0+gitAUTOINC+3f00d1e0f8-r0/git/client-lite/src/util/http_agent.cpp, 289]
2021-08-26T12:03:22.8878183Z 1181 1197 error {operator()} (hr:C0D0001C) DO failure: (null) (hr:0xC0D0001C) [/usr/src/debug/deliveryoptimization-agent/1.0+gitAUTOINC+3f00d1e0f8-r0/git/client-lite/src/util/http_agent.cpp, 289], {hr} [/usr/src/debug/deliveryoptimization-agent/1.0+gitAUTOINC+3f00d1e0f8-r0/git/client-lite/src/util/http_agent.cpp, 252]
2021-08-26T12:03:22.8878538Z 1181 1197 warning {operator()} (hr:C0D0001C) Url: http://deviceupdateinstance--schneider-device-update.b.nlu.dl.adu.microsoft.com/northeurope/deviceupdateinstance--schneider-device-update/6ecc8a8fb83b4160bd833ac8eb68ad74/rauc_update_nand-V20211012.artifact, host: deviceupdateinstance--schneider-device-update.b.nlu.dl.adu.microsoft.com
2021-08-26T12:03:22.8884733Z 1181 1184 info {IsConnected} Network connectivity detected. Interface: eth0, address family: 2 (AF_INET).
2021-08-26T12:03:22.8885582Z 1181 1184 info {operator()} (hr:C0D0001C) 6b0b97dd-e372-4142-bc83-2fd238398644, failure, will retry in 4 seconds, http_status: 206, headers:
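To make the C0D0001C → errno 28 reading repeatable for other log lines, here is a small Python sketch. It assumes, based only on that interpretation, that codes produced by HRESULT_FROM_XPLAT_SYSERR have a fixed C0D0 upper half and carry the POSIX errno in their low 16 bits; `decode_xplat_syserr` and `XPLAT_SYSERR_PREFIX` are hypothetical names, not part of the DO agent.

```python
import errno
import os

# Assumption: HRESULT_FROM_XPLAT_SYSERR codes in this log look like 0xC0D0xxxx,
# with the POSIX errno in the low 16 bits (C0D0001C -> 0x1C -> 28).
XPLAT_SYSERR_PREFIX = 0xC0D00000  # hypothetical name for the upper half

def decode_xplat_syserr(hr: int) -> tuple[int, str]:
    """Return (errno value, symbolic name) for a wrapped-errno HRESULT."""
    if (hr & 0xFFFF0000) != XPLAT_SYSERR_PREFIX:
        raise ValueError(f"0x{hr:08X} does not match the assumed errno wrapping")
    code = hr & 0xFFFF
    return code, errno.errorcode.get(code, "UNKNOWN")

if __name__ == "__main__":
    code, name = decode_xplat_syserr(0xC0D0001C)
    print(code, name, os.strerror(code))  # on Linux: 28 ENOSPC No space left on device
```

The other code in the log, 80070018, has a different upper half, so this sketch rejects it rather than guessing at its meaning.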
I haven't looked into the DO-Agent source code much, because my main focus was on the DU-Agent and DO just worked, until I stumbled upon this problem during testing.
Does the DO-Agent reserve memory anywhere other than the work folder the DU-Agent gives it? The /tmp/ folder is cleared before every download, and top reports more than 500 MB of RAM free, so that shouldn't be the problem. One important note: the rootfs of my device is read-only, and I have made some folders writable, such as the logging folders for DU and DO.
Large writes happen only when downloading the file to the work folder provided by the DU agent. Other than that, the DO agent writes to /var/log/ (or /var/cache/ in an older version) and /var/run/, but both of these locations should see writes of less than 1 MB.
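Note that top reports RAM, whereas ENOSPC refers to free space on the filesystem that backs the path being written. A minimal Python sketch that checks the locations named above; the work-folder path `/tmp/du-work` and the helper `free_space_report` are hypothetical placeholders, and the sizes are taken from the thread (the payload size in the TraceDownloadStatus line, "<1MB" for the other two directories).

```python
import shutil

# Bytes each location should be able to absorb. The work-folder path is a
# hypothetical placeholder; substitute the folder your DU agent hands to DO.
REQUIRED_BYTES = {
    "/tmp/du-work": 120_944_697,  # payload size from the TraceDownloadStatus line
    "/var/log": 1 << 20,          # "<1MB" per the maintainer
    "/var/run": 1 << 20,
}

def free_space_report(required: dict[str, int]) -> dict[str, tuple[int, bool]]:
    """Map each existing path to (free bytes on its filesystem, enough?)."""
    report = {}
    for path, need in required.items():
        try:
            usage = shutil.disk_usage(path)  # statvfs-based on POSIX
        except FileNotFoundError:
            continue  # location not present on this device
        report[path] = (usage.free, usage.free >= need)
    return report

if __name__ == "__main__":
    for path, (free, ok) in free_space_report(REQUIRED_BYTES).items():
        print(f"{path}: {free / 2**20:.1f} MiB free, sufficient: {ok}")
```

Running this right before and right after a failed download shows whether the filesystem behind the work folder, rather than RAM, is what fills up.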
Thanks, so I haven't overlooked anything. But I still don't understand which part of my memory is running low without my being able to see it. Is something allocated that doesn't get freed if the agent is killed?
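One thing worth ruling out: on Linux, a file that is deleted while a process still holds it open no longer appears in any directory listing, yet its disk blocks stay allocated until nothing holds it open. When a process is killed, the kernel closes its descriptors and that space is released. The Linux-only sketch below (the helper `deleted_open_files` is hypothetical) lists such files for a PID by reading /proc; whether this is actually happening with the DO agent here is an assumption to check, not something the thread confirms.

```python
import os

def deleted_open_files(pid="self"):
    """Return (fd, target) pairs for files that `pid` holds open after they were unlinked.

    Linux-only: reads the symlinks in /proc/<pid>/fd. The kernel appends
    " (deleted)" to the link target of a file that has been unlinked.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor closed between listdir() and readlink()
        if target.endswith(" (deleted)"):
            found.append((int(name), target))
    return found

if __name__ == "__main__":
    import sys
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for fd, target in deleted_open_files(pid):
        print(fd, target)
```

Pass the DO agent's PID (the first numeric column in the log lines, 1181, appears to be it); reading another process's fd directory usually requires running as the same user or as root.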
Insufficient disk space will now cause the download to pause and report a fatal error.
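The change described above amounts to treating a full disk as non-retryable. A minimal Python sketch of such a policy, not the DO agent's actual implementation: ENOSPC stops the download, while other errors are retried with a capped exponential backoff. The 4-second base delay mirrors the "will retry in 4 seconds" log line; the doubling and the 60-second cap are assumptions, and `next_action`, `RetryDecision` and `FATAL_ERRNOS` are hypothetical names.

```python
import errno
from dataclasses import dataclass

# Hypothetical policy set: errors that retrying cannot fix. DO's real list may differ.
FATAL_ERRNOS = {errno.ENOSPC}

@dataclass(frozen=True)
class RetryDecision:
    retry: bool
    delay_s: float = 0.0

def next_action(err: int, attempt: int, base_s: float = 4.0, cap_s: float = 60.0) -> RetryDecision:
    """Decide what to do after a failed download attempt (attempt counts from 0)."""
    if err in FATAL_ERRNOS:
        # Disk full: pause and surface a fatal error instead of looping.
        return RetryDecision(retry=False)
    # Transient failure: back off exponentially, capped at cap_s seconds.
    return RetryDecision(retry=True, delay_s=min(cap_s, base_s * 2 ** attempt))
```

For example, `next_action(errno.ENOSPC, 0)` returns `RetryDecision(retry=False, delay_s=0.0)`, while `next_action(errno.ECONNRESET, 2)` asks for a retry after 16 seconds.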
When the deployment is restarted after a failed installation, the agent sometimes can't download the file and runs out of memory.
This happened before when the DU-Agent was restarted during the download phase. I fixed that by removing old sandboxes and restarting do-service before starting the new download, so the download can now be aborted and restarted as many times as needed. But if the DU-Agent isn't restarted after a failed download, something goes wrong with the DO-Agent.
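The workaround described above (clear old sandboxes, then restart the DO service before starting a new download) can be scripted roughly as follows. This is a sketch under assumptions: the sandbox root `/data/du-sandboxes` is a placeholder, the unit name is taken from the text as `do-service`, and `remove_old_sandboxes` and `restart_command` are hypothetical helpers.

```python
import shutil
from pathlib import Path

# Hypothetical values; use the sandbox root and service name configured on your device.
SANDBOX_ROOT = Path("/data/du-sandboxes")
DO_SERVICE = "do-service"

def remove_old_sandboxes(root: Path) -> list[str]:
    """Delete every sandbox directory under root and return their names, sorted."""
    removed = []
    if not root.is_dir():
        return removed
    for entry in root.iterdir():
        # Skip plain files and symlinks; only real sandbox directories are removed.
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return sorted(removed)

def restart_command(service: str) -> list[str]:
    # Returned rather than executed so it can be reviewed first;
    # run it with subprocess.run(cmd, check=True).
    return ["systemctl", "restart", service]
```

Call `remove_old_sandboxes(SANDBOX_ROOT)`, then run `restart_command(DO_SERVICE)` before triggering the next deployment, so DO starts from a clean work area each time.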
The problem is that this doesn't happen every time. Sometimes the download succeeds many times in a row and then crashes again at random. When I run top alongside, there is always memory left.
DU-LOG
DO-LOG