traderepublic / Cilicon

🛠️ Self-Hosted ephemeral macOS CI on Apple Silicon
MIT License

Improve download logic for OCI images #31

Closed ast3150 closed 3 months ago

ast3150 commented 1 year ago

Cilicon currently uses URLSession.bytes to download the OCI images.

Problems

This approach has some problems:

Possible solutions

I've investigated a few ways this could be handled differently. So far the most promising approach I've seen would involve switching to the AsyncHTTPClient library, which offers FileDownloadDelegate. This would address the high memory load during downloads by streaming the download directly to disk, using SwiftNIO for non-blocking I/O.

Secondly, the downloads could be split into separate files, called chunks. The downloader could then check whether a given chunk already exists on disk before downloading it, so an interrupted download could be resumed relatively easily by re-downloading only the chunks that are not yet complete.
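
To make the chunk idea concrete, here is a minimal sketch of the bookkeeping side only. It is not Cilicon's code and not the code from the PR. The names `Chunk`, `planChunks`, `missingChunks` and the `chunk-N.part` file naming are hypothetical, and mapping each chunk to an HTTP `Range` request assumes the registry honours range requests, which I have not verified.

```swift
import Foundation

/// One slice of a larger download (hypothetical type, for illustration only).
struct Chunk: Equatable {
    let index: Int
    let range: ClosedRange<Int64>   // inclusive byte offsets, as in an HTTP Range header

    var expectedSize: Int64 { range.upperBound - range.lowerBound + 1 }
    var fileName: String { "chunk-\(index).part" }          // assumed naming scheme
    var rangeHeader: String { "bytes=\(range.lowerBound)-\(range.upperBound)" }
}

/// Splits `totalSize` bytes into consecutive chunks of at most `chunkSize` bytes.
func planChunks(totalSize: Int64, chunkSize: Int64) -> [Chunk] {
    precondition(totalSize >= 0 && chunkSize > 0)
    var chunks: [Chunk] = []
    var offset: Int64 = 0
    while offset < totalSize {
        let end = min(offset + chunkSize, totalSize) - 1
        chunks.append(Chunk(index: chunks.count, range: offset...end))
        offset = end + 1
    }
    return chunks
}

/// Returns the chunks that still need downloading: those with no file on disk,
/// or with a file whose size differs from the expected size (e.g. a truncated write).
func missingChunks(_ chunks: [Chunk], in directory: URL) -> [Chunk] {
    chunks.filter { chunk in
        let path = directory.appendingPathComponent(chunk.fileName).path
        guard let attrs = try? FileManager.default.attributesOfItem(atPath: path),
              let size = attrs[.size] as? NSNumber else {
            return true   // no readable file: chunk is missing
        }
        return size.int64Value != chunk.expectedSize
    }
}
```

On restart, the downloader would call `missingChunks` and fetch only what it returns. A size check alone does not detect corrupted content, so a real implementation would probably also verify a digest before trusting a chunk.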

ast3150 commented 1 year ago

I made some progress on this, but there are a few loose ends to tie up before this is ready to merge.

Marcocanc commented 1 year ago

Thanks for working on this. We should definitely benchmark this to see what the impact is. Currently the image slices are decompressed as they are downloaded and never stored on disk (although they probably end up there through swap), so decompressing after all slices have downloaded could potentially increase overall time. It would be great to see at which downlink speed this is more effective than the original approach.

ast3150 commented 1 year ago

For me the biggest speed difference is that the full internet speed is taken advantage of. Before this change I would see download speeds of 1-10 MB/s; with this change it's around 50 MB/s. IIRC the unzipping works fast, certainly not noticeably worse than before.

But the big improvement is in downloads being resumable. Personally, with the existing logic I downloaded until 99.6% before running out of space during the unzip, so had to download everything again. [hide-the-pain-harold]

Marcocanc commented 1 year ago

Regarding the last point, we could implement a check before starting the download as we know the image size in advance (might even be able to get the decompressed size from the manifest).
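
A rough sketch of what such a pre-flight check could look like, using only Foundation. The function names, the `DiskSpaceError` type and the `keepsCompressedCopy` flag are hypothetical, and the sketch assumes both sizes are already known to the caller; whether the decompressed size can actually be read from the manifest is still open.

```swift
import Foundation

/// Hypothetical error type for this sketch.
enum DiskSpaceError: Error, Equatable {
    case insufficient(required: Int64, available: Int64)
    case unknownFreeSpace
}

/// Free bytes on the volume containing `path`, as reported by the file system.
func freeBytes(atPath path: String) throws -> Int64 {
    let attrs = try FileManager.default.attributesOfFileSystem(forPath: path)
    guard let free = attrs[.systemFreeSize] as? NSNumber else {
        throw DiskSpaceError.unknownFreeSpace
    }
    return free.int64Value
}

/// Throws if `available` bytes cannot hold the image.
/// `keepsCompressedCopy` is an assumption: set it when compressed data is
/// written to disk first and decompressed afterwards, so both copies coexist.
func ensureSpace(available: Int64,
                 compressedSize: Int64,
                 decompressedSize: Int64,
                 keepsCompressedCopy: Bool) throws {
    let required = decompressedSize + (keepsCompressedCopy ? compressedSize : 0)
    guard available >= required else {
        throw DiskSpaceError.insufficient(required: required, available: available)
    }
}
```

Usage would be something like `try ensureSpace(available: try freeBytes(atPath: targetDirectory), ...)` before the first byte is requested, where `targetDirectory` is a placeholder for wherever the image is stored. The pure `ensureSpace` function is kept separate from the file-system query so it can be tested without touching the disk.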

ast3150 commented 1 year ago

Fyi @Marcocanc I completed the todos for the above PR so you could review it if you have some time. Another PR for custom GitLab runner configuration incoming 🔜

Sherlouk commented 6 months ago

I'm not sure what speeds others are seeing, but downloading the Sonoma Xcode 15.1 OCI image listed by Cirrus on a 1.2Gbps network connection took close to 3 hours. Using Activity Monitor I saw network usage predominantly at 1-2MB/s. This is using the latest version of Cilicon.

Really quite painful.

Marcocanc commented 6 months ago

Hi @Sherlouk, could you test the download speed when using Tart and then compare it with Cilicon? From my experience speeds on Github's CDN can vary quite significantly depending on how saturated they are. I was personally downloading at much faster speeds on my 1Gbps line.

Sherlouk commented 6 months ago

Sure thing. Pulling the same image using tart CLI took 19 minutes. Of course there's so many variables at play with GitHub CDN saturation and such.

Marcocanc commented 3 months ago

I'll be closing this issue, as the current download logic seems to work fine (depending on GH availability). If your results are consistently bad compared to tart or the implementation in this PR, feel free to reopen.