oxalica / rust-overlay

Pure and reproducible nix overlay of binary distributed rust toolchains
MIT License

bound monotonically increasing size of this repo #160

Closed j-baker closed 4 months ago

j-baker commented 9 months ago

Every day, a file is added with today's latest nightly releases, etc.

This file is 20kB. What this means is that every month, this repo gets bigger by 600kB.

We use tooling that automatically updates rust-overlay internally, so each day it's typical to have to download about 30MB of rust-overlay. In a year, that'll be about 38MB.

I'm wondering if some reasonable garbage collection policy could be applied? For example, we could simply delete all nightly releases older than a year and thereby bound the repo size to 7MB (and possibly tag once a year so people can still access the old releases). A fancier version of this would be to keep all nightly releases for the last month, then 1/3 of nightly releases for the past 3 months, 1/15 of nightly releases for the past year, and 1/30 beyond that.
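To make the "fancier" tiered policy concrete, here is a minimal Python sketch. It is not anything rust-overlay implements. The function names are hypothetical, and reading the tiers as age bands (0-30 days, 31-90 days, 91-365 days, older) is one assumed interpretation of the proposal.

```python
from datetime import date, timedelta


def keep_nightly(release: date, today: date) -> bool:
    """Return True if the nightly dated `release` survives the tiered policy.

    Assumed tiers (one reading of the proposal, not an agreed policy):
      age <= 30 days   keep every nightly
      age <= 90 days   keep 1 in 3
      age <= 365 days  keep 1 in 15
      older            keep 1 in 30
    """
    age_days = (today - release).days
    if age_days < 0:
        raise ValueError("release date is in the future")
    if age_days <= 30:
        stride = 1
    elif age_days <= 90:
        stride = 3
    elif age_days <= 365:
        stride = 15
    else:
        stride = 30
    # Select on the calendar date rather than on the age, so the choice within
    # a tier does not change from one day to the next.
    return release.toordinal() % stride == 0


def retained_nightlies(today: date, horizon_days: int) -> list[date]:
    """List the nightlies from the last `horizon_days` days that are kept."""
    candidates = (today - timedelta(days=n) for n in range(horizon_days))
    return [d for d in candidates if keep_nightly(d, today)]


if __name__ == "__main__":
    kept = retained_nightlies(date(2024, 1, 1), horizon_days=730)
    print(len(kept), "of 730 nightlies retained")
```

Because each stride divides the next (1, 3, 15, 30), a nightly that survives in an older tier also survives in every younger one, so a file that has been deleted never needs to be restored later. Over a two-year horizon this sketch keeps a little over 80 daily files instead of 730.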

Alternatively, one could maintain two branches: an archive branch which contains everything, and a 'current' branch which contains just the latest nightly or some recent ones.

Lastly, one could nest rust-overlay, having it itself depend on some internal flakes (which represent the real release data) (e.g. one for recent nightly, one for nightly archive, one for stable). These are (hopefully!) lazily evaluated and so you only download the archive if you have to.

I'm not that opinionated - it just seems unfortunate that in 5 years we'll have 70MB of hashes you need to download in order to grab the latest release, and if you're regularly updating (or develop on numerous repos which are not kept in lockstep) you'll unnecessarily be downloading the same pile of resources over and over.
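The sizes quoted in this comment can be reproduced with a few lines of arithmetic. The sketch below assumes the issue's own inputs (a 20kB file per day, a 30-day month, a roughly 30MB starting size) and uses hypothetical function names.

```python
FILE_KB = 20         # size of one daily file, as stated in the issue
DAYS_PER_MONTH = 30  # rounding implied by "600kB every month"
BASE_MB = 30         # approximate download size today, as stated in the issue


def yearly_growth_mb() -> float:
    """Growth per year if one FILE_KB file is added every day."""
    monthly_kb = FILE_KB * DAYS_PER_MONTH  # 600 kB per month
    return 12 * monthly_kb / 1000          # kB -> MB, using 1 MB = 1000 kB


def projected_size_mb(years: float, base_mb: float = BASE_MB) -> float:
    """Projected download size after `years` with no garbage collection."""
    return base_mb + years * yearly_growth_mb()


if __name__ == "__main__":
    for years in (1, 5):
        print(f"after {years} year(s): {projected_size_mb(years):.1f} MB")
```

With these inputs the growth is 7.2MB per year, giving 37.2MB after one year and 66MB after five, slightly below the rounded 38MB and 70MB figures in the comment. The 7MB bound for a one-year retention window is the same yearly growth figure, rounded.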

oxalica commented 9 months ago

I'm also aware of this, but I'm not sure about the appropriate cutoff time for nightly versions. My idea is to keep only the most recent ~3 years, so that the window covers at least the period of one edition. Earlier versions can be periodically cut off into a tag as an archive. There's also a problem: if we do this on a 3-year period in sync with Rust editions, we will have up to 6 years of data right before another edition release (like now, before 2024 is out).
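The "at most 6 years" observation is a sawtooth: each cut leaves one window of data, which then grows until the next cut. A small sketch, assuming cuts happen exactly every `period` years and keep exactly `window` years (the function name and both parameters are hypothetical):

```python
def retained_span_years(years_since_cut: float,
                        window: float = 3.0,
                        period: float = 3.0) -> float:
    """Years of nightly data held in the repo at a given point in time.

    Right after a cut only `window` years remain; new nightlies then
    accumulate until the next cut, `period` years later.
    """
    return window + (years_since_cut % period)


if __name__ == "__main__":
    for t in (0.0, 1.5, 2.9):
        print(f"{t} years after a cut: {retained_span_years(t):.1f} years kept")
```

With a 3-year window and a 3-year period the retained span moves between 3 and just under 6 years, which is the worst case described above. Shortening either parameter lowers both ends of that range.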

> Lastly, one could nest rust-overlay, having it itself depend on some internal flakes (which represent the real release data) (e.g. one for recent nightly, one for nightly archive, one for stable). These are (hopefully!) lazily evaluated and so you only download the archive if you have to.

I don't think Nix supports this, because flake inputs are fetched and locked before evaluation. Also, operations on lock files such as nix flake update and/or nix flake archive always download all inputs as part of their semantics.

> (e.g. one for recent nightly, one for nightly archive, one for stable)

The issue is almost entirely about nightly versions. We can easily keep all stable and beta versions without running into repo size issues. Splitting out stable and beta versions can be done, and I already tried it in the stable branch, though the sync script is not correctly set up yet. But I guess most of our users are actually nightly users?

j-baker commented 9 months ago

I was not aware that there's already a 'stable' branch, and this definitely solves my use case. While my users do use nightly and stable (I have a 'useNightly' flag which lets them choose without having to really know Nix), they generally don't care about using the latest nightly; it's more that they need some nightly features which haven't been stabilised in the last n years (like portable SIMD). So using nightly releases aligned with stable releases is fine in these cases.

Do you know if Nix lazily downloads flake inputs on normal usage (e.g. not on nix flake update)? If so, I think this'd be satisfactory for me as well - I could depend on both the stable and master branches and choose the version to use based on the flag, avoiding the download for people who don't care about it.

holly-hacker commented 8 months ago

I'd like to chip in and say that 30MB can be quite significant on slower connections or connections where bandwidth is expensive, and it's a cost that needs to be paid every time a new flake is instantiated (e.g. when starting a new project, even a small one).

> But I guess most of our users are actually nightly users?

I personally rarely use the nightly branch when using rust-overlay, so using the stable branch is actually a great solution for me. It'd be great if this were advertised in the README somewhere, because I wouldn't have known if I hadn't checked this issue :)

If this stable branch works as I understand it, wouldn't it be possible to create a nightly branch for a specific time frame (e.g. one each year) to keep the download sizes down?

oxalica commented 6 months ago

I picked 1-2 years as the purging window for nightly and beta versions. I'm looking for feedback in #172