More efficient incremental updates

rbtcollins commented 5 years ago

Describe the problem you are trying to solve

Toolchain updates are currently very inefficient. Particularly with documentation, most of the new content is identical to the previous content. This wastes internet bandwidth in low-bandwidth regions. We also churn on disk, removing the unaltered content and then replacing it, which has knock-on effects on search indices, for a net waste of CPU and battery.

Describe the solution you'd like

Installing an update that is 99% the same as the previous version would only take 1% of the resources - a small download and a small number of writes to disk.

Notes

This was discussed in the rustup-wg meeting 23/4/19; it isn't currently resourced but it was considered an interesting idea.

I think we have broadly two basic performance-sensitive use cases:

1) First install, whether interactive or automated for a CI job. This needs to be decently fast, so we cannot ignore its performance. No atomicity is needed: Rust isn't working before the install begins, and it is understood that it won't work until the install completes.

2) Daily/ad hoc maintenance: almost always human-initiated, I suspect, but perhaps not? This also needs to be decently fast, but ideally it would be faster than the initial install (currently it is slower because we have to remove the old content). As Rust was working before the update ran, minimising the time period during which a given toolchain doesn't work is desirable.

I'm not sure if the rollback/transactional aspect really is important to preserve as-is; I rather suspect that a recoverable model would be better - more flexible for this sort of optimisation, and a better fit for reality (because reality doesn't guarantee processes can complete).

A rustup modified to work like this would look something like the following:

- An additional server-side step taking the various components and putting them into some distribution framework (e.g. rsync or bittorrent or <...>). This may require a server-side component to run, or we may be able to use dumb servers with precalculated information - this is a framework-specific consequence.
- rustup initial installation of a toolchain can proceed as it does today, unaltered.
- rustup updates to a toolchain would use the distribution framework in situ, rather than staging a new entire component version, deleting the old one, and moving the staged one into position.
- If an in-situ update fails or is interrupted, running it again would converge it on the desired content, though this might be less efficient than the clean case from known-state a to known-state a'.
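
To make the convergence idea concrete, here is a minimal Python sketch, not rustup code. It assumes a manifest that maps relative paths to SHA-256 digests (the sort of thing a dumb server could publish as a precalculated static file) and a hypothetical `fetch` callback standing in for whichever distribution framework is chosen. The function names and the `.partial` suffix are illustrative assumptions only.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each relative file path under root to its content digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def fetch_from_dir(source):
    """Return a fetch function that reads files from a local directory.

    A stand-in for a real transport (assumption for this sketch).
    """
    def fetch(rel):
        with open(os.path.join(source, rel), "rb") as f:
            return f.read()
    return fetch


def converge(target, manifest, fetch):
    """Bring target in line with manifest, touching only what differs.

    Returns the number of files written or removed. Running it again after
    a failure or interruption repeats only the work that is still missing.
    Empty directories left behind are not cleaned up in this sketch.
    """
    changes = 0
    current = build_manifest(target) if os.path.isdir(target) else {}
    for rel, digest in manifest.items():
        if current.get(rel) == digest:
            continue  # unchanged content is left alone on disk
        dest = os.path.join(target, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # Write under a temporary name, then rename, so an interruption
        # never leaves a half-written file under the final name.
        tmp = dest + ".partial"
        with open(tmp, "wb") as f:
            f.write(fetch(rel))
        os.replace(tmp, dest)
        changes += 1
    for rel in current:
        if rel not in manifest:
            # Stale files, including leftover .partial files, are removed.
            os.remove(os.path.join(target, rel))
            changes += 1
    return changes
```

A second call to `converge` with the same manifest should find nothing to do, which is the property that replaces rollback in a recoverable model. This sketch works at whole-file granularity; a real delta system could also reuse unchanged blocks within a file.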

There are some possible complications - e.g. do we have components nested within others, which many distribution systems are unlikely to like? - but these should be able to be worked through with some care.

If someone wants to test what the potential benefits might be, just measure updating .rustup/toolchains from another machine using some delta system (rsync, Dropbox, BitTorrent, etc.); key examples would be nightly and stable.
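
As a rough first estimate before trying a real delta tool, a short Python sketch like the one below could compare two toolchain directories at whole-file granularity. The function names are hypothetical, and it assumes both trees are available locally and contain no broken symlinks. Because it counts an entire file as changed if any byte differs, it gives an upper bound on what a block-level tool such as rsync would transfer.

```python
import hashlib
import os


def tree_digests(root):
    """Map relative path -> (size in bytes, SHA-256 digest) for files under root."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                data = f.read()  # whole-file read; fine for a quick measurement
            rel = os.path.relpath(full, root)
            result[rel] = (len(data), hashlib.sha256(data).hexdigest())
    return result


def estimate_delta(old_root, new_root):
    """Return (bytes_changed, bytes_total) for the new tree relative to the old."""
    old = tree_digests(old_root)
    new = tree_digests(new_root)
    total = sum(size for size, _digest in new.values())
    changed = sum(
        size
        for rel, (size, digest) in new.items()
        # A file counts as changed if it is new or its content digest differs.
        if old.get(rel, (None, None))[1] != digest
    )
    return changed, total
```

Calling `estimate_delta` on two consecutive nightly directories, and again on two consecutive stable directories, would show what fraction of bytes actually differs between releases.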

nrc commented 5 years ago

This does sound great to have. I believe our servers at the moment are very dumb, and I think we probably need to keep them that way, but if we can pre-calculate the information we need, that would be fine.

rbtcollins commented 5 years ago

I don't have the time to take on building this at this point, but I do have the time to help someone interested work through technology choices, interactions and implications on different platforms, and likely constraints/desired capabilities for the components we install to make this really fly.

pickfire commented 5 years ago

I would like to take this. @rbtcollins What do I need to do?

rust-lang / rustup

More efficient incremental updates #1798