Handling of breaking changes

Gaelan commented 3 years ago

Recently, we've had two commits to vhost (https://github.com/rust-vmm/vhost/commit/a8ff939161d41fc2f449b80e461d013c1e19f666 and https://github.com/rust-vmm/vhost/commit/9982541776a603d30556a06df55e8c0491072763) that caused vhost-user-backend to stop compiling. I don't think we should be discouraging breaking changes: we're still in the early stages of the project, and I think it's more important to avoid technical debt than to insist on backwards compatibility. But it'd be nice if we could make them in a way that is less likely to break people's workflows. Some thoughts on ways we might do this:

Publish to crates.io earlier

Currently, we have a policy of not publishing crates to crates.io until they are production-ready. In practice, this means we use a lot of git dependencies, so we lose out on any sort of version numbering, which makes it hard to avoid breaking updates to dependencies. We could consider publishing to crates.io earlier (say, as soon as another crate depends on the crate in question) and using something else, such as a 1.0 version number or a note in the readme, as an indicator of production-readiness.

In this world, a breaking update to a crate would be released as, say, version 0.2; any dependent crates would stay on 0.1 until they explicitly changed the version in their Cargo.toml. This means we may stay out of date for longer, but allows us to handle breakage on our own time.
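
To sketch what that would look like (the version numbers here are only illustrative, not actual releases), a dependent crate such as vhost-user-backend would declare something like this:

```toml
# Cargo.toml of the dependent crate (illustrative versions)
[dependencies]
# "0.1" is a caret requirement: it accepts any 0.1.x release but never 0.2.0,
# so a breaking 0.2 release of vhost is not picked up automatically.
vhost = "0.1"
```

Moving to the breaking release is then a deliberate one-line change to `vhost = "0.2"`, made together with whatever fixes the new API needs.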

Make more extensive usage of workspaces/monorepos

In a few cases, such as vhost-device and vm-virtio, we've started to put multiple crates in a single repository. Aside from the convenience in adding new crates (no need to create a new GitHub repo), workspaces also have another benefit: unified pull requests and CI. In a world where the dependency and dependent are in the same workspace, we would simply fix breaking changes in the same PR that introduced them, so there is never a window where code doesn't compile (as there is now) or uses an out-of-date dependency (as there would be with the "crates.io" solution). If a PR author forgot to do so, CI would fail.
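
As a rough sketch of that layout (the member names and directory structure below are assumptions for illustration, not the actual repository structure), the two manifests involved could look like this:

```toml
# --- Cargo.toml at the repository root (hypothetical member names) ---
[workspace]
members = ["vhost", "vhost-user-backend"]

# --- vhost-user-backend/Cargo.toml ---
[dependencies]
# A path dependency always builds against the copy of vhost in the same
# checkout, so a PR that breaks the vhost API also has to fix this crate
# before CI passes.
vhost = { path = "../vhost" }
```

With this setup, running `cargo build` or `cargo test` with the `--workspace` flag from the repository root builds every member against the same checkout.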

Both of these are somewhat large departures from the current workflow, but they would go a long way towards being able to clean up technical debt without causing breakage for others. Thoughts?

andreeaflorescu commented 3 years ago

This is a really good point!

I find the option of publishing non-production crates on crates.io dangerous because only we know what non-production means (assuming we would come up with some sort of convention to mark a crate as not production-ready). For other consumers being published to crates.io means that it can be used without (known & major) risks. It can also be that other people build on top of the published crates, and then publish their own crate. This would make a transitive user of rust-vmm completely unaware of using what we call not production ready crates.

I am definitely in favor of using more workspaces where it makes sense. It also makes the crates easier to maintain (just updating rust-vmm-ci can take a lot of time when it's split across multiple repositories).

In the meantime, I think there is a third choice. Even when using dependencies from git, we can lock them at a specific revision. This means that the build does not break from unrelated changes, and you need to explicitly upgrade a dependency. While this does not give you the benefit of having a single PR with all relevant changes, it does help with making sure that the projects always build. The inconvenience with this approach would be that you need to manually update the dependency (by design). This can be addressed also by enabling Dependabot, and I think that's also how Cloud Hypervisor is consuming crates from git without breaking the build.
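
For reference, pinning a git dependency in Cargo.toml looks roughly like this (the revision below is a placeholder, not a real commit to pin to):

```toml
[dependencies]
# `rev` fixes the dependency to one exact commit. Without `rev`, `tag`, or
# `branch`, Cargo resolves to the latest commit on the default branch
# whenever the lock file is created or updated.
vhost = { git = "https://github.com/rust-vmm/vhost", rev = "<commit-sha>" }
```

Upgrading then means changing the `rev` value in a dedicated PR, which is the manual step mentioned above.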

Gaelan commented 3 years ago

> For other consumers being published to crates.io means that it can be used without (known & major) risks.

I think if we published crates with a pre-1.0 version number and a prominent note in the README, it would be pretty hard to depend on it without knowing what you're getting into. It might also help to use "prerelease" semver versions such as 0.1.0-alpha.1, which I understand are even harder to install accidentally.
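
As a sketch of how that would work (the version number is hypothetical), Cargo only selects a pre-release when the requirement itself names one:

```toml
[dependencies]
# A plain requirement such as "0.1" does not match 0.1.0-alpha.1, because
# Cargo skips pre-release versions unless the requirement asks for one.
# A user has to opt in explicitly, like this:
vhost = "0.1.0-alpha.1"
```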

> This would make a transitive user of rust-vmm completely unaware of using what we call not production ready crates.

I don't think this really creates much of a new risk. There are plenty of bad things a dependency can do (be poorly written, use too much unsafe, be outright malicious). Adding a new category of bad thing (depending on an alpha-quality rust-vmm crate) would only impact people who weren't doing enough due diligence anyway, and would already be likely to be affected by one of the above.

All that being said, this does mean a non-zero increase in the risk of someone depending on a crate that isn't production-ready, and maybe we don't think it's worth it.

> In the meantime, I think there is a third choice. Even when using dependencies from git, we can lock them at a specific revision. This means that the build does not break from unrelated changes, and you need to explicitly upgrade a dependency. While this does not give you the benefit of having a single PR with all relevant changes, it does help with making sure that the projects always build. The inconvenience with this approach would be that you need to manually update the dependency (by design). This can be addressed also by enabling Dependabot, and I think that's also how Cloud Hypervisor is consuming crates from git without breaking the build.

Hmm, that could work. My one concern would be managing upgrades across the ecosystem. It's easy to imagine five different crates depending on different commits of vm-virtio. In the best case, it would cause code bloat; in the worst case, it would cause compiler errors because types defined in different versions of the same crate are considered different types. Imagine this dependency graph:

A <- B <- C

To avoid this, after any update to A, we'd need to update B to depend on the new version, then, once that was merged, update C to depend on the new versions of A and B. This could end up being a lot of unnecessary updates, especially as dependency graphs get larger.

Of course, this issue also occurs to some extent with the crates.io solution, but because we'd be able to specify semver ranges, it would only happen on breaking changes instead of all changes.

We could also consider setting up a custom crate registry, but I imagine that's more trouble than it's worth.

I think the workspace solution makes the most sense here, from what's been discussed so far.

rust-vmm / community

Handling of breaking changes #111
