uber / neuropod

A uniform interface to run deep learning models from multiple frameworks
https://neuropod.ai
Apache License 2.0

Things to figure out before a major release #555

Open VivekPanyam opened 2 years ago

VivekPanyam commented 2 years ago

Neuropod's release cadence has been a bit odd in that we've released a bunch of RCs since the initial public release, but haven't put out a new "stable" release (although the RCs are generally pretty stable and are used in prod).

There are a few things blocking a stable release:

1. This depends on whether we want to follow semver. For example, PyTorch does a minor release every quarter (~90 days), but these minor releases contain backwards-incompatible changes (which goes against semver rules): https://github.com/pytorch/pytorch/releases

Also see https://github.com/uber/neuropod/issues/539#issuecomment-1082180595 and https://github.com/uber/neuropod/pull/552#issue-1265525875

I think we need to deliberately decide what to commit to before actually making a major release, so feel free to leave thoughts in the comments.

Proposal

One approach is to do major releases with breaking changes at most quarterly and minor releases as necessary (e.g. when a new backend version is released).

The benefit of backend ABI compatibility is that users can upgrade the core library (within a major version) without having to redownload all the backends they are using (this could be several GB). Currently (except for the last RC), every upgrade of the core Neuropod library has required redownloading all the backends for the new version. This is especially impractical if backend upgrades are not done by the same team that upgrades the core library.
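To make the "within a major version" rule concrete, here is a minimal sketch of the check a loader could run before reusing an already-downloaded backend. The `Version` struct and `is_backend_compatible` function are hypothetical names for illustration and are not part of the Neuropod API. The policy itself (same major version means compatible) is an assumption based on the proposal above.

```cpp
#include <cstdint>

// Hypothetical version triple; not a type from the Neuropod codebase.
struct Version
{
    std::uint32_t major_version;
    std::uint32_t minor_version;
    std::uint32_t patch_version;
};

// Assumed policy: a backend built against any release of a given major
// version can be loaded by any core library with that same major version.
// Minor and patch numbers are deliberately ignored.
constexpr bool is_backend_compatible(const Version &core, const Version &backend)
{
    return core.major_version == backend.major_version;
}
```

Under this policy, a core library at 1.3.0 could keep using a backend built against 1.0.2, while a core library at 2.0.0 would have to fetch a new backend.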

The part I anticipate being difficult is ensuring we don't accidentally break backend ABI compatibility (forwards or backwards) within a major version. There are tools to help with this, but the situation within Neuropod is a little tricky. There are dependencies from (Neuropod core -> backend) and potential dependencies going the other way (backends -> Neuropod core).

One way to deal with this is to maintain ABI compatibility for both the core library and the backends, but that may be suboptimal. There are standard ABI compatibility checker tools, but we make extensive use of templates in the Neuropod public interface so I'm not sure how well those will work.

There are other solutions, but they require being careful about structuring includes, hiding symbols in libraries, maintaining field ordering in certain data structures, etc.
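As a rough illustration of two of these techniques (hiding symbols and pinning field ordering), here is a sketch that a compiler can enforce on every build. Everything named here (`EXAMPLE_PUBLIC`, `ExampleTensorInfo`, `example_struct_version`) is hypothetical and not taken from the Neuropod codebase. The visibility attribute assumes GCC or Clang, typically combined with building the library with `-fvisibility=hidden`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical macro marking the symbols a library exports on purpose.
// With -fvisibility=hidden, everything not marked stays internal.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_PUBLIC __attribute__((visibility("default")))
#else
#define EXAMPLE_PUBLIC
#endif

// Hypothetical struct shared between the core library and a backend.
// Assumed rule: new fields may only be appended; existing fields must
// keep their position and type.
struct ExampleTensorInfo
{
    std::uint32_t       struct_version;
    std::uint32_t       dtype;
    std::uint64_t       num_dims;
    const std::int64_t *dims;
};

// Compile-time guards: reordering or resizing an existing field breaks
// the build instead of silently breaking already-built backends.
static_assert(std::is_standard_layout<ExampleTensorInfo>::value, "layout must stay C-compatible");
static_assert(offsetof(ExampleTensorInfo, struct_version) == 0, "struct_version moved");
static_assert(offsetof(ExampleTensorInfo, dtype) == 4, "dtype moved");
static_assert(offsetof(ExampleTensorInfo, num_dims) == 8, "num_dims moved");
static_assert(offsetof(ExampleTensorInfo, dims) == 16, "dims moved");

// Hypothetical exported C entry point; C linkage avoids C++ name mangling
// differences at the library boundary.
extern "C" EXAMPLE_PUBLIC std::uint32_t example_struct_version()
{
    return 1;
}
```

Guards like these only cover plain data layout. They do not catch changes to templates, virtual tables, or inline functions, so they would complement an ABI checker rather than replace one.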

Most importantly, I think we need a robust way to check that we're not breaking anything in CI.

Two possible paths forward:

Option 1

Option 2

Feel free to leave any thoughts below. I think the major requirement for the solution we pick is that we need to be able to programmatically test and enforce it in CI.

If we can use an ABI compatibility checker that handles templates well, I'd prefer option 2 above as it seems more robust.