Introduction

Some time ago we sat down and discussed ABI stability issues with a group of Facebook engineers who care about the topic. As a result, we assembled a document that discusses the problem area, trade-offs, and possible approaches.

We plan to use it as a basic guideline that C++ library (primarily client-side and mobile) developers might use to make a thoughtful decision to support ABI stability (or not). We also want to share the document with our partners and community to be open and transparent about our reasoning and values in this regard.

This is the first version of a fairly technical document, and I would welcome any comments or criticism.

The document

C++ ABI stability for library developers

What is ABI

An ABI (Application Binary Interface) is similar to an API (Application Programming Interface), but for machine code.

A useful analogy is to think of an ABI as a kind of imaginary network protocol that defines how the compiled code of a function's caller and callee communicate (just like a client and a server).

Technically speaking, an ABI is a low-level, hardware-dependent format that defines how data structures or computational routines are accessed in machine code. So, an ABI defines how the high-level constructs like function arguments or data structures are represented (e.g. via CPU registers or stack-allocated memory) in machine code to perform a function call.

Concretely, an ABI defines exactly how the parameters of a function are placed in memory or in registers, and how the memory backing those logical parameters is interpreted (e.g. how the order of the fields in a struct relates to the order of bytes in memory, how the fields are aligned and padded, etc.).
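To make the layout part of this more tangible, here is a minimal sketch. The struct `Sample` and the helper names are hypothetical and exist only for illustration; the exact numbers depend on the platform, which is precisely the point.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstdint>  // std::int32_t

// Hypothetical struct used only for illustration.
struct Sample {
  char tag;            // 1 byte
  std::int32_t value;  // usually preceded by padding so that it is aligned
  char flag;           // 1 byte, usually followed by tail padding
};

// Layout facts that both the caller and the callee bake into machine code.
struct LayoutInfo {
  std::size_t size;
  std::size_t alignment;
  std::size_t value_offset;
  std::size_t flag_offset;
};

inline LayoutInfo describe_sample_layout() {
  return LayoutInfo{sizeof(Sample), alignof(Sample),
                    offsetof(Sample, value), offsetof(Sample, flag)};
}

// Sanity checks that hold on any conforming platform.
inline void check_sample_layout() {
  const LayoutInfo info = describe_sample_layout();
  assert(info.value_offset % alignof(std::int32_t) == 0);  // member is aligned
  assert(info.size % info.alignment == 0);  // size is a multiple of alignment
}
```

On many common 64-bit platforms `sizeof(Sample)` is 12 rather than 6 because of padding, but nothing in the C++ language itself fixes that number; the platform ABI does.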

When a C++ library is compiled from source together with an application that relies on it, the ABI is naturally stable because all the functions inside the library and the application share the same compilation environment. Things get challenging when the library and the application are compiled separately and then linked together.

Here is a non-exhaustive list of things that might affect ABI (from the C++ perspective):

Interface of the library;
CPU architecture. Parts of an ABI are intrinsically architecture-specific;
Standard library version. The standard library does not guarantee ABI stability, so public interfaces that rely on it are ABI-unstable. Also, standard library types are not compatible across different vendors' implementations;
Compilation flags. Many compilation flags change the ABI (e.g. -DNDEBUG, exception support, optimizations, etc.);
Particular compiler and platform-specific choices. vtable layout, exception tables, RTTI, and calling conventions are all parts of an ABI.
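As a rough illustration of the compilation-flags item, consider a hypothetical public header whose struct gains a field only when NDEBUG is not defined. The names below (`Connection`, `timeout_ms`, and so on) are invented for this sketch. The two structs spell out what each side sees when the library and the application are built with different flags.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstring>  // std::memcpy

// A hypothetical public header might declare:
//
//   struct Connection {
//     int socket_fd;
//   #ifndef NDEBUG
//     int debug_call_count;  // bookkeeping compiled out by -DNDEBUG
//   #endif
//     int timeout_ms;
//   };
//
// What the library sees when it is built without -DNDEBUG:
struct ConnectionWithoutNDEBUG { int socket_fd; int debug_call_count; int timeout_ms; };
// What the application sees when it is built with -DNDEBUG:
struct ConnectionWithNDEBUG { int socket_fd; int timeout_ms; };

static_assert(sizeof(ConnectionWithoutNDEBUG) != sizeof(ConnectionWithNDEBUG),
              "the two builds disagree about the size of the struct");
static_assert(offsetof(ConnectionWithoutNDEBUG, timeout_ms) !=
                  offsetof(ConnectionWithNDEBUG, timeout_ms),
              "the two builds disagree about where timeout_ms lives");

// Simulates an NDEBUG-built caller reading an object written by the library:
// it interprets the same bytes using its own idea of the layout.
inline int timeout_as_read_by_mismatched_caller(const ConnectionWithoutNDEBUG& written) {
  ConnectionWithNDEBUG seen;
  std::memcpy(&seen, &written, sizeof(seen));
  return seen.timeout_ms;
}

inline void demonstrate_mismatch() {
  const ConnectionWithoutNDEBUG written{3, 7, 5000};
  // The caller silently reads debug_call_count instead of timeout_ms.
  assert(timeout_as_read_by_mismatched_caller(written) == 7);
}
```

Nothing here fails to compile or link, which is what makes this class of problem hard to notice.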

While some of those constraints cannot be worked around (like the CPU architecture), some libraries use special techniques to overcome the rest, so that they can be linked with almost anything else in most scenarios; this property is called "ABI stability".

It's not always easy to automatically detect (e.g. via a compilation error) or even observe an ABI break. ABI issues might trigger linking problems, instant crashes, or crashes that happen only in production (at scale), or they might cause the library to behave differently, making the application produce incorrect results.

When ABI stability matters for libraries

If any of the following scenarios looks plausible for your library, you should consider investing in ABI stability.

New versions of your library have to be linkable with apps originally compiled against older versions of your library. In some cases, it's inconvenient to rebuild the whole app just to test it with a new version of a library. It can take several hours to recompile everything, versus less than a minute to compile only the changed parts. Slowing down the developer iteration cycle can be disruptive to some projects, especially when the app needs to be recompiled against many versions of the library regularly (e.g. for different platforms).
Your library needs to be distributed (and updated) independently from the apps based on it. In some cases, it's simply impossible or too dangerous to bundle a copy of the library with every app that uses it on a particular machine. That is mostly for two reasons:
1. It would be too wasteful to deliver, store, load, and initialize almost the same low-level library (e.g. networking, SSL, image decoding) for every app that uses it on a machine. (Imagine if every iOS app had its own on-screen keyboard built into its binary.)
2. It would make it impractical to apply security patches to such libraries. Fixing something like Heartbleed would require updating every app on the platform, which would be a nightmare.
Your library must not impose language-specific limitations on apps that use it. A library might introduce some not-so-obvious limitations on the application code. For example, if your library exposes interfaces that rely on a more modern version of C++ than the application uses, the application might not even compile. In the opposite case, if an app uses a more modern version of the language than the library, it also might not compile (but that is rare).

Approaches

If your library would benefit from ABI stability, the next question is which concrete approach to use to achieve it.

Accidental ABI stability (or maintaining status-quo)

Most libraries were not designed to be ABI-stable from day one because initially there was no need for it. Over time, some library maintainers find themselves in a situation where they suddenly have to provide ABI stability. This might happen because of new use cases (and constraints), or just because consumers already rely on the stability, mistakenly assuming that those guarantees always existed.

It's a tough situation to be in. From that point on, it practically means that only very small extensions to the existing API are allowed. Pretty much everything besides adding non-virtual methods will break the ABI to some extent. Sometimes even changing the implementation of some methods is not safe. Here are just some classes of changes that might break the ABI:

Quite obviously, removing anything will break the ABI because the application might be using it.
Adding a new data member to a class or struct will change the size of the struct (and of all other structs that contain it). This will break the ABI because the application relies on the size.
Adding a virtual function breaks the ABI because it changes the layout (and possibly the order) of entries in the vtable. If the third entry in the vtable corresponded to foo, and now it corresponds to bar, that is a breaking change.
Adding a new argument with a default value to an existing method will break the ABI because the method will get a new mangled name (which effectively removes the old one).
Changing the implementation of some methods in a way that is incompatible with the previous implementation might cause logical errors in the code. This can happen because a previous implementation of those functions (possibly conflicting with the new one) may have been inlined into the application. To make such a change safe, the old and new implementations of the same function must be able to work together. Besides that, even if it usually works fine, having two different implementations of the same function inside one executable is, strictly speaking, a violation of the One Definition Rule.
The actual shape of the public API of your library might be drastically different from what's documented (and from what you expect). By default (as with GCC and Clang unless -fvisibility=hidden is used), all symbols are public, so any of them might be used and relied on by other libraries. Therefore, the actual public API of the library can be much larger than what is "officially promised".
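Two of the changes listed above can be sketched in a few lines. The namespaces `v1` and `v2` stand for two releases of a hypothetical library (in a real library both would use the same names, which is exactly why an old application cannot tell them apart); all type and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_same

namespace v1 {
struct Options { int width; int height; };
struct Request { Options options; int id; };  // embeds Options by value
inline int scale(int value) { return value; }
}  // namespace v1

namespace v2 {
struct Options { int width; int height; int depth; };  // new data member
struct Request { Options options; int id; };           // source is unchanged
inline int scale(int value, int factor = 1) { return value * factor; }
}  // namespace v2

// Adding a member changes the size that old applications compiled in...
static_assert(sizeof(v2::Options) > sizeof(v1::Options),
              "old callers allocate too little memory for Options");
// ...and shifts fields of every struct that contains it.
static_assert(offsetof(v2::Request, id) != offsetof(v1::Request, id),
              "old callers look for Request::id at a stale offset");

// A default argument is only syntactic sugar at the call site: the function
// itself now has a different parameter list and therefore a different mangled
// name. For example, under the Itanium C++ ABI a global scale(int) mangles as
// _Z5scalei, while scale(int, int) mangles as _Z5scaleii.
static_assert(!std::is_same<decltype(&v1::scale), decltype(&v2::scale)>::value,
              "the old symbol no longer exists in the new library");

inline void check_source_compatibility() {
  // Source-level callers do not notice the difference, which hides the break.
  assert(v1::scale(3) == v2::scale(3));
}
```

Note that the application source may compile unchanged against both versions; only already-compiled binaries are broken.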

All of this boils down to the fact that supporting accidental ABI stability for a library is possible, but only if the library barely changes. Eventually, with a big enough consumer base, every single internal implementation detail will be relied on by some external code, which means it can no longer be changed. This observation is informally known as Hyrum's Law.

Planned ABI stability

The most reliable and flexible way to provide ABI stability is to deliberately design the public API of a library to avoid all the previously discussed pitfalls. It all comes down to a few main principles:

Use only plain C on the actual ABI-stable boundaries of the library; do not use C++ features or STL containers there (or carefully choose a stable enough subset of them);
Hide all symbols by default, expose (export) only what's documented explicitly;
Allocate and deallocate all data structures on the heap and manage them from inside the library; do not pass stack-allocated structures across ABI boundaries;
Use fully dynamic approaches (like message passing) to call methods on classes/instances (e.g. IUnknown);
Disallow inlining on ABI boundaries, making all the implementation details unobservable (and distributed only within the library);
Enforce inlining for idiomatic C++ wrappers (if used) to decouple them from the distributed parts of the library.

Concrete approaches for Planned ABI stability

Using plain C exclusively for public API

The first and simplest approach to achieving ABI stability is to formalize all public APIs as a set of plain C functions. The plain C ABI practically never changes language-wise (within the same platform), and it is the foundation of the ABIs of other languages and libraries. So it works.

To make the library ABI-stable, the following must be ensured:

No removal of methods.
No reordering or removing struct fields. The offsets of fields are compiled into the application, so reordering them will break it.
No stack-allocated structs. Sizes of stack-allocated structs are compiled into the application, so all allocation/deallocation operations must be performed by the library itself; only trivial types (like int or double) and pointers are allowed on API boundaries. The library will probably need some naming conventions to make ownership explicit (e.g. the Get/Create/Release convention from CoreFoundation, https://fburl.com/uis74v5b). This measure allows adding fields to structs in a backward-compatible manner. Another approach to this problem is to avoid structs entirely.
No accidental exports. To avoid accidentally leaking private symbols, the default symbol visibility must be hidden (via -fvisibility=hidden), and only documented symbols should be explicitly marked as visible.
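The rules above might look roughly like this in practice. This is only a sketch: the `mylib_` prefix, the `MYLIB_EXPORT` macro, and the counter type are hypothetical, and the "library side" lives in the same file here only to keep the example self-contained (in reality it would be compiled separately, with -fvisibility=hidden).

```cpp
#include <cassert>
#include <new>  // std::nothrow

// Hypothetical export macro. The attribute is the GCC/Clang spelling; it only
// has an effect when the library is built with -fvisibility=hidden.
#if defined(__GNUC__)
#define MYLIB_EXPORT __attribute__((visibility("default")))
#else
#define MYLIB_EXPORT
#endif

// ---- Public header: plain C, pointers and trivial types only ----
extern "C" {
// Opaque handle: the application never sees the struct definition, so its
// size and field offsets are not compiled into the application.
typedef struct mylib_counter mylib_counter;

MYLIB_EXPORT mylib_counter* mylib_counter_create(void);
MYLIB_EXPORT void mylib_counter_release(mylib_counter* counter);
MYLIB_EXPORT int mylib_counter_increment(mylib_counter* counter);
}

// ---- Library side: free to change between releases ----
struct mylib_counter {
  int value;  // fields can be added later without breaking old applications
};

extern "C" mylib_counter* mylib_counter_create(void) {
  return new (std::nothrow) mylib_counter{0};  // no exceptions cross the boundary
}

extern "C" void mylib_counter_release(mylib_counter* counter) { delete counter; }

extern "C" int mylib_counter_increment(mylib_counter* counter) {
  return counter ? ++counter->value : -1;  // -1 signals an invalid handle
}

// ---- Application side: needs only the declarations from the header ----
inline int example_usage() {
  mylib_counter* counter = mylib_counter_create();
  assert(counter != nullptr);
  mylib_counter_increment(counter);
  const int result = mylib_counter_increment(counter);
  mylib_counter_release(counter);  // Create is paired with Release
  return result;
}
```

Because the application only ever holds a pointer, the library allocates and frees the object itself and remains free to grow the struct in later versions.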

ABI-stable API with idiomatic C++ wrappers

An obvious downside of the previously described approach is that it does not use idiomatic C++. This leads to poor ergonomics and a lack of the safety that modern C++ provides. In many cases, this problem can be mitigated by building header-only, compiled-away C++ abstractions on top of the plain C APIs (wrapping them back into C++). This way, ABI safety is achieved because the ABI-unsafe code does not change with a library upgrade (because it is not distributed in compiled form).

This model has a few caveats though:

The wrapper might impose some C++ constraints on the application code. E.g., if the wrapper uses C++14, it cannot be compiled together with application code built as C++11. Moreover, theoretically, the C++11-based library might not compile with an application using C++20 (but such cases of backward-incompatible C++ code are rare).
All wrapper code must be inlined into the caller. This can be achieved by using vendor-specific attributes that enforce inlining (e.g. always_inline, https://gcc.gnu.org/onlinedocs/gcc/Inline.html).
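A minimal sketch of such a wrapper is shown below. The C functions (`mylib_buffer_*`), the `mylib::Buffer` class, and the `MYLIB_ALWAYS_INLINE` macro are all hypothetical names; the C implementation is included inline only so that the example is self-contained.

```cpp
#include <cassert>
#include <new>      // std::nothrow
#include <utility>  // std::move

// ---- Stable plain C boundary (hypothetical) ----
extern "C" {
typedef struct mylib_buffer mylib_buffer;
mylib_buffer* mylib_buffer_create(int capacity);
void mylib_buffer_release(mylib_buffer* buffer);
int mylib_buffer_capacity(const mylib_buffer* buffer);
}

// ---- Stand-in for the separately compiled library binary ----
struct mylib_buffer { int capacity; };
extern "C" mylib_buffer* mylib_buffer_create(int capacity) {
  return new (std::nothrow) mylib_buffer{capacity};
}
extern "C" void mylib_buffer_release(mylib_buffer* buffer) { delete buffer; }
extern "C" int mylib_buffer_capacity(const mylib_buffer* buffer) {
  return buffer->capacity;
}

// ---- Header-only wrapper, compiled as part of the application ----
#if defined(__GNUC__)
#define MYLIB_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define MYLIB_ALWAYS_INLINE __forceinline
#else
#define MYLIB_ALWAYS_INLINE inline
#endif

namespace mylib {
// RAII owner of a mylib_buffer handle. Only the C functions above cross the
// ABI boundary; this class never appears in the library binary.
class Buffer {
 public:
  MYLIB_ALWAYS_INLINE explicit Buffer(int capacity)
      : handle_(mylib_buffer_create(capacity)) {}
  MYLIB_ALWAYS_INLINE ~Buffer() {
    if (handle_) mylib_buffer_release(handle_);
  }
  MYLIB_ALWAYS_INLINE Buffer(Buffer&& other) noexcept : handle_(other.handle_) {
    other.handle_ = nullptr;
  }
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;

  MYLIB_ALWAYS_INLINE int capacity() const {
    return handle_ ? mylib_buffer_capacity(handle_) : 0;
  }

 private:
  mylib_buffer* handle_;
};
}  // namespace mylib

inline int example_usage() {
  mylib::Buffer buffer(64);
  mylib::Buffer moved(std::move(buffer));
  assert(buffer.capacity() == 0);  // the moved-from wrapper holds no handle
  return moved.capacity();
}
```

The wrapper can change freely between releases, since every application compiles its own copy; only the three C functions have to stay stable.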

Using a subset of C++ to build dynamic invocation and reference-counting interfaces

In some cases, it's reasonable to build very dynamic interfaces that naturally support backward-compatible changes. In those cases, it's safe to use some basic C++ features that never change ABI-wise. In this model, every new version of an interface is a completely new interface with a unique id (e.g. a GUID), which has to be queried from the base interface (e.g. IUnknown) before it can be used (also known as dynamic conformance checking). The most popular example of this approach is Microsoft's COM with its IUnknown interface.

This model is also ideologically similar to Objective-C, which also relies heavily on message passing and dynamic interface querying. Objective-C does not provide ABI safety out of the box, but ABI issues in the Objective-C world are rare and easy to work around.
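The following sketch shows the general shape of such an interface. It is loosely modeled on the idea behind IUnknown but is not the real COM API: `IBase`, `ILogger`, `ILogger2`, the integer interface ids (standing in for GUIDs), and `create_logger` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using InterfaceId = std::uint64_t;  // stand-in for a GUID

// Base interface: only pure virtual functions, no data members.
struct IBase {
  virtual bool query_interface(InterfaceId id, void** out) = 0;
  virtual void add_ref() = 0;
  virtual void release() = 0;

 protected:
  ~IBase() = default;  // objects are destroyed only through release()
};

// Version 1 of the interface. Once published, it is never modified.
struct ILogger : IBase {
  static constexpr InterfaceId kId = 1;
  virtual void log(const char* message) = 0;
};

// Version 2 is a new interface with a new id, not an edit of ILogger.
struct ILogger2 : ILogger {
  static constexpr InterfaceId kId = 2;
  virtual void log_with_level(int level, const char* message) = 0;
};

// Library-side implementation (not thread-safe; a real one would likely use
// an atomic reference count).
class Logger final : public ILogger2 {
 public:
  bool query_interface(InterfaceId id, void** out) override {
    if (id == ILogger::kId) {
      *out = static_cast<ILogger*>(this);
    } else if (id == ILogger2::kId) {
      *out = static_cast<ILogger2*>(this);
    } else {
      *out = nullptr;
      return false;  // unknown interface: the caller must fall back
    }
    add_ref();  // every successful query hands out a new reference
    return true;
  }
  void add_ref() override { ++ref_count_; }
  void release() override {
    assert(ref_count_ > 0);
    if (--ref_count_ == 0) delete this;
  }
  void log(const char*) override { ++messages_; }
  void log_with_level(int, const char*) override { ++messages_; }

 private:
  ~Logger() = default;
  int ref_count_ = 1;  // the creator holds the first reference
  int messages_ = 0;
};

inline IBase* create_logger() { return new Logger(); }

// Application side: ask for the newest interface and check the answer.
inline bool example_usage() {
  IBase* base = create_logger();
  void* raw = nullptr;
  const bool has_v2 = base->query_interface(ILogger2::kId, &raw);
  if (has_v2) {
    ILogger2* logger = static_cast<ILogger2*>(raw);
    logger->log_with_level(1, "hello");
    logger->release();
  }
  base->release();
  return has_v2;
}
```

An application built against a newer header can run against an older library: the query for `ILogger2` simply fails, and the application can fall back to `ILogger`.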

Trade-offs

Building and maintaining an ABI-stable interface for a library is a challenging task requiring specific expertise and additional time. It's an extremely expensive effort. Therefore, any team considering it has to weigh all the trade-offs before deciding to invest in it.

A team will need to balance the benefits that ABI stability gives (which differ from project to project) against the following downsides.

Runtime performance issues. ABI-stable abstractions often use all kinds of marshaling, heap allocations, dynamic interface checking, and/or virtual dispatch. All that comes at a cost. In cases where cross-ABI traffic is quite low, the performance aspect is insignificant. In other cases, where we have hundreds or more calls per second, this can be a deal-breaker. Even if the performance problems can be resolved, resolving them usually comes with quite high engineering costs and losses in library ergonomics (because the usual solution is some sort of batching).
Investment in tooling. It's very easy to break the ABI accidentally. One of the first tools that a team should build and set up as part of CI is a check of public symbols. If there are no changes in public symbols, the change is probably safe. If there are changes, someone with deep ABI expertise should review it.
Developer velocity and careful planning. It is a huge commitment to introduce a change to the public interface that cannot be undone. This brings a whole new mindset, which requires extra time for thinking, testing, experimenting, getting alignment with other teams, and, in general, a lot of additional time. This way of working is already a reality for some teams, dictated by specifics unrelated to ABI; for others, such a radical slowdown in developer velocity might be unacceptable.
Additional area of expertise and responsibility. Even if it's not usually required for everyone on the team to be an expert in ABI stability, everyone who touches the code certainly needs a decent understanding of the problem area. Besides that, the team should have a dedicated person with a deep understanding of all the specifics, who reviews all changes to maintain ABI stability.
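For the tooling item above, the core of a public-symbols check can be very small. The sketch below assumes that the lists of exported symbols for the baseline and the current build have already been extracted by some other means (for example, from the output of a tool such as nm); the function and type names are hypothetical.

```cpp
#include <algorithm>  // std::set_difference
#include <cassert>
#include <iterator>   // std::back_inserter
#include <set>
#include <string>
#include <vector>

struct SymbolDiff {
  std::vector<std::string> removed;  // in the baseline, missing from the build
  std::vector<std::string> added;    // new in the build
};

// Compares two sets of exported symbol names.
inline SymbolDiff diff_exported_symbols(const std::set<std::string>& baseline,
                                        const std::set<std::string>& current) {
  SymbolDiff diff;
  std::set_difference(baseline.begin(), baseline.end(),
                      current.begin(), current.end(),
                      std::back_inserter(diff.removed));
  std::set_difference(current.begin(), current.end(),
                      baseline.begin(), baseline.end(),
                      std::back_inserter(diff.added));
  return diff;
}

// Any change to the exported symbols should be routed to an ABI reviewer.
inline bool needs_abi_review(const SymbolDiff& diff) {
  return !diff.removed.empty() || !diff.added.empty();
}

inline void example_usage() {
  const std::set<std::string> baseline = {"mylib_create", "mylib_release"};
  const std::set<std::string> current = {"mylib_create", "mylib_release",
                                         "mylib_reset"};
  const SymbolDiff diff = diff_exported_symbols(baseline, current);
  assert(diff.removed.empty());
  assert(diff.added.size() == 1 && diff.added[0] == "mylib_reset");
  assert(needs_abi_review(diff));
}
```

A check like this only looks at symbol names, so it cannot catch layout changes behind an unchanged symbol; it is a first line of defense, not a proof of compatibility.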

It may be a good idea for a team to spend some time working on a small, well-scoped part of the interface, trying to keep it ABI-stable, to see how feasible and expensive that is.

Conclusion

Supporting ABI stability gives customers a lot of flexibility but comes at a huge cost for the library developers. In some cases, however, libraries simply must provide it. If your library needs to be ABI-stable, embrace the importance of it, accept the price and time commitment, and go for it! If you are not sure, you probably don't need it.

react-native-community / discussions-and-proposals

C++ ABI stability Guidelines #257