Closed ahendriksen closed 1 year ago
I quite like this proposal! I especially like that this encourages other good coding practices while solving the central problem. By encouraging that specific organization of header files, we should do a better job of ensuring that we have discrete, well-defined headers that we #include
for specifically what we need rather than falling into situations like we have with neighbors/specializations.cuh
.
+1 all the way around from me. The only thing that it would be interesting to see if we could improve on is the generality of the RAFT_EXPLICIT
macro. Being able to throw it on as needed with as little thought as possible will help ensure that folks use it consistently.
Having a useful error message is great! I do not mind paying an extra template declaration for it (and we only do this for expensive functions anyways).
Where to keep the docstring?
To me keeping them in '-inl' would feel more natural, that is the file I am looking at when I am interested in details. For a condensed view we have the rendered API doc.
This is a very well thought-out proposal! Some suggestions:
RAFT_EXPLICIT_INSTANTIATE_ONLY
so that it's very clear that we're prohibiting implicit instantiations.RAFT_COMPILED
and RAFT_EXPLICIT_INSTANTIATE
macros depends on the different compilation cases, specifically whether or not the RAFT_COMPILE_LIBRARY
CMake variable is set. It would be helpful to include a table in CMakeLists.txt or in docs describing the different cases are and what template definitions and instantiations will end up existing in each case.-ext
file as well as the corresponding block from expensive.cuh
, right? Is there more that needs to be done for those cases?Thank you all for your feedback.
The only thing that it would be interesting to see if we could improve on is the generality of the RAFT_EXPLICIT macro.
@wphicks I agree that would be nice, especially as structs are used in the most complex parts of RAFT. I already spent some hours in compiler explorer and I could not figure it out. One problem is that an extern template struct ...
statement instantiates the struct methods. If we label the struct methods with RAFT_EXPLICIT
, we immediately get a compilation error. I also couldn't achieve reliable raising of errors by putting a macro / static assert in the struct body. If there is interest, I could put together a minimal example in godbolt.org so that others can make an attempt?
I would prefer to keep docstrings in the inl rather than the ext file.
@tfeher , @vyasr I think there is consensus that keeping the docstring in the -inl.cuh
header is better. I will address this in the PR. Another argument in favor of the -inl
header is that it keeps documentation and code closer together.
I prefer very explicit macro names, in this case
RAFT_EXPLICIT_INSTANTIATE_ONLY
@vyasr Good idea. I will adress this in the PR.
In the "Nuances" section you mention cases where you may want to allow implicit template instantiation while still providing some set of explicitly instantiated templates for common use cases. In those cases you would simply drop the -ext file as well as the corresponding block from expensive.cuh, right? Is there more that needs to be done for those cases?
I would advise against keeping everything in one file in this case. For two reasons: Removing the -ext
file would remove a prominent signal which template have instantiations, and this single file would become unwieldy. A simple example still looks reasonable:
namespace raft{
template <typename T>
void expensive(T arg) { /* body */ }
} // namespace raft
#ifdef RAFT_COMPILED
extern template void raft::expensive<float>(float);
extern template void raft::expensive<double>(double);
// etc..
#endif
In practice, however, the extern template
instantiations are very visually noisy and would become unwieldy especially if wrapped in an ifdef RAFT_COMPILED
.
The exact interaction between the RAFT_COMPILED and RAFT_EXPLICIT_INSTANTIATE macros [...] It would be helpful to include a table
I will try to find a place to these tables:
RAFT_COMPILED | RAFT_EXPLICIT_INSTANTIATE_ONLY | Effect |
---|---|---|
defined | defined | Templates are precompiled. Compiler error on accidental instantiation of expensive function template. |
defined | Templates are precompiled. Implicit instantiation allowed. | |
Nothing precompiled. Implicit instantiation allowed. | ||
defined | Avoid this: nothing precompiled. Compiler error on any instantiation of expensive function template. |
RAFT_COMPILED | RAFT_EXPLICIT_INSTANTIATE_ONLY | Which targets |
---|---|---|
defined | defined | raft::compiled , RAFT tests, RAFT benchmarks |
defined | Downstream libraries depending of libraft like cuML, cuGraph |
|
Downstream libraries depending on libraft-headers like cugraph-ops. |
I expect that it will also require some reworking of the conda recipes
I have not noticed that the conda recipes should change while testing the PR with cuML and cuGraph. Documenting my understanding of how downstream libraries consume RAFT below:
The RAFT CMake configuration always builds the header-only part, called raft
. When RAFT_COMPILE_LIBRARY
is set, then a precompiled library target is also created, going by a variety of aliases: raft_lib, raft::compiled, raft_compiled
. I don't know why we have so many aliases of the same library, but raft::compiled
seems to be the one that is most used.
We create three conda packages for the library: libraft-headers-only, libraft-headers, libraft
. The first two both package only the headers (and the libraft-headers-only
package has fewer runtime dependencies to make life easier for cugraph-ops). The libraft
package contains the precompiled libraft.so
and depends on libraft-headers
.
Downstream packages like cuml depend on the libraft
conda package. In their CMake config, an option like CUML_RAFT_COMPILED determines whether they "link" (in the CMake kind of way) to raft
or also raft::compiled
.
When cuML links to raft
, then RAFT_COMPILED
and RAFT_EXPLICIT_INSTANTIATE_ONLY
remain undefined. When cuML links to raft::compiled
then RAFT_COMPILED
is automatically defined and RAFT_EXPLICIT_INSTANTIATE_ONLY
is undefined.
If there is interest, I could put together a minimal example in godbolt.org so that others can make an attempt?
If it's not too much trouble to do so, I'd love to take a look. I don't know if I'll have any luck, but it would be good for me to at least get a better understanding of the challenges.
@ahendriksen thank you very much for the write-up. I fully agree with your entire proposal, and it's a very clean way to break and sort headers, as well as disallow huge chains of rebuilds.
I would advise against keeping everything in one file in this case. For two reasons: Removing the -ext file would remove a prominent signal which template have instantiations, and this single file would become unwieldy
I agree with this. We should always have -inl
and -ext
for any files that are providing pre-compiled functions. For certain functions, where implicit instantiations are acceptable so that the compiler doesn't have to wait for a definition and instantiation to be generated, I think the -ext
file should just not define the macro RAFT_EXPLICIT
. I imagine this happens for functions like raft::linalg::reduce
because they are heavily used internally. (Or perhaps, to keep a similar structure we define a macro called RAFT_IMPLICIT
)
I would lean towards having the docstrings in the
-ext
headers, as those headers are more dense and do not show the function bodies, which can give a nicer overview.
This is acceptable, as long as the HTML generated for the docs still show the include to be #include <expensive.cuh>
If there is interest, I could put together a minimal example in godbolt.org so that others can make an attempt?
I would be interested in this as well!
In the "Nuances" section you mention cases where you may want to allow implicit template instantiation while still providing some set of explicitly instantiated templates for common use cases. In those cases you would simply drop the -ext file as well as the corresponding block from expensive.cuh, right? Is there more that needs to be done for those cases?
I would advise against keeping everything in one file in this case. For two reasons: Removing the -ext file would remove a prominent signal which template have instantiations, and this single file would become unwieldy. A simple example still looks reasonable:
I think I worded my question poorly, and also perhaps made some invalid statements when writing this. What I was getting at was that, in order to special-case some templates to allow instantiation even when RAFT_EXPLICIT_INSTANTIATE_ONLY
is defined, those templates need the template definition available. I was trying to understand how those special cases should be handled. On a second look, I guess all that really needs to happen is that if I have some template expensive_but_always_allow_implicit
, then expensive_but_always_allow_implicit.cuh
would always include the inl
file even if RAFT_EXPLICIT_INSTANTIATE_ONLY
is defined. Is that correct?
Regarding the four cases, we can ensure that the bad case (RAFT_COMPILED
not defined but RAFT_EXPLICIT_INSTANTIATE_ONLY
defined) never happens as long as raft is being compiled via its CMake. If we want to be extra safe I suppose we could include a single small detail TU somewhere that contains a static_assert
if that particular combination is ever encountered.
The RAFT CMake configuration always builds the header-only part, called raft. When RAFT_COMPILE_LIBRARY is set, then a precompiled library target is also created, going by a variety of aliases: raft_lib, raft::compiled, raft_compiled. I don't know why we have so many aliases of the same library, but raft::compiled seems to be the one that is most used.
The short answer here is that raft_lib
is the actual compiled library, raft_compiled
is a target that is always made available so that it can be linked to in CMake even if using a header-only raft (in which case it will silently fall back to using raft in header-only mode), and raft::compiled
is an alias that is what should be used almost all the time by consumers.
The headers
and headers-only
conda package are equivalent from a build perspective and only different from a conda package perspective in terms of the recipe dependencies to allow consumers to not install all of the possible dependencies of raft if they are only using a subset of its functionality that doesn't use certain library dependencies.
Thinking about it further, I think you're right that we may not need to do additional work. The existing targets pretty much handle everything the same. The one possible difference is that this may uncover cases where a dependent library (cuml/cugraph) is instantiating a template that it shouldn't be when using a compiled libraft.so, but that would be exactly the kind of situation that the new layout helps to solve anyway so that's a good thing.
If it's not too much trouble to do so, I'd love to take a look.
@wphicks , @vyasr , Cool! I have prepared an example in compiler explorer. Here is also a non-working attempt for inspiration (link). The example consists of one piece of code and four compilers which have a different set of macros defined. Currently, not all compilers give the expected output.
I guess all that really needs to happen is that if I have some template expensive_but_always_allow_implicit, then expensive_but_always_allow_implicit.cuh would always include the inl file even if RAFT_EXPLICIT_INSTANTIATE_ONLY is defined. Is that correct?
This is indeed what I do. An example can be found in coalesced_reduction-ext.cuh, coalesced_reduction-inl.cuh, coalesced_reduction.cuh.
Regarding the four cases, we can ensure that the bad case (RAFT_COMPILED not defined but RAFT_EXPLICIT_INSTANTIATE_ONLY defined) never happens as long as raft is being compiled via its CMake. If we want to be extra safe I suppose we could include a single small detail TU somewhere that contains a static_assert if that particular combination is ever encountered.
I have considered this as well. I fear,however, that any static assert that checks for this situation will be drowned out by all the RAFT_EXPLICIT static_asserts and error messages. So I think the benefit is marginal.
The one possible difference is that this may uncover cases where a dependent library (cuml/cugraph) is instantiating a template that it shouldn't be when using a compiled libraft.so, but that would be exactly the kind of situation that the new layout helps to solve anyway so that's a good thing.
To clear up any possible confusion: this proposal does not advocate defining RAFT_EXPLICIT_INSTANTIATE_ONLY
for downstream libraries. At least, not initially. Therefore, these libraries should not have compilation failures due to instantiating the wrong templates. At some point in the future, some libraries may want to consider defining RAFT_EXPLICIT_INSTANTIATE_ONLY
, but I expect that it will require some work to align all the template types.
The error message we get during accidental instantiation is really useful! I could already make use of it while working on CAGRA instantiations. The message lists all the template types that was used during the instantiation, and I could cross check it with the explicit instantiations. This helped me to catch the missing const
from the default_accessor
of an mdspan that caused the mismatch.
Here is how the actual error looked in my case:
error: static assertion failed with "ACCIDENTAL_IMPLICIT_INSTANTIATION
detected during:
instantiation of "raft::neighbors::experimental::cagra::index<DataT, IdxT> raft::neighbors::experimental::cagra::build(
const raft::device_resources &, const raft::neighbors::experimental::cagra::index_params &,
raft::mdspan<const DataT, raft::matrix_extent<IdxT>, raft::row_major, Accessor>)
[with DataT=float, IdxT=uint32_t, Accessor=raft::host_device_accessor<std::experimental::default_accessor<const float>, raft::memory_type::device>]"
Sorry I think I'm a bit confused about the godbolt example now. You're showing the four possible combinations of values of the two macros and what should happen in each case. In this particular example I think you could trigger the desired error in the one problematic case by inserting a static_assert(false, "This should fail");
as the contents of the expensive::run
definition that is in the ifdef EXPLICIT_ONLY
block, but I get the feeling that's not the question that you're asking. Could you clarify what we're trying to make happen here?
you could trigger the desired error in the one problematic case by inserting a static_assert(false, "This should fail");
Could you clarify what we're trying to make happen here?
It is indeed a bit tricky. As you indicate, you can trigger an error using the static_assert. This will cause errors in too many situations though. The static assert fires whenever EXPLICIT_ONLY
is defined, which causes problems when you use an instance which has been explicitly extern template
instantiated (second row int table below).
Macros defined | We want | static_assert solution |
---|---|---|
USE_EXPLICIT_EXTERN | Linker error (use external symbol) | ✔ Linker error |
USE_EXPLICIT_EXTERN + EXPLICIT_ONLY | Linker error (use external symbol) | ❌ Compiler error |
EXPLICIT_ONLY | Compiler error | ✔ Compiler error |
None | Compiles fine | ✔ Compiles fine |
note: of course we don't want a linker error in RAFT, but for the purpose of the compiler explorer example we want a linker error (otherwise we would have to create multiple translation units and link them which would become too complicated).
Yup the linker error makes sense in the context of godbolt. I don't think I was clear enough, doesn't inserting:
#ifndef USE_EXPLICIT_EXTERN
static_assert(false, "Can't compile");
#endif
solve the problem?
That would solve the problem by "cheating". I think I see where the confusion is coming from and I realize I should have been clearer. I am using the USE_EXPLICIT_EXTERN
in the context of the godbolt example to drive the example code. There is no analogous RAFT macro. The EXPLICIT_ONLY
macro is an analogue to the RAFT_INSTANTIATE_EXPLICIT_ONLY
macro. So RAFT can't make use of the USE_EXPLICIT_EXTERN
macro to drive behavior.
The table below summarizes the situation without putting both macros on the same pile.
Example uses | Macro defined | We want | static_assert solution |
---|---|---|---|
expensive<int> |
Linker error (use external symbol) | ✔ Linker error | |
expensive<int> |
EXPLICIT_ONLY | Linker error (use external symbol) | ❌ Compiler error |
expensive<float> |
EXPLICIT_ONLY | Compiler error | ✔ Compiler error |
expensive<float> |
Compiles fine | ✔ Compiles Example |
Here, expensive<int>
has the extern template
instantiation.
This issue describes problems we currently have with using specialization headers to reduce compile times and proposes a fix.
This issue accompanies https://github.com/rapidsai/raft/pull/1415
I have added this issue to motivate the current design of the PR and solicit feedback on certain points.
Problem
RAFT is a heavily templated library. Several core functions are very expensive to compile and we want to prevent duplicate compilation of this functionality. To limit build time, RAFT provides a precompiled library (
libraft.so
) where expensive function templates are instantiated for the most commonly used template parameters. This is not a complete solution, however, due to two problems (1) accidental reinstantiations and (2) unnecessary dependencies:In practice, it often happens that a function template is accidentally reinstantiated, even when it is already provided by the library. This can have a big negative impact on compile times. It happens in consuming libraries, like cuML, but is also prevalent within
libraft.so
itself [#1358, #1360].A minor change in the core implementation of a function often causes the recompilation of dependent translation units. A particularly bad example is
neighbors/specializations.cuh
, which provides the extern template instantiations for many expensive kernels and is also included over 100 times in the RAFT project. Any change to any of these kernels triggers a rebuild of all 100 files that include the specializations header.Requirements
A solution to these two problems should satisfy the following requirements:
An accidental reinstantiation must be impossible when compiling
libraft.so
, i.e., it must result in an error.In case of an error, the solution must result in useful and actionable error messages. Specifically, it should not introduce linker errors during development (they only fire at the very end of compilation and do not point to the source location that introduced the error).
It must be possible for downstream libraries to continue to use RAFT as a header-only library, as well as a precompiled library.
In addition, there are some nice-to-haves. That is, we obviously want to write as little boilerplate code as possible. Also, it would be nice if downstream code would not have to bother with the following if else preprocessor logic when including a header:
Background: C++ templates
I include some background on C++ function templates here, feel free to skip and refer back in case of questions.
A function template in C++ can be declared, defined, and instantiated. In addition, there is something called template specialization but this is not used in RAFT (the “specializations” directory currently contains template instantiations; a case of unfortunate naming).
A template instantiation can be explicit, extern explicit, or implicit. An explicit instantiation looks very similar to a function template declaration. Instead of having the angle brackets in front of the function name, however, the angle brackets are placed behind the function name. When an explicit template instantiation is encountered, the compiler will generate code for the template instance. To prevent generating code, an extern template instantiation can be used, which tells the compiler that some other translation unit already contains the code. An implicit instantiation is an instantiation of a function by use, e.g., a function call, taking a reference, etc.
Examples of declaration, definition, and both types of instantiation are shown below:
An important difference between implicit and explicit instantiation is that explicit instantiation requires that the template has been defined (the function body is needed to generate code). An implicit instance only requires that the template has been declared: instead of generating code the compiler will insert a linker directive. If the template is not instantiated anywhere else, this will result in linker errors.
If the compiler encounters an extern template instantiation and an explicit instantiation, the explicit instantiation “wins”: the code for the template is generated and ends up in the translation unit.
The key to the proposed solution consists of (1) listing allowed instantiations using explicit template instantiation and (2) controlling exactly when and where implicit template instantiation is allowed (i.e., a non-listed instantiation). Part (1) of this solution is already implemented in the specializations headers in RAFT, part (2) would allow implicit instantiations in the following situations:
When compiling
libraft.so
: expensive function templates are explicitly instantiated and their implicit instantiation results in a compiler error.When compiling an external package that depends on
libraft.so
: implicit instantiation is allowed and when an extern template instantiation is available it is used (code generation is skipped).When compiling with header-only RAFT: implicit instantiation is allowed and extern template instantiations are unavailable.
Solution
What
This section describes (1) macros that are used to control template instantiation, (2) a structure for organizing header files, and (3) a mechanism for meaningful error messages in case of accidental template instantiation.
Macros. We define the macros
RAFT_COMPILED
andRAFT_EXPLICIT_INSTANTIATE
during compilation oflibraft.so
and the tests and benchmarks. TheRAFT_COMPILED
is already used in RAFT and indicates if a translation unit will be linked withlibraft.so
. It is currently marked inCMakeLists.txt
asINTERFACE
, meaning it is defined only for downstream libraries using RAFT. In this proposal, it is markedPUBLIC
meaning it will be defined both for compiling RAFT as well as downstream libraries.The
RAFT_EXPLICIT_INSTANTIATE
macro is new and is only defined during compilation oflibraft.so
itself (i.e.PRIVATE
). When defined, it indicates that implicit instantiations of expensive function templates are forbidden (they should result in a compiler error).Header organization. Any header file that defines an expensive function template (say
expensive.cuh
) should be split in three parts:expensive.cuh
,expensive-inl.cuh
, andexpensive-ext.cuh
. The fileexpensive-inl.cuh
(“inl” for “inline”) contains the original template definitions and remains largely unchanged. The fileexpensive.cuh
inludes one or both of the other two files, depending on the values of theRAFT_COMPILED
andRAFT_EXPLICIT_INSTANTIATE
macros. The fileexpensive-ext.cuh
contains the external template instantiations. In addition, ifRAFT_EXPLICIT_INSTANTIATE
is set, it contains template defitions that ensure that a compiler error is raised when a function template is implicitly instantiated.The dispatching by
expensive.cuh
is performed as follows:The file
expensive-in.cuh
is unchanged:The file
expensive-ext.cuh
contains the following:First, if
RAFT_EXPLICIT_INSTANTIATE
is set,expensive
is defined. This is necessary, because the definition inexpensive-inl.cuh
was skipped. The macroRAFT_EXPLICIT
defines the function body: this macro ensures that an informative error message is generated when an implicit instantiation erroneously occurs and is described below. Finally, the extern template instantiations are listed.To actually generate the code for the template instances, the file
expensive.cu
contains the following. Note that the only difference between the extern template instantiations inexpensive-ext.cuh
and these lines are the removal of the wordextern
:Error messages. Function templates that should be explicitly instantiated are tagged with the
RAFT_EXPLICIT
macro(link). This macro defines a function body that triggers an error when the template is accidentally instantiated. The error message looks like this:This should help a user who just wants the error to go away while prototyping some new functionality (they will have more than enough time to circle back to this issue while the code is compiling). In addition, it tells the user how to fix the issue the “right way” by adding explicit instantiations to the correct files.
Why/how does this solve the problem
I address why this solution prevents accidental reinstantiation and unnecessary dependencies in three situations: incremental compilation with tests and/or benchmarks, compilation in RAFT CI, compilation of downstream libraries.
Incremental compilation. In this situation, libraft.so, the tests, and benchmarks are compiled with
RAFT_COMPILED
andRAFT_EXPLICIT_INSTANTIATE
defined. Therefore, when a template is instantiated that is not in the list of explicit extern instantiations, it will raise a compiler error.When the internals of a kernel in
expensive-inl.cuh
are changed, the test file will not have to be recompiled since it only depends onexpensive.cuh
andexpensive-ext.cuh
. Only the fileexpensive.cu
will be recompiled. This prevents chains of recompilations from forming as a result of a simple change.Compilation in CI. Here, we can also catch accidental implicit instantiations. In addition, as a result of reducing unnecessary dependencies, the probability that
sccache
has a cache hit is drastically increased, since changes to the internals of expensive function templates do not lead to a cascade of recompilation anymore.Compilation of downstream libraries. Downstream libraries can either use header-only RAFT, in which case nothing changes (both macros are undefined and the
*-inl.cuh
headers are included), or they can uselibraft.so
in which caseRAFT_COMPILED
is defined. This way, when the library uses a template instance that has already been compiled inlibraft.so
, code generation is skipped and the code inlibraft.so
is used. When the library instantiates a function template that has not been instantiated inlibraft.so
, the code is generated like it was before without any errors. The library might want to opt-in to implicit instantiation prevention by definingRAFT_EXPLICIT_INSTANTIATE
, but I expect this will be rare.Advantages / disadvantages
First of all, this proposal solves the problems of accidental reinstantiation and unnecessary dependencies. It therefore substantially cuts down on compilation times and we can count on this being the case in the future.
In terms of code organization, an additional advantage is that things are easier to find. The current directory structure of
specializations
does not follow the directory structure ofinclude/
and it can be a bit difficult to find an extern template instantiation. The proposed organization always has extern template instantiations in the*-ext.cuh
header and the actual instantiations in thesrc/
directory in the same relative directory.A disadvantage is that the proposed structure requires more boilerplate, as we have four repetitions of the function template declaration instead of three. This can be bit annoying to set up, but it only has to be done for new functionality or when the interface of a function changes. This will happen more rarely than changing the internals of a function, which should now be considerably faster.
Nuances
Some kernels are instantiated many times, but it may not make sense to actually prevent implicit instantiation. One case is
raft::linalg::reduce
that is used in many places. To cover all instances with an extern template instantation would be prohibitive, but covering none would result in some instances being defined in over 80 translation units. In this case, it would make sense to allow implicit instantiations and also have extern template instantiations for commonly used types.The
RAFT_EXPLICIT
macro only works reliably for functions. In thedetail
headers, there were some cases where a struct was used to dispatch to different kernels. In these cases, I had to replace a member function by a free function.Alternatives considered
One alternative that would require less boilerplate is to declare instead of define function templates. A downside is that it would result in linker errors but not raise any compiler errors on implicit instantiation (and thus would not tell you where the instantiation occurred).
Another alternative is to continue with the current strategy. A downside is that it will require constant firefighting to ensure that (1) extern template instantiations are used when they are available, (2) index types and data types at usage site do not start to diverge from the extern template instantiations. All of this would have to be done based on the build time reports (an increase in build time being an indication that something is wrong) and copious use of
cuobjdump
to hunt for duplicate template instantiations. So far, experience has shown that this is a brittle strategy.Open questions
Would it be worth developing a strategy for forcing explicit instantiation of structs as well? They are sometimes used internally and if we can it to work it would save quite some typing, but I fear the solution would look complicated (even compared to the current proposal).
Where to keep the docstrings? In https://github.com/rapidsai/raft/pull/1415, I have mostly opted to keep docstrings in both `-ext` and `-inl` headers. I would lean towards having the docstrings in the `-ext` headers, as those headers are more dense and do not show the function bodies, which can give a nicer overview.