jrhemstad opened 6 years ago
Do the clang tools work with nvcc? Much of our template time is in nvcc. You can use `nvcc --time foo.csv` to dump timing for the different nvcc phases.
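For example, something along these lines (the file names here are just placeholders):

```bash
# Appends per-phase timing (cudafe++, cicc, ptxas, host compiler, ...) for
# this translation unit as CSV rows to foo.csv.
nvcc --time foo.csv -c gather.cu -o gather.o
```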
I'm not sure. You can technically get Clang to compile device code, so that may be a path worth exploring using Clang + these tools.
Any updates on this? I'd love to use precompiled headers with CUDA projects.
Compile time continues to grow, but that is largely because our supported feature set and supported types continue to grow. In 0.15 we are aiming to add at least 10 new types (4 unsigned int types, 4 timestamp types, list column type, decimal fixed-point type). Naturally this will increase compile time and binary size.
Meanwhile, in 0.14 we dropped all of the legacy APIs that were previously deprecated, which reduced compile time a bit, and significantly reduced binary size. There have been and will continue to be various efforts to reduce compile time of certain components. We are investigating possibly splitting libcudf into multiple libraries.
We have not discussed precompiled headers.
@jrhemstad @harrism , is this still a relevant issue?
Our compile time is worse than ever, so I guess it's still relevant. We could benefit from someone putting in a concerted effort to eliminate unnecessary includes across the library.
> I'm not sure. You can technically get Clang to compile device code, so that may be a path worth exploring using Clang + these tools.
Out of curiosity I gave this a quick shot. (Unsurprisingly) clang does not currently support the experimental CUDA features that we have enabled (`--expt-extended-lambda --expt-relaxed-constexpr`), so the compilation terminates pretty quickly. Not sure if there are other downstream issues that we would face if we attempted this after stripping those out (not suggesting that we should, although #7795 remains open, so maybe it is worth pursuing).
> clang does not currently support the experimental CUDA features that we have enabled (`--expt-extended-lambda --expt-relaxed-constexpr`)
Pretty sure clang supports those features natively without the need for any extra compile flags. I'm guessing the error was caused by clang not recognizing those flags.
You're right, it does. I removed those and made some progress, but not nearly enough for a working build with clang yet. Here's a list of the necessary changes so far (a sketch of the corresponding configure invocation follows the list):

- Remove the `CUDF_CUDA_FLAGS` set in ConfigureCUDA.cmake. Most are either unsupported or ignored (including some of the warnings flags), so the blanket removal is easiest.
- Remove `[[nodiscard]]` attributes, which don't appear to be supported in clang device code yet.
- Add `-D__STRICT_ANSI__` as a CUDA compiler flag. Otherwise it tries to compile float128 code, which is not yet supported in clang.
- Pass `-fcuda-allow-variadic-functions` to the clang compiler (or via the CMake configure CLI: `-DCMAKE_CUDA_FLAGS="-Xclang -fcuda-allow-variadic-functions"`).
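Putting those together, the configure step I've been experimenting with looks roughly like this (the source/build directory layout and the exact flag set are illustrative, not a verified recipe):

```bash
# Illustrative only: use clang as the CUDA compiler and forward the flags
# noted above. Assumes the CUDF_CUDA_FLAGS block in ConfigureCUDA.cmake has
# already been removed and the [[nodiscard]] attributes stripped.
cmake -S cpp -B build \
  -DCMAKE_CUDA_COMPILER=clang++ \
  -DCMAKE_CUDA_FLAGS="-D__STRICT_ANSI__ -Xclang -fcuda-allow-variadic-functions"
```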
At this point I start seeing failures like this:
```
...type_traits:2777:5: error: no type named 'type' in 'std::invoke_result<cudf::detail::indexalator_factory::nullable_index_accessor, int>'
```
and
```
error: type 'thrust::transform_iterator<(lambda at ...gather.cu:49:26), cudf::detail::input_indexalator>::super_t' (aka 'iterator_adaptor<transform_iterator<(lambda at ...gather.cu:49:26), cudf::detail::input_indexalator, thrust::use_default, thrust::use_default>, cudf::detail::input_indexalator, int, thrust::use_default, std::random_access_iterator_tag, int>') is not a direct or virtual base of 'thrust::transform_iterator<(lambda at ...gather.cu:49:26), cudf::detail::input_indexalator>'
```
I need to track this down a bit further, but it looks like some aspect of how thrust SFINAEs different code paths isn't supported in clang device code yet either.
I tried to build cuco with clang about a year ago and was blocked by its dependencies, like thrust and libcudacxx, which could not be built with clang. To find out how much effort is required to build device code with clang, I would suggest starting with a smaller library like cuco and seeing how it goes from there.
Related issues:
Well then... looks like we've got to work our way all the way up the stack for this. For the purpose of something like clang-tidy we might be able to get some partial results based on the discussion in https://github.com/rapidsai/raft/pull/424, but that's probably only partial support at best and I don't know if that will work with the other tools of interest like IWYU.
Compile times are an ever-present problem for us. This issue as currently framed isn't clearly actionable, so let's lay out some concrete points.
> We should make sure that best practices are being followed in template instantiation

This is handled by the `type_dispatcher`, which now controls all our template instantiations and ensures that we have a minimal set.

> Compiling with `-ftime-report` may be a good place to start to see where all the compilation time is being spent.
Since #9631 we have been tracking build times in CI. We monitor this and keep an eye on TUs that are particularly slow to compile. Where necessary, we have reacted to slow compilation by reorganizing the code and explicitly instantiating templates.
> Furthermore, code should be scrubbed from excessive and unnecessary `#include`s.
This seems like the main action item remaining. As discussed above, `include-what-you-use` is a good tool for this, so I would consider evaluating and either implementing or rejecting it as the only thing left to do here. Since clang compilation is the bottleneck here, I propose that we just get our C/C++ source files working first. Once we have systematic approaches in place for that and are regularly checking on the quality, we can incrementally work up to getting CUDA source files working, since that is a much heavier lift (and at that point we can also split between host and device code in `.cu` files). This is similar to the approach we are taking in #16958 for clang-tidy.
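For that C/C++-only first pass, the workflow I have in mind is roughly the following (the build directory name and the use of IWYU's bundled driver script are assumptions on my part):

```bash
# Emit a compilation database for the C++ targets, then let IWYU's driver
# script replay each compile command and report unused or missing includes.
cmake -S cpp -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
iwyu_tool.py -p build
```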
Feature request
As anyone who has built libgdf recently has surely noticed, the time to compile the library from scratch has grown significantly in the last few months. For example, compiling on all 10 cores of my i9-7900X @ 3.30GHz takes 11 minutes as reported by `time make -j`.

Compiling with `-ftime-report` may be a good place to start to see where all the compilation time is being spent.

This is likely due to the large amount of template instantiation that is required for instantiating functions for all possible types. We should make sure that best practices are being followed in template instantiation, such that a template for a given type only has to be instantiated once via explicit instantiation.
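To illustrate the `-ftime-report` idea (the compiler choice and file names here are assumptions, not part of any build script):

```bash
# Per-pass timing breakdown from the host compiler for a single TU.
g++ -ftime-report -c some_file.cpp -o some_file.o
# For .cu files, forward the flag to the host compiler through nvcc.
nvcc -Xcompiler -ftime-report -c some_kernel.cu -o some_kernel.o
```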
Much of our code is implemented in headers, which causes it to be recompiled everywhere that header is included. Using pre-compiled headers may help: https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html http://itscompiling.eu/2017/01/12/precompiled-headers-cpp-compilation/
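As a minimal sketch of the GCC mechanism (the header name and flags here are hypothetical; GCC only uses the precompiled header when the compile flags match):

```bash
# Precompile a heavy, widely-included header once.
g++ -std=c++14 -O2 -x c++-header common.hpp -o common.hpp.gch
# TUs that #include "common.hpp" with the same flags pick up the .gch
# automatically instead of reparsing the header text.
g++ -std=c++14 -O2 -c foo.cpp -o foo.o
```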
Furthermore, code should be scrubbed from excessive and unnecessary `#include`s. Compiling with `-MD` will show what files are being included (a minimal example is shown below).

Here's a Clang tool that ensures you only include the necessary headers: https://github.com/include-what-you-use/include-what-you-use
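For reference, a minimal `-MD` run looks like this (file names are placeholders):

```bash
# -MD writes a make-style dependency file (foo.d) next to the object file,
# listing every header this TU actually pulled in.
g++ -MD -c foo.cpp -o foo.o
cat foo.d
```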
Here's a Clang tool to profile time spent in template instantiation: https://github.com/mikael-s-persson/templight