Open ahendriksen opened 1 year ago
Cross-link to RMM issue: https://github.com/rapidsai/rmm/issues/1222
Thanks for doing this comparison @ahendriksen!
Have you, by chance, compared the end-to-end runtimes before and after the change to using the spdlog compiled lib? I'm attaching two ninja_log files: one before and one after.
I haven't analyzed these files beyond noticing that the end-to-end compile time only seemed to go down by about 1.5 minutes. That said, there are a couple of stragglers that took quite some time to compile (ivf-flat, for example) and don't yet have specializations, so I think we can address those separately to reduce the compile times further.
Also attached are the patches for the changes to RAFT and RMM to get them to use spdlog's compiled binary.
Good point. I have analyzed your ninja logs and share results below.
Some caveats:
The headers and compiled logs were each compiled twice. The analysis below is only for the last (second) build.
As you point out, looking at total compile time is not always useful because of stragglers. Therefore, I have looked at the compile times per translation unit and the sum of the compile times per translation unit.
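To illustrate why the end-to-end time can hide an improvement, here is a minimal sketch with invented (start, end) timestamps in milliseconds, the units used in a ninja log. The helper names and the numbers are hypothetical; they are not taken from the attached logs.

```python
def wall_clock_seconds(entries):
    """End-to-end build time: last finish minus first start."""
    first_start = min(start for start, _ in entries)
    last_end = max(end for _, end in entries)
    return (last_end - first_start) / 1000.0


def summed_seconds(entries):
    """Sum of the individual compile times of all translation units."""
    return sum(end - start for start, end in entries) / 1000.0


# Three translation units compiled in parallel (invented numbers).
# The third one is a straggler that dominates the end-to-end time.
before = [(0, 40_000), (0, 45_000), (0, 600_000)]
after = [(0, 25_000), (0, 30_000), (0, 595_000)]

for name, build in (("before", before), ("after", after)):
    print(f"{name}: wall clock {wall_clock_seconds(build):.1f} s, "
          f"sum {summed_seconds(build):.1f} s")
```

In this toy case the wall-clock time barely moves because of the straggler, while the summed time drops noticeably. That is the motivation for comparing per translation unit.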
Summary of results:
All results: (python script to generate is included below)
Sum of compile times for compiled spdlog: 36580.8 seconds
Sum of compile times for header-only spdlog: 40334.0 seconds
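As a quick worked calculation on the two sums above (only the two totals are taken from the results; the variable names are illustrative):

```python
# Totals reported above, in seconds.
header_only_total = 40334.0
compiled_total = 36580.8

saved = header_only_total - compiled_total
relative = 100.0 * saved / header_only_total

print(f"Saved {saved:.1f} s of summed compile time ({relative:.1f}%)")
# prints: Saved 3753.2 s of summed compile time (9.3%)
```

Note that this is compile time summed over all translation units, not wall-clock time, so the saving on a parallel build will be smaller in absolute terms.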
Compile times for paths only found in headers (seconds):
CMakeFiles/CORE_TEST.dir/test/core/nvtx.cpp.o 15.7
CMakeFiles/CORE_TEST.dir/test/core/span.cpp.o 4.4
CMakeFiles/CORE_TEST.dir/test/core/math_device.cu.o 29.2
CMakeFiles/CORE_TEST.dir/test/core/operators_host.cpp.o 3.9
CMakeFiles/CORE_TEST.dir/test/core/interruptible.cu.o 24.1
CMakeFiles/CORE_TEST.dir/test/core/memory_type.cpp.o 1.9
CMakeFiles/CORE_TEST.dir/test/core/span.cu.o 28.2
CMakeFiles/CORE_TEST.dir/test/core/math_host.cpp.o 4.1
Comparison of compile times between headers and compiled:
path header (s) compiled (s) change (s) change (%)
CMakeFiles/CORE_TEST.dir/test/core/logger.cpp.o 17.7 5.2 -12.5 -70.6%
CMakeFiles/CORE_TEST.dir/test/test.cpp.o 18.7 5.8 -12.9 -69.1%
CMakeFiles/CORE_TEST.dir/test/core/handle.cpp.o 22.6 11.3 -11.3 -50.1%
CMakeFiles/UTILS_TEST.dir/test/util/cudart_utils.cpp.o 20.8 10.6 -10.2 -49.1%
CMakeFiles/UTILS_TEST.dir/test/util/pow2_utils.cu.o 23.0 12.8 -10.2 -44.4%
istance_lib.dir/src/distance/random/rmat_rectangular_generator_int64_double.cu.o 28.6 16.9 -11.7 -40.9%
distance_lib.dir/src/distance/random/rmat_rectangular_generator_int64_float.cu.o 27.4 16.3 -11.1 -40.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/svd.cu.o 45.6 27.8 -17.8 -39.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/axpy.cu.o 41.8 25.8 -16.0 -38.2%
CMakeFiles/UTILS_TEST.dir/test/core/seive.cu.o 21.4 13.3 -8.0 -37.7%
CMakeFiles/CORE_TEST.dir/test/core/mdspan_utils.cu.o 32.7 20.7 -11.9 -36.5%
CMakeFiles/MATRIX_TEST.dir/test/matrix/argmin.cu.o 42.9 27.5 -15.5 -36.1%
CMakeFiles/LABEL_TEST.dir/test/label/merge_labels.cu.o 40.6 26.3 -14.3 -35.2%
CMakeFiles/MATRIX_TEST.dir/test/matrix/reverse.cu.o 42.0 27.3 -14.7 -35.0%
CMakeFiles/CORE_TEST.dir/test/core/mdarray.cu.o 41.8 27.2 -14.6 -34.9%
distance/specializations/detail/l2_sqrt_unexpanded_double_double_double_int.cu.o 34.7 22.8 -12.0 -34.4%
istance/distance/specializations/detail/russel_rao_double_double_double_int.cu.o 35.8 23.6 -12.3 -34.2%
CMakeFiles/STATS_TEST.dir/test/stats/cov.cu.o 46.6 30.8 -15.9 -34.0%
LABEL_TEST 0.3 0.2 -0.1 -33.8%
CMakeFiles/STATS_TEST.dir/test/stats/rand_index.cu.o 36.0 23.9 -12.0 -33.5%
t_distance_lib.dir/src/distance/random/rmat_rectangular_generator_int_float.cu.o 27.1 18.1 -9.0 -33.4%
CMakeFiles/UTILS_TEST.dir/test/util/device_atomics.cu.o 25.2 16.8 -8.4 -33.3%
CMakeFiles/LINALG_TEST.dir/test/linalg/divide.cu.o 41.7 27.8 -13.9 -33.3%
CMakeFiles/MATRIX_TEST.dir/test/sparse/spectral_matrix.cu.o 38.7 25.9 -12.8 -33.1%
CMakeFiles/MATRIX_TEST.dir/test/matrix/argmax.cu.o 44.8 29.9 -14.8 -33.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/multiply.cu.o 40.4 27.1 -13.3 -32.9%
CMakeFiles/LINALG_TEST.dir/test/linalg/strided_reduction.cu.o 40.0 26.9 -13.2 -32.9%
CMakeFiles/MATRIX_TEST.dir/test/matrix/diagonal.cu.o 43.4 29.3 -14.0 -32.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/dot.cu.o 41.1 27.8 -13.2 -32.3%
CMakeFiles/STATS_TEST.dir/test/stats/sum.cu.o 39.4 26.7 -12.6 -32.1%
rc/distance/distance/specializations/detail/kernels/gram_matrix_base_double.cu.o 35.8 24.3 -11.5 -32.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/cholesky_r1.cu.o 38.8 26.4 -12.4 -32.0%
CMakeFiles/LINALG_TEST.dir/test/linalg/map_then_reduce.cu.o 44.1 30.1 -14.1 -31.9%
CMakeFiles/STATS_TEST.dir/test/stats/mean_center.cu.o 47.7 32.5 -15.2 -31.9%
ance/distance/specializations/detail/l2_unexpanded_double_double_double_int.cu.o 35.7 24.4 -11.3 -31.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/subtract.cu.o 42.5 29.0 -13.4 -31.6%
CMakeFiles/MATRIX_TEST.dir/test/matrix/norm.cu.o 41.3 28.3 -13.0 -31.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/coalesced_reduction.cu.o 43.9 30.3 -13.7 -31.1%
_distance_lib.dir/src/distance/random/rmat_rectangular_generator_int_double.cu.o 26.9 18.6 -8.3 -31.0%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_l1.cu.o 47.7 33.0 -14.7 -30.8%
CMakeFiles/RANDOM_TEST.dir/test/random/permute.cu.o 49.1 34.0 -15.1 -30.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/eig_sel.cu.o 42.8 29.6 -13.2 -30.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/triangular.cu.o 40.9 28.3 -12.6 -30.7%
CMakeFiles/SPARSE_TEST.dir/test/sparse/convert_coo.cu.o 39.7 27.5 -12.2 -30.7%
CMakeFiles/STATS_TEST.dir/test/stats/entropy.cu.o 39.6 27.5 -12.1 -30.6%
CMakeFiles/SPARSE_TEST.dir/test/sparse/spgemmi.cu.o 37.5 26.0 -11.4 -30.5%
CMakeFiles/SPARSE_TEST.dir/test/sparse/row_op.cu.o 40.9 28.5 -12.4 -30.4%
ce_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_uint8_t.cu.o 39.8 27.8 -12.1 -30.3%
CMakeFiles/STATS_TEST.dir/test/stats/stddev.cu.o 43.7 30.4 -13.2 -30.3%
CMakeFiles/STATS_TEST.dir/test/stats/mean.cu.o 40.3 28.1 -12.1 -30.1%
ir/src/distance/distance/specializations/detail/l1_double_double_double_int.cu.o 34.5 24.2 -10.3 -29.8%
eFiles/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_float.cu.o 37.9 26.7 -11.2 -29.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/gemm_layout.cu.o 44.4 31.3 -13.2 -29.6%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_canberra.cu.o 49.0 34.5 -14.4 -29.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/gemv.cu.o 40.4 28.5 -11.9 -29.5%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_jensen_shannon.cu.o 46.7 33.0 -13.7 -29.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/eig.cu.o 44.1 31.2 -12.9 -29.3%
iles/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_uint8_t.cu.o 36.7 26.0 -10.7 -29.3%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_hamming.cu.o 49.6 35.1 -14.5 -29.2%
CMakeFiles/RANDOM_TEST.dir/test/random/make_blobs.cu.o 48.9 34.6 -14.3 -29.2%
CMakeFiles/LINALG_TEST.dir/test/linalg/transpose.cu.o 41.7 29.6 -12.1 -29.0%
istance/distance/specializations/detail/l2_unexpanded_float_float_float_int.cu.o 47.2 33.6 -13.6 -28.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_euc_unexp.cu.o 49.4 35.2 -14.2 -28.7%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_hellinger.cu.o 45.4 32.4 -13.0 -28.6%
iles/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_uint8_t.cu.o 36.6 26.2 -10.4 -28.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/ternary_op.cu.o 43.9 31.5 -12.4 -28.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/add.cu.o 46.1 33.0 -13.0 -28.3%
dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_double.cu.o 36.3 26.1 -10.3 -28.2%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_transpose.cu.o 39.1 28.1 -11.0 -28.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce_rows_by_key.cu.o 41.9 30.2 -11.7 -28.0%
Files/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_int8_t.cu.o 36.9 26.6 -10.3 -28.0%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_euc_exp.cu.o 47.6 34.4 -13.2 -27.7%
CMakeFiles/SPARSE_TEST.dir/test/sparse/sort.cu.o 42.9 31.1 -11.8 -27.4%
CMakeFiles/STATS_TEST.dir/test/stats/information_criterion.cu.o 39.6 28.8 -10.8 -27.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/degree.cu.o 38.7 28.2 -10.5 -27.1%
CMakeFiles/MATRIX_TEST.dir/test/matrix/math.cu.o 46.9 34.2 -12.7 -27.1%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_eucsqrt_exp.cu.o 47.1 34.4 -12.7 -27.0%
.dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_float.cu.o 45.8 33.4 -12.3 -26.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_russell_rao.cu.o 48.2 35.3 -13.0 -26.9%
tance_lib.dir/src/distance/distance/specializations/fused_l2_nn_float_int64.cu.o 50.7 37.1 -13.6 -26.8%
CMakeFiles/RANDOM_TEST.dir/test/random/rng_int.cu.o 50.0 36.7 -13.3 -26.6%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/kmeans_fit_float.cu.o 67.7 49.7 -18.0 -26.6%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_to_dense.cu.o 37.5 27.5 -9.9 -26.5%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/kmeans_fit_double.cu.o 66.5 48.9 -17.6 -26.5%
ance_lib.dir/src/distance/distance/specializations/fused_l2_nn_double_int64.cu.o 38.0 27.9 -10.1 -26.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/add.cu.o 39.4 29.0 -10.4 -26.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/unary_op.cu.o 51.6 37.9 -13.6 -26.4%
CMakeFiles/SPARSE_TEST.dir/test/sparse/convert_csr.cu.o 47.3 34.9 -12.4 -26.3%
stance_lib.dir/src/distance/distance/specializations/fused_l2_nn_double_int.cu.o 36.6 27.0 -9.6 -26.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/reduce.cu.o 44.1 32.5 -11.6 -26.3%
CMakeFiles/raft_distance_lib.dir/src/distance/distance/pairwise_distance.cu.o 30.0 22.1 -7.9 -26.2%
MakeFiles/raft_distance_lib.dir/src/distance/cluster/update_centroids_float.cu.o 37.3 27.5 -9.8 -26.2%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_minkowski.cu.o 47.2 34.8 -12.3 -26.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/epsilon_neighborhood.cu.o 44.0 32.5 -11.5 -26.1%
istance/distance/specializations/detail/kernels/polynomial_kernel_float_int.cu.o 47.4 35.0 -12.3 -26.0%
ce/distance/specializations/detail/l2_sqrt_unexpanded_float_float_float_int.cu.o 46.6 34.5 -12.1 -25.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_correlation.cu.o 45.3 33.6 -11.7 -25.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_row_slice.cu.o 36.4 27.0 -9.4 -25.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_kl_divergence.cu.o 47.6 35.4 -12.3 -25.8%
CMakeFiles/SPARSE_DIST_TEST.dir/test/sparse/dist_coo_spmv.cu.o 61.3 45.5 -15.8 -25.7%
CMakeFiles/STATS_TEST.dir/test/stats/kl_divergence.cu.o 36.5 27.1 -9.4 -25.7%
eFiles/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_float.cu.o 35.0 26.0 -9.0 -25.7%
Files/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_int8_t.cu.o 36.0 26.7 -9.2 -25.7%
CMakeFiles/LINALG_TEST.dir/test/linalg/binary_op.cu.o 45.4 33.8 -11.6 -25.6%
istance/distance/specializations/detail/russel_rao_float_float_float_uint32.cu.o 41.1 30.6 -10.5 -25.5%
c/distance/distance/specializations/detail/russel_rao_float_float_float_int.cu.o 42.7 31.8 -10.8 -25.4%
CORE_TEST 0.7 0.5 -0.2 -25.1%
nce_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_int8_t.cu.o 36.4 27.3 -9.1 -24.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/norm.cu.o 40.9 30.8 -10.2 -24.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/filter.cu.o 54.0 40.6 -13.3 -24.7%
CMakeFiles/STATS_TEST.dir/test/stats/weighted_mean.cu.o 52.2 39.3 -12.9 -24.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/matrix.cu.o 44.9 33.8 -11.0 -24.6%
akeFiles/raft_distance_lib.dir/src/distance/cluster/update_centroids_double.cu.o 37.5 28.3 -9.2 -24.6%
CMakeFiles/CORE_TEST.dir/test/core/operators_device.cu.o 35.7 26.9 -8.8 -24.6%
ance_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_float.cu.o 36.0 27.2 -8.8 -24.5%
CMakeFiles/STATS_TEST.dir/test/stats/histogram.cu.o 39.2 29.6 -9.6 -24.5%
CMakeFiles/RANDOM_TEST.dir/test/random/rmat_rectangular_generator.cu.o 39.8 30.1 -9.7 -24.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/power.cu.o 42.1 31.9 -10.2 -24.2%
CMakeFiles/RANDOM_TEST.dir/test/random/multi_variable_gaussian.cu.o 50.8 38.6 -12.2 -24.0%
CMakeFiles/LINALG_TEST.dir/test/linalg/map.cu.o 51.6 39.3 -12.4 -24.0%
CMakeFiles/MATRIX_TEST.dir/test/matrix/gather.cu.o 51.2 38.9 -12.2 -23.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_chebyshev.cu.o 45.6 34.7 -10.9 -23.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/rsvd.cu.o 53.5 40.8 -12.7 -23.7%
CMakeFiles/RANDOM_TEST.dir/test/random/sample_without_replacement.cu.o 65.2 50.0 -15.1 -23.2%
CMakeFiles/STATS_TEST.dir/test/stats/dispersion.cu.o 43.3 33.3 -10.0 -23.1%
stance/distance/specializations/detail/l2_expanded_double_double_double_int.cu.o 46.1 35.5 -10.6 -23.0%
CMakeFiles/LABEL_TEST.dir/test/label/label.cu.o 42.2 32.5 -9.6 -22.9%
b.dir/src/distance/distance/specializations/detail/l1_float_float_float_int.cu.o 45.2 34.9 -10.3 -22.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_cos.cu.o 47.9 37.1 -10.9 -22.7%
e/distance/specializations/detail/l2_sqrt_expanded_double_double_double_int.cu.o 46.9 36.3 -10.6 -22.6%
CMakeFiles/STATS_TEST.dir/test/stats/contingencyMatrix.cu.o 63.0 48.8 -14.2 -22.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/mean_squared_error.cu.o 37.5 29.1 -8.4 -22.4%
CMakeFiles/STATS_TEST.dir/test/stats/adjusted_rand_index.cu.o 71.0 55.3 -15.7 -22.1%
CMakeFiles/STATS_TEST.dir/test/stats/minmax.cu.o 41.7 32.5 -9.2 -22.1%
stance/distance/specializations/detail/kernels/polynomial_kernel_double_int.cu.o 33.8 26.4 -7.3 -21.7%
CMakeFiles/RANDOM_TEST.dir/test/random/rng_discrete.cu.o 50.7 39.6 -11.0 -21.7%
src/distance/distance/specializations/detail/kernels/gram_matrix_base_float.cu.o 44.2 34.6 -9.6 -21.7%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce_cols_by_key.cu.o 48.3 37.9 -10.5 -21.6%
istance_lib.dir/src/distance/distance/specializations/fused_l2_nn_float_int.cu.o 46.1 36.3 -9.8 -21.3%
CMakeFiles/LINALG_TEST.dir/test/linalg/sqrt.cu.o 40.3 31.8 -8.5 -21.2%
distance/specializations/detail/l2_sqrt_unexpanded_float_float_float_uint32.cu.o 43.4 34.3 -9.2 -21.1%
CMakeFiles/RANDOM_TEST.dir/test/random/rng.cu.o 62.2 49.1 -13.1 -21.0%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_deserialize.cu.o 53.6 42.3 -11.3 -21.0%
CMakeFiles/RANDOM_TEST.dir/test/random/make_regression.cu.o 54.4 43.0 -11.4 -20.9%
CMakeFiles/STATS_TEST.dir/test/stats/meanvar.cu.o 40.4 32.0 -8.4 -20.8%
ir/src/distance/distance/specializations/detail/l1_float_float_float_uint32.cu.o 45.0 35.9 -9.1 -20.3%
ance/distance/specializations/detail/l2_unexpanded_float_float_float_uint32.cu.o 43.0 34.4 -8.6 -19.9%
CMakeFiles/SOLVERS_TEST.dir/test/sparse/mst.cu.o 65.3 52.4 -12.9 -19.8%
CMakeFiles/STATS_TEST.dir/test/stats/trustworthiness.cu.o 78.1 62.7 -15.4 -19.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/slice.cu.o 38.6 31.1 -7.5 -19.5%
CMakeFiles/MATRIX_TEST.dir/test/matrix/columnSort.cu.o 57.7 46.4 -11.3 -19.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/faiss_mr.cu.o 56.7 45.7 -11.1 -19.5%
CMakeFiles/STATS_TEST.dir/test/stats/r2_score.cu.o 70.4 56.9 -13.5 -19.2%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_build.cu.o 156.9 126.9 -30.0 -19.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/matrix_vector.cu.o 69.5 56.3 -13.2 -19.0%
CMakeFiles/SOLVERS_TEST.dir/test/cluster/cluster_solvers_deprecated.cu.o 69.8 56.9 -13.0 -18.6%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/cluster_cost_double.cu.o 39.9 32.7 -7.3 -18.2%
CMakeFiles/raft_distance_lib.dir/src/distance/distance/fused_l2_min_arg.cu.o 37.1 30.4 -6.7 -18.2%
CMakeFiles/STATS_TEST.dir/test/stats/completeness_score.cu.o 65.8 53.9 -11.9 -18.1%
CMakeFiles/STATS_TEST.dir/test/stats/homogeneity_score.cu.o 63.2 51.8 -11.4 -18.0%
CMakeFiles/SOLVERS_TEST.dir/test/lap/lap.cu.o 51.8 42.6 -9.3 -17.9%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/kmeans.cu.o 136.6 112.3 -24.3 -17.8%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/haversine.cu.o 47.7 39.2 -8.5 -17.7%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/cluster_cost_float.cu.o 40.0 33.0 -7.0 -17.4%
CMakeFiles/SPARSE_TEST.dir/test/sparse/symmetrize.cu.o 52.5 43.4 -9.1 -17.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/normalize.cu.o 84.0 69.5 -14.5 -17.3%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/cluster_solvers.cu.o 86.0 71.4 -14.6 -17.0%
nce_lib.dir/src/distance/distance/specializations/detail/hellinger_expanded.cu.o 62.8 52.2 -10.6 -16.9%
aft_distance_lib.dir/src/distance/distance/specializations/detail/chebyshev.cu.o 71.4 59.4 -12.0 -16.8%
CMakeFiles/STATS_TEST.dir/test/stats/v_measure.cu.o 60.2 50.5 -9.7 -16.2%
CMakeFiles/STATS_TEST.dir/test/stats/accuracy.cu.o 66.7 56.3 -10.4 -15.6%
CMakeFiles/DISTANCE_TEST.dir/test/distance/gram.cu.o 56.1 47.5 -8.7 -15.4%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_serialize.cu.o 45.1 38.4 -6.7 -14.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/fused_l2_nn.cu.o 75.3 64.2 -11.1 -14.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce.cu.o 68.3 58.2 -10.1 -14.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/matrix_vector_op.cu.o 70.5 60.2 -10.3 -14.6%
libraft_distance.so 1.4 1.2 -0.2 -14.3%
CMakeFiles/CORE_TEST.dir/test/core/numpy_serializer.cu.o 75.9 65.2 -10.7 -14.1%
nce_lib.dir/src/distance/distance/specializations/detail/hamming_unexpanded.cu.o 67.3 58.2 -9.2 -13.6%
CMakeFiles/STATS_TEST.dir/test/stats/mutual_info_score.cu.o 61.0 52.7 -8.3 -13.6%
CMakeFiles/MATRIX_TEST.dir/test/matrix/linewise_op.cu.o 80.1 69.3 -10.7 -13.4%
CMakeFiles/STATS_TEST.dir/test/stats/silhouette_score.cu.o 71.2 62.1 -9.1 -12.8%
neighbors/specializations/detail/ivfpq_compute_similarity_float_no_smem_lut.cu.o 265.5 233.6 -32.0 -12.0%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_one_2d.cu.o 110.0 97.0 -13.0 -11.8%
neighbors/specializations/detail/ivfpq_compute_similarity_float_no_basediff.cu.o 245.2 216.5 -28.7 -11.7%
nce/distance/specializations/detail/jensen_shannon_double_double_double_int.cu.o 217.1 192.0 -25.1 -11.6%
STATS_TEST 0.5 0.5 -0.1 -11.5%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_no_basediff.cu.o 274.3 243.3 -31.0 -11.3%
CMakeFiles/MATRIX_TEST.dir/test/matrix/select_k.cu.o 469.2 416.2 -53.0 -11.3%
CMakeFiles/SPARSE_DIST_TEST.dir/test/sparse/distance.cu.o 81.7 72.5 -9.2 -11.3%
CMakeFiles/STATS_TEST.dir/test/stats/regression_metrics.cu.o 67.6 60.3 -7.3 -10.8%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_one_3d.cu.o 116.7 104.2 -12.5 -10.7%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_no_basediff.cu.o 271.2 242.6 -28.7 -10.6%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/kmeans_balanced.cu.o 206.4 184.5 -21.8 -10.6%
stance/distance/specializations/detail/l2_expanded_float_float_float_uint32.cu.o 186.3 166.7 -19.7 -10.6%
e/distance/specializations/detail/l2_sqrt_expanded_float_float_float_uint32.cu.o 184.9 165.4 -19.5 -10.5%
CMakeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/brute_force.cu.o 260.7 233.4 -27.3 -10.5%
t_distance_lib.dir/src/distance/distance/specializations/detail/correlation.cu.o 79.6 71.3 -8.3 -10.4%
NEIGHBORS_TEST 1.0 0.9 -0.1 -10.1%
/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_uint8_t_uint64_t.cu.o 1043.6 937.8 -105.8 -10.1%
istance/distance/specializations/detail/kl_divergence_float_float_float_int.cu.o 197.7 178.0 -19.7 -10.0%
/distance/distance/specializations/detail/l2_expanded_float_float_float_int.cu.o 190.1 172.0 -18.2 -9.6%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/knn.cu.o 367.5 332.7 -34.8 -9.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/norm.cu.o 74.8 67.7 -7.1 -9.5%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_half_fast.cu.o 259.8 235.8 -24.0 -9.3%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_build_index.cu.o 137.8 125.1 -12.7 -9.2%
ance/distance/specializations/detail/l2_sqrt_expanded_float_float_float_int.cu.o 180.8 164.3 -16.5 -9.1%
raft_distance_lib.dir/src/distance/distance/specializations/detail/canberra.cu.o 250.0 227.7 -22.3 -8.9%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ball_cover.cu.o 131.1 119.5 -11.6 -8.9%
ance/distance/specializations/detail/kl_divergence_double_double_double_int.cu.o 258.0 235.1 -22.8 -8.9%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_fast.cu.o 275.1 251.5 -23.6 -8.6%
stance/neighbors/specializations/detail/ivfpq_compute_similarity_float_fast.cu.o 253.1 232.1 -21.0 -8.3%
r/src/distance/neighbors/specializations/detail/ivfpq_search_float_uint32_t.cu.o 836.5 767.3 -69.2 -8.3%
r/src/distance/neighbors/specializations/detail/ivfpq_search_float_uint64_t.cu.o 964.1 888.0 -76.2 -7.9%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_no_smem_lut.cu.o 278.9 258.2 -20.7 -7.4%
Files/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_long_float_int.cu.o 360.4 334.9 -25.6 -7.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/refine.cu.o 163.3 152.2 -11.2 -6.8%
keFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_int_float_false.cu.o 282.6 264.0 -18.6 -6.6%
ance/distance/specializations/detail/kl_divergence_float_float_float_uint32.cu.o 197.3 184.6 -12.8 -6.5%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_no_smem_lut.cu.o 299.8 280.6 -19.1 -6.4%
CLUSTER_TEST 0.4 0.3 -0.0 -6.3%
/neighbors/specializations/detail/ivfpq_compute_similarity_half_no_smem_lut.cu.o 255.8 241.3 -14.4 -5.6%
s/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_uint32_t_float_int.cu.o 334.2 315.7 -18.6 -5.6%
CMakeFiles/UTILS_TEST.dir/test/util/bitonic_sort.cu.o 234.7 221.7 -13.0 -5.5%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_knn_query.cu.o 880.9 832.3 -48.6 -5.5%
nce/distance/specializations/detail/jensen_shannon_float_float_float_uint32.cu.o 200.0 189.4 -10.6 -5.3%
akeFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_int_float_true.cu.o 287.1 272.0 -15.2 -5.3%
akeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_uint8_t_uint64_t.cu.o 747.2 708.5 -38.7 -5.2%
DISTANCE_TEST 0.4 0.3 -0.0 -5.0%
CMakeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/knn_graph.cu.o 134.6 128.1 -6.6 -4.9%
s/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_int8_t_uint64_t.cu.o 1034.0 983.8 -50.2 -4.9%
s/raft_distance_lib.dir/src/distance/distance/specializations/detail/cosine.cu.o 336.6 320.3 -16.3 -4.9%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_all_knn_query.cu.o 941.0 897.8 -43.2 -4.6%
ance_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_float.cu.o 517.6 494.4 -23.2 -4.5%
eFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_long_float_false.cu.o 276.6 264.2 -12.3 -4.5%
libraft_nn.so 0.3 0.3 -0.0 -4.3%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_two_3d.cu.o 168.3 161.3 -7.0 -4.1%
keFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_long_float_true.cu.o 287.2 275.4 -11.8 -4.1%
iles/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_long_float_uint.cu.o 354.7 340.2 -14.5 -4.1%
ance/distance/specializations/detail/lp_unexpanded_double_double_double_int.cu.o 622.1 597.9 -24.2 -3.9%
CMakeFiles/install.util 0.3 0.2 -0.0 -3.8%
MakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_int8_t_uint64_t.cu.o 730.8 703.0 -27.8 -3.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_adj.cu.o 224.7 216.3 -8.4 -3.7%
/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_uint32_t_float_uint.cu.o 333.7 321.3 -12.4 -3.7%
ance/distance/specializations/detail/lp_unexpanded_float_float_float_uint32.cu.o 487.5 470.5 -17.0 -3.5%
ir/src/distance/neighbors/specializations/detail/ivfpq_search_float_int64_t.cu.o 951.0 918.2 -32.8 -3.4%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/linkage.cu.o 1166.6 1126.6 -40.0 -3.4%
stance/distance/specializations/detail/jensen_shannon_float_float_float_int.cu.o 387.6 375.7 -11.9 -3.1%
istance/distance/specializations/detail/lp_unexpanded_float_float_float_int.cu.o 482.4 469.2 -13.1 -2.7%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_fast.cu.o 279.4 272.1 -7.2 -2.6%
/neighbors/specializations/detail/ivfpq_compute_similarity_half_no_basediff.cu.o 243.4 237.2 -6.1 -2.5%
CMakeFiles/SOLVERS_TEST.dir/test/linalg/eigen_solvers.cu.o 815.6 795.9 -19.7 -2.4%
ce_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_uint8_t.cu.o 811.9 792.4 -19.5 -2.4%
nce_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_int8_t.cu.o 797.5 781.1 -16.4 -2.1%
LINALG_TEST 1.4 1.4 -0.0 -1.6%
es/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_float_uint64_t.cu.o 991.4 975.7 -15.7 -1.6%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_uint32_t.cu.o 617.6 607.8 -9.8 -1.6%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_two_2d.cu.o 165.6 163.1 -2.4 -1.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_int64_t.cu.o 732.0 725.8 -6.2 -0.8%
akeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/connect_components.cu.o 362.0 359.1 -2.9 -0.8%
SPARSE_DIST_TEST 0.1 0.1 -0.0 -0.7%
UTILS_TEST 0.1 0.1 +0.0 +0.0%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/fused_l2_knn.cu.o 911.6 915.8 +4.2 +0.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/selection.cu.o 389.8 392.1 +2.3 +0.6%
SOLVERS_TEST 0.1 0.1 +0.0 +0.7%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_flat.cu.o 1515.9 1533.0 +17.1 +1.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_uint64_t.cu.o 712.4 727.7 +15.3 +2.2%
CMakeFiles/UTILS_TEST.dir/test/util/integer_utils.cpp.o 2.3 2.3 +0.1 +2.5%
SPARSE_NEIGHBORS_TEST 0.2 0.2 +0.0 +7.1%
RANDOM_TEST 0.5 0.5 +0.0 +8.7%
MATRIX_TEST 0.4 0.5 +0.1 +13.3%
SPARSE_TEST 0.4 0.6 +0.2 +43.0%
from pathlib import Path
from collections import Counter


def parse_ninja_log(log_path):
    text = Path(log_path).read_text()
    start, end, mtime, path, cmd = list(zip(*[line.split("\t") for line in text.splitlines()[1:]]))
    start = list(map(int, start))
    end = list(map(int, end))
    seconds = [(e - s) / 1000. for e, s in zip(end, start)]
    mtime = list(map(int, mtime))
    return dict(
        start=start,
        end=end,
        seconds=seconds,
        mtime=mtime,
        path=path,
        cmd=cmd
    )


def discard_earlier_builds(d):
    prev_end = 0
    start_index = 0
    # end must be monotonically increasing. If we find an end value that is
    # lower than the end value on the previous row, we know that a new build has
    # started.
    for i, end in enumerate(d['end']):
        if end < prev_end:
            start_index = i
        prev_end = end
    return {k: v[start_index:] for k, v in d.items()}


def print_duplicates(d):
    # d is a dict returned by parse_ninja_log
    print(f" # {'path':<60} sec cmd hash sec other cmd hash")
    dup_paths = sorted(set(p for p, count in Counter(d['path']).items() if count > 1))
    for i, p in enumerate(dup_paths):
        print(f"{i:3d} {p[-60:]:<60}: ", end="")
        for p_other, sec, cmd in zip(d['path'], d['seconds'], d['cmd']):
            if p == p_other:
                print(f"{sec:6.1f} ({cmd})", end="")
        print()


compiled = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_compiled")
headers = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_headers")
compiled = discard_earlier_builds(compiled)
headers = discard_earlier_builds(headers)

# Print sum of compile times of each translation unit:
print(f"Sum of compile times for compiled spdlog: {sum(compiled['seconds']):.1f} seconds")
print(f"Sum of compile times for header-only spdlog: {sum(headers['seconds']):.1f} seconds\n")

compiled_times = dict(zip(compiled['path'], compiled['seconds']))
headers_times = dict(zip(headers['path'], headers['seconds']))

print("Compile times for paths only found in headers (seconds):")
for p in set(headers['path']) - set(compiled['path']):
    print(f"{p[-80:]:<80} {headers_times[p]:6.1f}")

# Compare compile time per path between compiled and headers:
results = [(path, headers_times[path], compiled_times[path]) for path in compiled_times.keys()]
# Add relative change as a percentage
results = [(p, hsec, csec, csec - hsec, 100. * (csec / hsec - 1)) for p, hsec, csec in results]
# Sort by relative change
results = sorted(results, key=lambda x: x[4])

# Print results
print("\nComparison of compile times between headers and compiled: ")
print(f"{'path':<70} header (s) compiled (s) change (s) change (%)")
for p, hsec, csec, diff, rel in results:
    print(f"{p[-80:]:<80} {hsec:6.1f} {csec:6.1f} {diff:+5.1f} {rel:+4.1f}%")
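The comparison step in the script looks up every path from the compiled log in the headers log, which works for these two logs but would raise a KeyError if a path existed only in the compiled build. A more defensive variant could look like the sketch below. The function name compare_times and the toy dictionaries are hypothetical; this is not part of the original analysis.

```python
def compare_times(headers_times, compiled_times):
    """Return (path, header_s, compiled_s, diff_s, rel_pct) rows for shared paths.

    Paths found in only one of the two logs are skipped, as are entries with a
    zero header time, so neither a KeyError nor a ZeroDivisionError can occur.
    """
    rows = []
    for path in sorted(headers_times.keys() & compiled_times.keys()):
        hsec = headers_times[path]
        csec = compiled_times[path]
        if hsec <= 0:
            continue
        rows.append((path, hsec, csec, csec - hsec, 100.0 * (csec / hsec - 1)))
    # Sort by relative change, largest speed-up first.
    return sorted(rows, key=lambda row: row[4])


# Toy data with invented paths and times.
headers_times = {"a.cu.o": 20.0, "b.cu.o": 10.0, "only_headers.cpp.o": 4.0}
compiled_times = {"a.cu.o": 10.0, "b.cu.o": 9.0, "only_compiled.cpp.o": 3.0}

rows = compare_times(headers_times, compiled_times)
for path, hsec, csec, diff, rel in rows:
    print(f"{path:<20} {hsec:6.1f} {csec:6.1f} {diff:+5.1f} {rel:+5.1f}%")
```

Only the two shared paths appear in the output; the entries that exist in a single log are left out rather than causing an error.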
I'm proposing that RMM allow the user to set whether the compiled or header-only spdlog target is used. I would honestly prefer if we just defaulted to compiled everywhere except for users who "really" want fully header-only operation.
Thanks for looking into this Corey! I agree it is a good idea to consider using the precompiled spdlog library. If we go the precompiled route, would this require adding a runtime dependency on spdlog in the conda package as well? We currently do not seem to have a Conda dependency on spdlog.
Describe the bug
Including the spdlog headers is quite expensive. Just adding #include <spdlog/spdlog.h> to an empty file adds 2.8 seconds to the compilation time. For the pairwise distance kernels, removing the spdlog include can reduce compile times by 50%.
Steps/Code to reproduce bug
Expected behavior
A smaller increase in compile time. For context, including <string> adds on the order of 100 ms to the compilation time.
Additional context
RMM: RMM also uses spdlog. In practice, the compile time improvements will only be obtained when RMM also removes its spdlog dependency.
Reason: The reason that compilation takes much longer is that spdlog instantiates a bunch of templates in every translation unit when used as a header-only library. This happens in pattern_formatter::handleflag, which is instantiated here. Just adding back the spdlog header doubles the compile times of cicc (device side) and also gcc on the host side.
Precompiled-library: Another option is to not use spdlog as a header-only library. The effect can be simulated by defining SPDLOG_COMPILED_LIB. When this is defined, spdlog adds only 0.5 seconds.
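One possible way to measure this kind of include overhead locally is sketched below. It is not the procedure used in this issue. It assumes that g++ is installed and that the spdlog headers are on the default include path; the helper names (compile_command, time_compile, main) are hypothetical. The script compiles to an object file only, so no spdlog library needs to be linked.

```python
import os
import subprocess
import tempfile
import time
from pathlib import Path


def compile_command(source, defines=(), compiler="g++"):
    """Build a compile-only command line; the object file is discarded."""
    return [compiler, "-std=c++17", *[f"-D{d}" for d in defines],
            "-c", str(source), "-o", os.devnull]


def time_compile(source, defines=(), compiler="g++"):
    """Return the wall-clock seconds taken by one compiler invocation."""
    start = time.perf_counter()
    subprocess.run(compile_command(source, defines, compiler), check=True)
    return time.perf_counter() - start


def main():
    # Assumption: g++ and the spdlog headers are available on this machine.
    with tempfile.TemporaryDirectory() as tmp:
        empty = Path(tmp) / "empty.cpp"
        with_spdlog = Path(tmp) / "with_spdlog.cpp"
        empty.write_text("")
        with_spdlog.write_text("#include <spdlog/spdlog.h>\n")

        print(f"empty file:         {time_compile(empty):.2f} s")
        print(f"header-only spdlog: {time_compile(with_spdlog):.2f} s")
        compiled = time_compile(with_spdlog, defines=["SPDLOG_COMPILED_LIB"])
        print(f"SPDLOG_COMPILED_LIB: {compiled:.2f} s")
```

Calling main() prints three timings. The absolute numbers depend on the machine and compiler, so they will not necessarily match the 2.8 and 0.5 seconds reported above.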