GregoryKimball opened 1 year ago
On `branch-23.10` commit `ad9fa501192`, I ran `build.sh libcudf` with a 64-bit size type and identified the unique lines that threw compilation errors.
Dictionary errors
cudf/cpp/src/groupby/sort/group_single_pass_reduction_util.cuh(64): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/dictionary/detail/iterator.cuh(88): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/detail/aggregation/aggregation.cuh(346): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/src/groupby/sort/group_correlation.cu(87): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/src/groupby/hash/multi_pass_kernels.cuh(107): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/dictionary/detail/iterator.cuh(41): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
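These dictionary failures look like the usual symptom of a 32-bit index wrapper meeting a wider size type. Below is a minimal, hypothetical sketch (not libcudf's actual `dictionary32` definition) of the failure mode: an explicit conversion operator that only targets `int32_t` is not a viable conversion to a 64-bit size type, so code that relied on converting the wrapper to `size_type` stops compiling once `size_type` becomes `int64_t`.

```cpp
#include <cstdint>

// Hypothetical stand-in for cudf::dictionary32: an index wrapper whose only
// conversion operator targets int32_t.
struct dictionary32_like {
  int32_t value;
  explicit operator int32_t() const { return value; }  // 32-bit conversion only
};

using size_type32 = int32_t;
using size_type64 = int64_t;  // hypothetical 64-bit size type

size_type32 works(dictionary32_like d) { return static_cast<size_type32>(d); }

// size_type64 fails(dictionary32_like d) { return static_cast<size_type64>(d); }
//   error: no suitable conversion function from "dictionary32_like" to "int64_t" exists
// Fixing it needs either a conversion operator targeting the wider type or a two-step cast:
size_type64 fixed(dictionary32_like d)
{
  return static_cast<size_type64>(static_cast<int32_t>(d));
}
```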
AtomicAdd errors
cudf/cpp/include/cudf/detail/copy_if_else.cuh(97): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/include/cudf/detail/null_mask.cuh(108): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/groupby/sort/group_std.cu(153): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/bitmask/null_mask.cu(303): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/include/cudf/detail/valid_if.cuh(68): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(274): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(375): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(207): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(204): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(264): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(266): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(281): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(209): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(283): error: no instance of overloaded function "atomicAdd" matches the argument list
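The `atomicAdd` failures are expected: CUDA provides `atomicAdd` overloads for `int`, `unsigned int`, `unsigned long long int`, `float`, and `double` (plus half-precision types), but not for a signed 64-bit integer, so every counter declared with the size type breaks once that type widens. A minimal sketch of the problem and one common workaround, assuming a hypothetical `size_type` of `int64_t`:

```cuda
#include <cstdint>

using size_type = std::int64_t;  // hypothetical 64-bit size type

__global__ void count_valid(size_type* counter, bool const* valid, size_type n)
{
  auto i = static_cast<size_type>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n && valid[i]) {
    // atomicAdd(counter, size_type{1});
    //   error: no instance of overloaded function "atomicAdd" matches the argument list
    // Workaround sketch: reuse the unsigned long long overload, which has the same
    // width; safe as long as the counter never goes negative.
    atomicAdd(reinterpret_cast<unsigned long long*>(counter), 1ull);
  }
}
```

One alternative is `cuda::atomic_ref<size_type, cuda::thread_scope_device>` from libcu++, whose `fetch_add` works for 64-bit signed integers without the pointer cast.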
Thrust errors
cudf/cpp/src/groupby/groupby.cu(269): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/copying/contiguous_split.cu(631): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/groupby/groupby.cu(304): error: no instance of overloaded function "std::all_of" matches the argument list
cudf/cpp/src/hash/md5_hash.cu(343): error: no instance of overloaded function "thrust::for_each" matches the argument list
cudf/cpp/include/cudf/lists/detail/scatter.cuh(245): error: no instance of overloaded function "thrust::sequence" matches the argument list
cudf/cpp/src/groupby/groupby.cu(313): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/filling/repeat.cu(124): error: no instance of overloaded function "thrust::upper_bound" matches the argument list
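Many of these "no instance of overloaded function matches" errors on `std::transform`, `thrust::sequence`, and friends are plausibly mixed-index-type failures: a counting iterator built from a literal `0` deduces `int`, while one built from a row count deduces the 64-bit size type, so the algorithm cannot deduce a single iterator type. A hedged host-side sketch (names here are illustrative, not the failing libcudf call sites):

```cpp
#include <thrust/iterator/counting_iterator.h>

#include <algorithm>
#include <cstdint>
#include <vector>

using size_type = std::int64_t;  // hypothetical 64-bit size type

void double_indices(size_type num_rows, std::vector<size_type>& out)
{
  // thrust::make_counting_iterator(0) yields counting_iterator<int>, while
  // thrust::make_counting_iterator(num_rows) yields counting_iterator<int64_t>.
  // std::transform requires both range iterators to be the same type, so
  // template argument deduction fails:
  //
  // std::transform(thrust::make_counting_iterator(0),
  //                thrust::make_counting_iterator(num_rows),
  //                out.begin(),
  //                [](size_type i) { return i * 2; });
  //
  // Fix: spell out the index type on both ends of the range.
  std::transform(thrust::make_counting_iterator<size_type>(0),
                 thrust::make_counting_iterator<size_type>(num_rows),
                 out.begin(),
                 [](size_type i) { return i * 2; });
}
```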
Device span errors
cudf/cpp/include/cudf/table/experimental/row_operators.cuh(848): error: no instance of constructor "std::optional<_Tp>::optional [with _Tp=cudf::device_span<const int, 18446744073709551615UL>]" matches the argument list
cudf/cpp/include/cudf/table/experimental/row_operators.cuh(848): error: no instance of constructor "std::optional<_Tp>::optional [with _Tp=cudf::device_span<const int32_t, 18446744073709551615UL>]" matches the argument list
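The `row_operators.cuh` errors point at an `std::optional` of a span whose element type is still spelled as 32-bit `int`. Span-like types carry their element type in the type itself, so a span over the widened size type no longer converts. A hedged sketch with a simplified stand-in for `cudf::device_span` (the real constructor set differs):

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Simplified stand-in for cudf::device_span<T>: pointer + extent, like std::span.
template <typename T>
struct device_span_like {
  T* ptr{nullptr};
  std::size_t count{0};
};

using size_type = std::int64_t;  // hypothetical 64-bit size type

std::optional<device_span_like<int32_t const>> offsets_as_int_span(size_type const* data,
                                                                   std::size_t n)
{
  // With a 32-bit size_type, size_type const* is int32_t const* and this compiles;
  // with a 64-bit size_type the element types no longer match, so neither the span
  // nor the std::optional constructor is viable:
  //
  // return device_span_like<int32_t const>{data, n};   // fails to compile
  //
  // A fix is to declare the optional over device_span_like<size_type const> so the
  // span's element type tracks the size type.
  return std::nullopt;  // placeholder for the sketch
}
```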
Int typing errors
cudf/cpp/src/binaryop/compiled/binary_ops.cuh(272): error: no instance of function template "cudf::util::div_rounding_up_safe" matches the argument list
cudf/cpp/include/cudf/detail/utilities/cuda.cuh(169): error: no instance of overloaded function "std::clamp" matches the argument list
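Both of these come from single-template-parameter helpers: `std::clamp` (and, presumably, `cudf::util::div_rounding_up_safe`) deduce one type for all of their arguments, so mixing the widened size type with plain `int` grid/block constants breaks deduction. A minimal sketch of the `std::clamp` case, assuming a hypothetical 64-bit `size_type`:

```cpp
#include <algorithm>
#include <cstdint>

using size_type = std::int64_t;  // hypothetical 64-bit size type

size_type clamp_grid_size(size_type desired_blocks)
{
  // std::clamp is declared as
  //   template <class T> constexpr const T& clamp(const T& v, const T& lo, const T& hi);
  // so int64_t vs. int arguments give conflicting deductions for T:
  //
  // return std::clamp(desired_blocks, 1, 65535);   // fails to compile
  //
  // Fix: pin the template parameter (or cast the bounds to size_type).
  return std::clamp<size_type>(desired_blocks, 1, 65535);
}
```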
Assorted errors
cudf/cpp/src/hash/spark_murmurhash3_x86_32.cu(230): error: no instance of overloaded function "std::max" matches the argument list
cudf/cpp/src/copying/purge_nonempty_nulls.cu(93): error: no instance of function template "cudf::detail::gather" matches the argument list
cudf/cpp/include/cudf/detail/copy_if.cuh(166): error: more than one instance of overloaded function "min" matches the argument list:
The Java code right now hard-codes a signed 32-bit int as the size type in many places. We can switch it to 64-bit everywhere, along with a dynamic check depending on how the code is compiled. But also, just so you are aware, Spark has a top-level limitation of a signed 32-bit int for the number of rows in a table. We can work around this in some places, but moving the Spark plugin over to a 64-bit index is not going to be super simple.
Many libcudf users have expressed interest in using a 64-bit size type (see #3958 for reference). The `cudf::size_type` uses an `int32_t` data type that limits the number of elements in libcudf columns to `INT_MAX` (2.1 billion) elements. For string columns this imposes a ~2 GB limit, for int32 columns this imposes a ~8 GB limit, and for list columns this imposes a leaf element count below 2.1 billion. Downstream libraries must partition their data to avoid these limits.

We expect that using a 64-bit size type will incur significant penalties to memory footprint and data throughput. Memory footprint will double for all offset vectors, and the runtime of most functions will increase due to the larger data sizes. Kernel performance may degrade even further due to increased register count and unoptimized shared memory usage.
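For reference, the limits above follow directly from the arithmetic, as in this back-of-the-envelope sketch (illustrative only, not libcudf code):

```cpp
#include <cstdint>
#include <cstdio>
#include <limits>

int main()
{
  // Maximum element count with a 32-bit size type (~2.1 billion).
  constexpr std::int64_t max_rows = std::numeric_limits<std::int32_t>::max();

  // Strings: int32_t offsets cap character data at INT_MAX bytes (~2 GB).
  // Fixed-width int32: ~2.1 billion rows * 4 bytes each is roughly 8 GB.
  std::printf("string chars: ~%.1f GB, int32 column: ~%.1f GB\n",
              max_rows / 1e9, max_rows * 4.0 / 1e9);

  // Offsets take one entry per child element plus one; doubling the offset
  // width from 4 to 8 bytes doubles that part of the footprint.
  std::printf("offsets: ~%.1f GB (32-bit) vs ~%.1f GB (64-bit)\n",
              (max_rows + 1) * 4.0 / 1e9, (max_rows + 1) * 8.0 / 1e9);
  return 0;
}
```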
As GPUs increase in memory, the limit from a 32-bit `cudf::size_type` will force data partitions to become smaller fractions of device memory. Excessive data partitioning also leads to performance penalties, so libcudf should enable its community to start experimenting with a 64-bit size type. Scoping for 64-bit size types in the cuDF-python layer will be tracked in a separate issue (#TBD).

`copy_if`, `reduce`, `parallel_for`, `merge`, and `sort` may have unresolved issues with a 64-bit `cudf::size_type`. From this stage we will have a better sense of the impact and value of using a 64-bit size type with libcudf.