rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Implement a more accurate float to decimal conversion that supports rounding instead of truncation #16155

Open ttnghia opened 2 weeks ago

ttnghia commented 2 weeks ago

Currently, there is no accurate float-to-decimal conversion. The closest operation is cudf::round, which does not always produce correct results. As a result, we have issues like https://github.com/NVIDIA/spark-rapids/issues/9682 and https://github.com/NVIDIA/spark-rapids/issues/10809.

The new dedicated conversion code in https://github.com/rapidsai/cudf/pull/15905 is supposed to add special handling for float-to-decimal conversion. Unfortunately, it performs truncation instead of rounding. It would be great to add an optional flag to that code that selects either truncation or rounding, depending on the application.
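To make the requested flag concrete, here is a minimal host-side sketch of the difference between the two modes. It is not the cuDF implementation: the names `rounding_mode` and `to_unscaled_decimal` are hypothetical, the tie-breaking rule (half away from zero) is an assumption, and the naive multiply-then-round approach shown here is itself subject to floating-point error for values that are not exactly representable.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical flag: not part of the cuDF API.
enum class rounding_mode { TRUNCATE, HALF_AWAY_FROM_ZERO };

// Converts `value` to the unscaled integer of a decimal with `fraction_digits`
// digits after the decimal point (e.g. 2.75 with 1 digit -> 27 or 28).
inline std::int64_t to_unscaled_decimal(double value, int fraction_digits, rounding_mode mode)
{
  assert(fraction_digits >= 0 && fraction_digits <= 18);

  // Build 10^fraction_digits by repeated multiplication (exact in double for this range).
  double scale = 1.0;
  for (int i = 0; i < fraction_digits; ++i) { scale *= 10.0; }

  double const scaled = value * scale;

  // std::trunc drops the fraction; std::round rounds ties away from zero.
  double const result = (mode == rounding_mode::TRUNCATE) ? std::trunc(scaled) : std::round(scaled);
  return static_cast<std::int64_t>(result);
}
```

For example, `to_unscaled_decimal(2.75, 1, rounding_mode::TRUNCATE)` yields 27 (that is, 2.7), while the rounding mode yields 28 (that is, 2.8).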

pmattione-nvidia commented 2 weeks ago

I'm working on composing a solution for this. A question in the meantime: if the result under- or overflows, what behavior is needed to match spark-rapids?

E.g., for floating -> decimal, should we return one of 0 / INT_MIN / INT_MAX? For decimal -> floating, do we set 0 / +-inf? Or do we null the field entirely?
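For reference, the floating -> decimal options above could be sketched roughly as follows for a 32-bit representation. This is only an illustration of the candidate behaviors, not what cuDF or spark-rapids does: `overflow_policy`, `maybe_int32`, and `convert_checked` are hypothetical names, and treating NaN as null under every policy is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical policies mirroring the options in the question above.
enum class overflow_policy { SATURATE, ZERO, NULLIFY };

// Value plus validity flag, standing in for a nullable column element.
struct maybe_int32 {
  std::int32_t value;
  bool is_valid;
};

// Truncates an already-scaled value to int32, applying `policy` when the
// truncated result would not fit.
inline maybe_int32 convert_checked(double scaled, overflow_policy policy)
{
  // Assumption: NaN has no meaningful integer value, so it becomes null.
  if (std::isnan(scaled)) { return {0, false}; }

  bool const too_big   = scaled >= 2147483648.0;   // truncation would exceed INT32_MAX
  bool const too_small = scaled <= -2147483649.0;  // truncation would fall below INT32_MIN

  if (!too_big && !too_small) { return {static_cast<std::int32_t>(scaled), true}; }

  switch (policy) {
    case overflow_policy::SATURATE:
      return {too_big ? std::numeric_limits<std::int32_t>::max()
                      : std::numeric_limits<std::int32_t>::min(),
              true};
    case overflow_policy::ZERO: return {0, true};
    case overflow_policy::NULLIFY: return {0, false};
  }
  return {0, false};  // unreachable for valid policy values
}
```

With this sketch, `convert_checked(1e10, overflow_policy::SATURATE)` returns a valid INT32_MAX, while the same input under `overflow_policy::NULLIFY` returns an invalid (null) element.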

pmattione-nvidia commented 1 week ago

The code in the new conversion PR has been modified so that the cuDF-specific code wraps around the core of the conversion algorithm. In spark-rapids-jni, we can similarly wrap around this core to perform the spark-specific rounding that we need.

This cuDF draft PR has the code for the spark-specific rounding that we need, wrapping around the core of the cuDF conversion code. This code is just in cuDF for my ease of testing, and should be migrated to spark-rapids-jni for full integration.

pmattione-nvidia commented 5 days ago

The cuDF conversion PR has been merged.