[Open] ttnghia opened this issue 2 weeks ago
I'm working on composing a solution for this. A question in the meantime: if the result under- or overflows, what behavior is needed to match spark-rapids?
E.g. for floating -> decimal, should we return 0, INT_MIN, or INT_MAX? For decimal -> floating, do we set 0 or +-inf? Or do we null the field entirely?
The code in the new conversion PR has been modified so that the cuDF-specific code wraps around the core of the conversion algorithm. In spark-rapids-jni, we can similarly wrap around this core to perform the spark-specific rounding that we need.
This cuDF draft PR contains the spark-specific rounding code that we need, wrapping around the core of the cuDF conversion code. The code lives in cuDF only for ease of testing, and should be migrated to spark-rapids-jni for full integration.
The cuDF conversion PR has been merged.
Currently, there does not exist any accurate float-to-decimal conversion. The closest operation to it is `cudf::round`, which does not always produce correct results. As such, we have issues like https://github.com/NVIDIA/spark-rapids/issues/9682 and https://github.com/NVIDIA/spark-rapids/issues/10809. The new dedicated conversion code in https://github.com/rapidsai/cudf/pull/15905 is supposed to add special handling for float-decimal conversion. Unfortunately, it performs truncation instead of rounding. It would be great to support an optional flag in that code, allowing either truncation or rounding depending on the application.