rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] FLOAT32 rounding more inaccurate than necessary #14528

Open jlowe opened 10 months ago

jlowe commented 10 months ago

**Describe the bug**
When rounding single-precision floats to a specified number of decimal places, the result can be slightly inaccurate because the intermediate computations are forced into FLOAT32 as well. round.cu has rounding functors for non-fixed-point types, but all of the intermediate results use the same type as the input rather than the highest-precision type, double. This introduces more error during the rounding computation than is necessary.

**Steps/Code to reproduce bug**
The following code demonstrates the problem:

#include <cudf/round.hpp>
#include <cudf_test/column_utilities.hpp>
#include <cudf_test/column_wrapper.hpp>
#include <cudf_test/debug_utilities.hpp>

int main(int argc, char** argv) {
  auto const input =
    cudf::test::fixed_width_column_wrapper<float>{6.121944898040965e-05f};
  cudf::test::print(input);
  auto const result = cudf::round(input, 10, cudf::rounding_method::HALF_UP);
  cudf::test::print(*result);
  return 0;
}

Rounding the value to the tenth decimal place should round down to approximately 6.12194e-05 but instead the value is rounded up to approximately 6.12195e-05 as shown in the output when running the program:

6.1219449e-05
6.12194999e-05

**Expected behavior**
FLOAT32 rounding should use FLOAT64 for intermediate results to avoid injecting more error than floating-point arithmetic makes necessary. When I manually performed the computations for this example input using round.cu's half_up_positive logic with double instead of float for the intermediate values, the answer came out rounded down as expected.

It seems the floating-point rounding functors in round.cu should not use the input type for intermediates but rather double explicitly, to avoid unnecessary additional error during the computation.

bdice commented 10 months ago

Looking at the implementation of our rounding code (specifically half_up_positive), I am concerned that it's not just a matter of float becoming double. I suspect double values could see the same type of problem, if the number of decimal places is large enough. I think we may need to use an algorithm that is more numerically stable and has less precision loss due to truncation. See also #14169, which is a separate problem with rounding algorithms in libcudf affecting float-to-decimal conversions.

jlowe commented 10 months ago

Yes, totally agree that double will have similar issues when rounding to larger numbers of decimal places. As such, I don't see this as "solving" all the rounding issues so much as significantly improving the FLOAT32 results with minimal effort.

wence- commented 10 months ago

I think this might be possible to handle by using appropriately selected CUDA intrinsics for the division step. If we're rounding half-up, we should use division that rounds toward positive infinity rather than the default round-to-nearest-ties-to-even:

Edit: I misunderstood the definition of half_up rounding, which only applies to ties. One might still be able to get away with something like this, though:

#include <cstdio>
#include <cuda_runtime_api.h>

__global__ void test(float val, float *out) {
  float decimals = powf(10.0f, 10.0f);  // scale factor; use powf in device code
  float ipart;
  float remainder = modff(val, &ipart);         // split off the integer part
  float scaled = roundf(remainder * decimals);  // round the scaled fraction
  out[0] = ipart + __fdiv_ru(scaled, decimals); // division rounding toward +inf
  out[1] = ipart + __fdiv_rn(scaled, decimals); // round-to-nearest-even division
}

int main(void) {
  float input = 6.121944898040965e-05f;
  float *val;
  cudaMalloc(&val, 2 * sizeof(float));
  test<<<1, 1>>>(input, val);  // one block of one thread
  float h_val[2];
  cudaMemcpy(h_val, val, 2 * sizeof(float), cudaMemcpyDeviceToHost);
  printf("in:      %.20f\n", input);
  printf("out(ru): %.20f\n", h_val[0]);
  printf("out(rn): %.20f\n", h_val[1]);
  cudaFree(val);
  return 0;
}

Produces:

in:      0.00006121944898040965
out(ru): 0.00006121950718807057
out(rn): 0.00006121949991211295

But I haven't thought through all the consequences of this change.

wence- commented 10 months ago

That said, I claim that half_up rounding doesn't make much sense as a tiebreaking scheme for binary floating-point values, since the only fractional floating point values that exactly end in 5 are those of the form $2^{-n}$ for $n \ge 1$.

pmattione-nvidia commented 8 months ago

I started looking into this and noticed a problem with our current implementation: HALF_UP rounding is broken for negative numbers.

If you round (e.g.) -0.5f to zero decimal places, it should round up to zero, but it instead results in -1, because the implementation calls roundf(), which rounds halfway cases away from zero.

Do we want to change the rounding code to round up, or change the name/comment to ROUND_AWAY?

jlowe commented 8 months ago

I think the "UP" here refers to higher magnitude rather than higher value. FWIW, Java's HALF_UP rounding has the same round-away-from-zero semantics (see the RoundingMode.HALF_UP Javadocs), so the HALF_UP semantics match what we want from the Spark perspective.

pmattione-nvidia commented 8 months ago

That's fine with me; in that case I'll fix the code comment in round.hpp (it should instead point to Wikipedia's "rounding half away from zero").

harrism commented 7 months ago

> When rounding single-precision floats to a specified number of decimal places

The above is a nonsense statement, in my book. By now you should know my opinion on this: libcudf should not support round to decimal places on binary floating-point types.

https://github.com/rapidsai/cudf/issues/406 https://github.com/rapidsai/cudf/issues/1270 https://github.com/rapidsai/cudf/issues/1340 https://github.com/rapidsai/cudf/issues/7195

I'm sure there are more.

pmattione-nvidia commented 2 months ago

The most accurate way to round would be to convert the floating-point value to fixed_point (with more decimal places than float/double precision), round as desired, then convert back to floating point.

But as @harrism brings up, this will still produce "unexpected" results (e.g. 1.151f rounded to 2 decimal places -> fixed point 115 with scale -2 -> 1.1499999f -> string "1.14").

So I think the question is: do we want to do these conversions in this kernel, or should we block floating-point input for the round kernel and instead require users to explicitly choose the chain of operations above?

harrism commented 2 months ago

I think libcudf shouldn't support rounding binary floating-point types to a specified number of decimal places, just as the C++ standard provides no function for this.

https://en.cppreference.com/w/cpp/numeric/math/round

> Computes the nearest integer value to num (in floating-point format)