Closed pmattione-nvidia closed 2 months ago
I believe the scale is zero if you use a fixed-point type with fixed_width_column_wrapper
What is the unexpected behavior?
Fixed-point types are considered fixed-width https://github.com/rapidsai/cudf/blob/cdfb550f442e846623c721082128a095f02efff9/cpp/include/cudf/utilities/traits.hpp#L514-L520
Also, there are existing tests that take advantage of this. Here are a few: https://github.com/rapidsai/cudf/blob/cdfb550f442e846623c721082128a095f02efff9/cpp/tests/streams/io/orc_test.cpp#L63 https://github.com/rapidsai/cudf/blob/cdfb550f442e846623c721082128a095f02efff9/cpp/tests/streams/io/parquet_test.cpp#L65 https://github.com/rapidsai/cudf/blob/cdfb550f442e846623c721082128a095f02efff9/cpp/tests/streams/io/csv_test.cpp#L55
As well as other tests that use the FixedWidthTypes
type-list for building tests over all fixed-width types.
https://github.com/rapidsai/cudf/blob/cdfb550f442e846623c721082128a095f02efff9/cpp/include/cudf_test/type_lists.hpp#L304
The problem is the scale being zero, even if you construct it with fixed_point data with a different scale factor. E.g. in the example I posted, you are storing a decimal with scale -1 in a column with scale zero. Then whenever you do anything with the column it behaves unexpectedly. E.g in 24.04, this comparison fails, because the scale factor is effectively lost:
cudf::test::fixed_width_column_wrapper<double> double_col({1.0});
cudf::test::fixed_width_column_wrapper<numeric::decimal128> decimal128_col({numeric::decimal128{1.0, numeric::scale_type{-1}}});
auto result = binary_operation(
double_col,
decimal128_col,
cudf::binary_operator::EQUAL,
cudf::data_type(cudf::type_id::BOOL8));
auto expected = column_wrapper<bool>{true};
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, *result.get());
And it's even more dangerous, as you can have the column wrapper store a list of decimals that all have different scale factors. Then it's really not clear what the behavior even should be. The scale factor needs to solely be a property of the column, and not be allowed for the data stored within it. We should just have people use fixed_point_column_wrapper instead.
Ok, that makes sense. The convenience of the FixedWidthTypes
makes supporting fixed-point types in the fixed_width_column_wrapper
compelling however.
I did some investigation and found there are several places in our gtests that are actually using this incorrectly as described in this issue so it is certainly worth fixing. They currently are not failing because they are not checking the scale-type value.
I think we can continue to support this feature but only allow scale=0
since there is no way to set it in the fixed_width_column_wrapper
. I'll open a PR that checks for invalid usage and fixes the incorrect ones we currently have.
If someone creates a fixed_width_column_wrapper
of decimal, is there a way that we can prevent a user from accidentally changing the value of the scales of the decimals stored within it?
If someone creates a
fixed_width_column_wrapper
of decimal, is there a way that we can prevent a user from accidentally changing the value of the scales of the decimals stored within it?
I'm not sure I'm following. The values are copied from host to device within the wrapper ctor so changing the original host values will not change the copied device values. Maybe I need to see an example of what you are asking.
Ah I see, so maybe it isn't possible then.
Describe the bug fixed_width_column_wrapper doesn't restrict the types allowed for the column. This can lead to unexpected behavior when a user chooses to use (e.g.) decimal32 for the "ElementTo" type, and the scale information is not embedded in the type. The desired behavior is that the user uses fixed_point_column_wrapper to wrap decimal data, so we should prevent the usage of fixed_width_column_wrapper for fixed-point.
Steps/Code to reproduce bug
cudf::test::fixed_width_column_wrapper<numeric::decimal128> decimal128_col({numeric::decimal128{1.0, numeric::scale_type{-1}}});