rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Do not convert decimal32/64 cols to decimal128 in `to_arrow` API and PQ writer when arrow schema is in use #17080

Open mhaseeb123 opened 1 month ago

mhaseeb123 commented 1 month ago

Is your feature request related to a problem? Please describe. We currently convert decimal32 and decimal64 columns to decimal128 (conversion added in #16236) whenever converting to an Arrow table via `to_arrow` or writing Parquet via `to_parquet` with `store_schema=True`. Once https://github.com/apache/arrow/issues/43956 is complete, Arrow will support the decimal32 and decimal64 types natively, making the conversion on the libcudf side unnecessary; it should then be removed. A sketch of the current behavior is shown below.
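
A minimal sketch of the current upcast using the Python bindings (the column values and names here are illustrative, not from the issue; exact printed types may vary by release):

```python
from decimal import Decimal

import cudf

# Build a decimal32 column in cuDF.
s = cudf.Series(
    [Decimal("1.23"), Decimal("4.56")],
    dtype=cudf.Decimal32Dtype(precision=5, scale=2),
)

# Converting to Arrow currently yields a decimal128 array, because
# libcudf upcasts decimal32/64 before handing the data to Arrow.
print(s.to_arrow().type)  # prints decimal128(5, 2) today, not a 32-bit decimal
```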

Describe the solution you'd like Remove the conversion from decimal32 and decimal64 columns to decimal128, and write Parquet or convert via `to_arrow` with the original decimal types directly (see the sketch of the Parquet path below).
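
A hedged sketch of the Parquet path under discussion, assuming the Python writer exposes the `store_schema` option described above (file name and column are illustrative):

```python
from decimal import Decimal

import cudf
import pyarrow.parquet as pq

df = cudf.DataFrame(
    {
        "amount": cudf.Series(
            [Decimal("1.23"), Decimal("4.56")],
            dtype=cudf.Decimal32Dtype(precision=5, scale=2),
        )
    }
)

# With store_schema=True the writer embeds an Arrow schema in the file.
# Today that schema records decimal128; after this change it could keep
# the original decimal32 type.
df.to_parquet("decimals.parquet", store_schema=True)
print(pq.read_schema("decimals.parquet"))
```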

Describe alternatives you've considered Keep converting to decimal128; this will no longer be necessary once the Arrow support lands.

Additional context Blocked on the completion of https://github.com/apache/arrow/issues/43956

zeroshade commented 1 month ago

With the release of Arrow v18, I'll add implementations of Decimal32/Decimal64 to nanoarrow, which will then allow avoiding the conversions.