Closed maxime-petitjean closed 2 years ago
If I try to execute this code:
```js
const { DataFrame } = require('@rapidsai/cudf');

const frame = DataFrame.readParquet({ sourceType: 'files', sources: ['data.parquet'] });
const result = frame.sum(); // throws!
```
I get the error `sum operation requires dataframe to be entirely of dtype FloatingPoint OR Integral.`, even though the parquet file contains only Float64 columns.
If I explicitly cast the columns to Float64, it works:
```js
const { DataFrame, Float64 } = require('@rapidsai/cudf');

const frame = DataFrame.readParquet({ sourceType: 'files', sources: ['data.parquet'] });
const casted = frame.cast({ col1: new Float64(), col2: new Float64() });
const result = casted.sum(); // OK
```
If I log the frame's column types, I get plain objects straight after `readParquet`:

```js
{ col1: { typeId: 3, precision: 2 }, col2: { typeId: 3, precision: 2 } }
```

but proper type instances after the explicit cast:

```js
{ col1: Float64 [Float] { precision: 2 }, col2: Float64 [Float] { precision: 2 } }
```
The instance type of each column's dtype seems to be lost in the readParquet function (type serialisation?).
@maxime-petitjean thanks for the bug report! It sounds like we're not re-wrapping the types coming back from C++ after loading the parquet file. I'll put up a PR with a fix shortly.