Open galipremsagar opened 3 years ago
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This is because the fastparquet writer converts the timedelta to a time64 type, which we don't support.
Besides, this is a bug in fastparquet, because timedelta64[ns] should convert to a duration[ns] type.
@galipremsagar ok to close ?
@quasiben Will raise an issue on fastparquet and close it here.
@galipremsagar should this be closed?
@devavret Could you describe what exactly is broken with fastparquet, with a minimal reproducer, so that we can raise an issue on their repo? Or is this an incompatibility of types between cudf and fastparquet? It seems pandas is handling the types correctly:
In [8]: pd.read_parquet("temp_file", engine='fastparquet')
Out[8]:
                       a
0 0 days 00:03:54.334353
1 0 days 01:12:34.353455
2 0 days 00:00:54.546344

In [9]: pd.read_parquet("temp_file", engine='fastparquet').dtypes
Out[9]:
a    timedelta64[ns]
dtype: object
Should cudf be doing the same kind of handling?
@galipremsagar should this be closed?
Not yet, will need to know the above information
I don't remember the details anymore, but it looks like the problem was in the fastparquet writer, not the reader.
Describe the bug
When we save a dataframe containing a series of duration type using the fastparquet engine, loading the same parquet file works in pandas but not in cudf.
Steps/Code to reproduce bug
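The original reproducer is not shown here. A minimal sketch of the round trip described above might look like the following. The values are copied from the pandas output quoted in this thread, the file name temp_file comes from the same output, and make_frame and roundtrip are hypothetical helper names.

```python
import pandas as pd


def make_frame() -> pd.DataFrame:
    """Build a one-column frame of timedelta64[ns] values (hypothetical helper)."""
    # Values copied from the pandas output quoted in this thread.
    values = pd.to_timedelta(
        [
            "0 days 00:03:54.334353",
            "0 days 01:12:34.353455",
            "0 days 00:00:54.546344",
        ]
    ).astype("timedelta64[ns]")  # pin the unit explicitly
    return pd.DataFrame({"a": values})


def roundtrip(path: str = "temp_file") -> None:
    """Write with the fastparquet engine, then read with pandas and with cudf."""
    import cudf  # imported lazily: requires a GPU environment

    make_frame().to_parquet(path, engine="fastparquet")
    # pandas side: reported in this thread to load as timedelta64[ns]
    print(pd.read_parquet(path, engine="fastparquet").dtypes)
    # cudf side: the call this issue reports as failing
    print(cudf.read_parquet(path).dtypes)
```

Running roundtrip() needs fastparquet and cudf installed. Whether the last call still errors depends on the versions in use.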
Expected behavior
No error; the dataframe should load, as it does with pd.read_parquet.
Environment overview (please complete the following information)
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details.
Additional context
Surfaced while running fuzz tests in #6001.