Closed: galipremsagar closed this issue 3 years ago
@galipremsagar I believe this is a feature request. Currently, the CSV reader and writer don't support categorical data, AFAIK.
As a stopgap on the Python side, we could materialize the dictionary-encoded column (i.e. replace the codes with the category values they point to) before writing the CSV.
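A rough sketch of what that stopgap amounts to, in plain Python rather than actual cuDF internals. The helpers `materialize` and `to_csv` and the `null_code` parameter are hypothetical names made up for illustration. Using -1 as the null code is an assumption borrowed from the pandas convention.

```python
import csv
import io


def materialize(codes, categories, null_code=-1):
    """Decode a dictionary-encoded column into its actual values.

    Hypothetical helper: `codes` index into `categories`, and `null_code`
    (assumed to be -1, as in pandas) marks a missing value.
    """
    return [None if c == null_code else categories[c] for c in codes]


def to_csv(column_name, codes, categories):
    """Write the decoded values, not the codes, as CSV text with an index column."""
    values = materialize(codes, categories)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["", column_name])  # empty header cell for the index
    for i, value in enumerate(values):
        writer.writerow([i, "" if value is None else value])
    return buf.getvalue()


# Same data as the reproducer in this issue, already dictionary-encoded.
categories = ["dsfdsfsdf", "sdfsdfsdf", "sdfsdfsdfs"]
codes = [0, 1, 2, 1]

csv_text = to_csv("a", codes, categories)
print(csv_text)
```

Writing `codes` directly would reproduce the reported behaviour. Decoding first means the writer itself needs no knowledge of categoricals, at the cost of a temporary copy of the column.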
This has been implemented, likely in https://github.com/rapidsai/cudf/pull/6829. Closing.
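```python
import cudf
import pandas as pd

# Build a pandas DataFrame with a single categorical column.
pdf = pd.DataFrame()
pdf['a'] = pd.Series(['dsfdsfsdf','sdfsdfsdf','sdfsdfsdfs','sdfsdfsdf'], dtype='category')

# Convert it to a cuDF DataFrame.
gdf = cudf.from_pandas(pdf)

# Compare the CSV output from pandas and from cuDF.
print(pdf.to_csv())
print(gdf.to_csv())
```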
```
,a
0,dsfdsfsdf
1,sdfsdfsdf
2,sdfsdfsdfs
3,sdfsdfsdf
,a
0,dsfdsfsdf
1,sdfsdfsdf
2,sdfsdfsdfs
3,sdfsdfsdf
```
Describe the bug
When a dataframe contains a categorical column, the CSV writer returns the category codes instead of the actual categorical values.
Steps/Code to reproduce bug
Expected behavior
The CSV output should contain the categorical values rather than their codes.
Additional context
Surfaced while running fuzz tests: #6001