Describe the bug
The data written to disk by to_parquet is incorrect when the method is called on a sliced DataFrame (e.g. df.iloc[10:]) containing string data. For now, the only workaround is to explicitly copy any data that needs to be written to disk.
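The workaround described above can be sketched roughly as follows. This is only an illustration, not the library's recommended fix: the helper name write_slice_with_copy is hypothetical, and the sketch assumes a DataFrame object that exposes the pandas-style iloc, copy(deep=True) and to_parquet(path) calls.

```python
def write_slice_with_copy(df, start, path):
    """Write rows df.iloc[start:] to Parquet via an explicit deep copy.

    Hypothetical helper: assumes `df` offers the pandas-style
    `iloc`, `copy(deep=True)` and `to_parquet(path)` methods.
    """
    # Slicing may return a view that still points into the parent's buffers.
    sliced = df.iloc[start:]
    # Materialise the view into its own buffers before writing,
    # which is the workaround reported in this issue.
    owned = sliced.copy(deep=True)
    owned.to_parquet(path)
    return owned


# Example usage (assumption: a DataFrame library with this API is installed):
# df = DataFrame({"a": ["x", "y", "z"]})
# write_slice_with_copy(df, 1, "out.parquet")
```

The extra copy costs memory and time proportional to the slice, which is why it is only a stopgap.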
Steps/Code to reproduce bug
Trace:
Expected behavior
Ideally, the data view should be written to disk correctly, without requiring an explicit copy.
Environment overview (please complete the following information)
Environment details
Additional context
The need for an explicit copy of the data view may affect NVTabular performance. cc @benfred