Open randerzander opened 2 years ago
With the recent changes to the CSV reader, the current behavior is that we return an empty dataframe. To me, that seems like acceptable behavior:
In [4]: import cudf
   ...:
   ...: with open('test.csv', 'w') as fp:
   ...:     fp.write('')
   ...:
In [5]: cudf.read_csv("test.csv")
Out[5]:
Empty DataFrame
Columns: []
Index: []
@randerzander is this acceptable behavior in this scenario, or do you think otherwise?
The above is still the status quo here:
>>> import pandas as pd
>>> import cudf
>>> with open("test.csv", "w") as fp:
...     fp.write("")
...
0
>>> pd.read_csv("test.csv")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1898, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "parsers.pyx", line 581, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
>>> cudf.read_csv("test.csv")
Empty DataFrame
Columns: []
Index: []
Given the growth of cudf.pandas since this issue was first created, we probably want to match the pandas behavior more closely now.
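Until the behaviors match, a user-side guard is one possible workaround. The following is only a rough sketch, not anything cudf provides: `read_csv_checked` is a hypothetical helper name, the reader is passed in as an argument (for example `cudf.read_csv`), and the check assumes a local path and only covers the zero-byte case from the session above.

```python
import os

try:
    # pandas raises this for empty input; reuse it when pandas is installed.
    from pandas.errors import EmptyDataError
except ImportError:
    # Stand-in for illustration only, so the sketch runs without pandas.
    class EmptyDataError(ValueError):
        pass


def read_csv_checked(path, reader):
    """Hypothetical helper: fail loudly on a zero-byte file, else call reader(path)."""
    if os.path.getsize(path) == 0:
        # Same message pandas reports in the traceback above.
        raise EmptyDataError("No columns to parse from file")
    return reader(path)
```

Usage would look like `read_csv_checked("test.csv", cudf.read_csv)`, which raises for the empty file written above instead of returning an empty DataFrame.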
It's embarrassingly common to accidentally produce empty "CSV" files and then have a downstream system fail when it tries to read them.
If I try to read an empty file with pandas, I get a helpful error message indicating the problem.
When I try to read an empty CSV file with Dask-cudf (or cudf directly), it's not clear whether I ran out of memory or hit some other problem unrelated to the input file: