Open biiiipy opened 1 day ago
Pyarrow doesn't support a comment character, only an invalid row handler callback. https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions
We could create that callable for pyarrow, or we could raise an exception saying that we don't support that combination with pyarrow. Given that the callable makes pyarrow call back into Python, it would have terrible performance and is not something we would normally accept. I think we should raise.
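For illustration only, here is a rough sketch of what such a callable could look like. The name make_comment_skipper is hypothetical. It assumes the handler receives pyarrow's InvalidRow object, whose text attribute holds the raw line, and returns "skip" or "error". Note that pyarrow only calls the handler for rows with an unexpected column count, so a comment line that happens to have the right number of delimiters would still get through.

```python
def make_comment_skipper(comment_prefix):
    """Build a handler suitable for ParseOptions(invalid_row_handler=...).

    Hypothetical helper, not part of polars or pyarrow.
    """
    def handler(row):
        # `row` is expected to be a pyarrow.csv.InvalidRow; `row.text`
        # is the raw text of the offending line.
        if row.text.startswith(comment_prefix):
            return "skip"   # drop the comment row
        return "error"      # keep pyarrow's default behaviour otherwise
    return handler


# Possible usage (requires pyarrow, not executed here):
#   import pyarrow.csv as pacsv
#   opts = pacsv.ParseOptions(invalid_row_handler=make_comment_skipper("#"))
#   table = pacsv.read_csv("test.csv", parse_options=opts)
```

Every skipped row goes through a Python function call, which is the performance concern raised above.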
For reference, pandas also raises:
pd.read_csv("", comment="#", engine="pyarrow")
# ValueError: The 'comment' option is not supported with the 'pyarrow' engine
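A minimal sketch of the "raise" option, assuming a small validation step that runs before the file is handed to pyarrow. The function name check_csv_options and the error wording are made up for illustration and are not the actual polars implementation.

```python
from typing import Optional


def check_csv_options(use_pyarrow: bool, comment_prefix: Optional[str]) -> None:
    """Reject the unsupported combination up front (hypothetical helper)."""
    if use_pyarrow and comment_prefix is not None:
        # Mirrors the pandas behaviour shown above: fail loudly instead of
        # silently returning comment rows.
        raise ValueError(
            "'comment_prefix' is not supported when 'use_pyarrow=True'"
        )
```

Checking up front keeps the pyarrow path free of Python callbacks and makes the limitation visible to the user.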
Checks
Reproducible example
test.csv:
this returns commented rows starting with '#', but it shouldn't:
Log output
Issue description
pyarrow doesn't have an option to define comment rows and skip them, so that complicates a fix for this
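As a possible user-side workaround until this is resolved, the comment lines could be dropped before the data reaches the reader. The sketch below uses only the standard library. The helper name strip_comment_lines is hypothetical, and it assumes comments always start at the beginning of a line and that no quoted field contains a newline followed by the prefix.

```python
import io


def strip_comment_lines(raw: bytes, comment_prefix: bytes = b"#") -> bytes:
    """Return `raw` with every line starting with `comment_prefix` removed."""
    kept = [
        line
        for line in io.BytesIO(raw).readlines()  # line endings are preserved
        if not line.startswith(comment_prefix)
    ]
    return b"".join(kept)
```

The filtered bytes could then be wrapped in io.BytesIO and passed to read_csv with use_pyarrow=True.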
Expected behavior
read_csv should not return the #comment row. read_csv should either warn/error if both use_pyarrow=True and comment_prefix are used, or remove comment rows from the dataframe as an additional step.
Installed versions