Closed anna-geller closed 10 months ago
I suspect the issue is caused by your file. For example, if I remove line 47 (iPhone X vs Makeup Transformation (Face ID TEST)), the error happens on a later line instead.
I took the first 50 lines, removed the one mentioned above, and it works fine.
@Skraye we still need a proper solution for this, e.g. a property to decide what to do with bad lines
in pandas, there is a parameter "on_bad_lines":
on_bad_lines: {‘error’, ‘warn’, ‘skip’} or Callable, default ‘error’
Specifies what to do upon encountering a bad line (a line with too many fields). Allowed values are:
'error', raise an Exception when a bad line is encountered.
'warn', raise a warning when a bad line is encountered and skip that line.
'skip', skip bad lines without raising or warning when they are encountered.
seems OK to do the same as pandas, e.g. adding an enum property onBadLines with options ERROR, WARN or SKIP
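For reference, a minimal sketch of the pandas behaviour quoted above, assuming pandas 1.3 or later (where `on_bad_lines` is available). The sample data and the `read_with_policy` helper are made up for illustration and are not taken from the reported file.

```python
import io

import pandas as pd

# Hypothetical sample: the third line has three fields instead of two.
SAMPLE = "a,b\n1,2\n3,4,5\n6,7\n"


def read_with_policy(policy: str) -> pd.DataFrame:
    """Parse SAMPLE with the given on_bad_lines policy (illustrative helper)."""
    return pd.read_csv(io.StringIO(SAMPLE), on_bad_lines=policy)


# 'skip' drops the malformed line without raising or warning.
skipped = read_with_policy("skip")

# 'error' (the default) raises as soon as the malformed line is reached.
try:
    read_with_policy("error")
    raised = False
except pd.errors.ParserError:
    raised = True
```

With 'skip', only the two well-formed rows remain in the resulting frame.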
Sounds good to me! Should we also output the bad row with the error message to help the user debug?
good idea, in the ERROR case yes 👍
if WARN, perhaps output all bad rows as one file in internal storage, in case there are many of them (imagine a large file with many bad rows)
in SKIP, no need to output bad rows
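The three behaviours discussed above could look roughly like this. It is only a sketch under stated assumptions, not the actual reader implementation: the `OnBadLines` enum, `BadLineError` and `read_rows` are hypothetical names, a "bad line" is assumed to be any row whose field count differs from the header, and bad rows are returned in a list rather than written to internal storage.

```python
import csv
import io
from enum import Enum


class OnBadLines(Enum):
    """Hypothetical enum mirroring the proposed onBadLines property."""
    ERROR = "ERROR"
    WARN = "WARN"
    SKIP = "SKIP"


class BadLineError(ValueError):
    """Raised in ERROR mode; the message carries the offending row."""


def read_rows(text, on_bad_lines=OnBadLines.ERROR):
    """Return (good_rows, bad_rows) for CSV text with a header line.

    Assumption: a row is bad when its field count differs from the header.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if len(row) == len(header):
            good.append(row)
        elif on_bad_lines is OnBadLines.ERROR:
            # ERROR: fail fast and include the bad row to help debugging.
            raise BadLineError(
                f"line {reader.line_num}: expected {len(header)} fields, "
                f"got {len(row)}: {row!r}"
            )
        elif on_bad_lines is OnBadLines.WARN:
            # WARN: collect bad rows so they can be reported together.
            bad.append((reader.line_num, row))
        # SKIP: drop the row silently.
    return good, bad
```

In WARN mode the collected `bad` list stands in for the single file of bad rows suggested above; a real implementation would presumably stream those rows out instead of holding them in memory.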
This is indeed a bug, and it is fixed in the latest version of the CSV library we use.
However, offering a way to manage corrupted rows is a good idea. If we do this, we should do it for all serdes readers, not only for CSV. Please open an issue describing it.
I'll close this issue with a fix so that this file is handled correctly.
Feature description
Full stacktrace:
Reproducer:
reading in pandas works fine
"I reduced the csv file to only 500 rows, still the same error."