Open eoghanmurray opened 7 years ago
I agree that the case of too many fields is a bug and should explicitly raise. As for cases with too few fields, I'm not sure I would agree. It's being interpreted as empty data (or NULL), which is perfectly acceptable.
Thanks! On the 'too few fields' point, maybe the example I gave was lacking. Instead of two sets of widths like [2,1,1,1] and [2,1,1], the problem arises when the widths (and their meanings) are incompatible, e.g. [2,23,14,12,10] and [2,10,11,10].
(In the data I'm looking at, the difference between these two types is signified by the value of the first two characters of the line. Obviously the different line types should have been exported to different files, but that is not the legacy data I'm working with.)
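For mixed files like this, one possible workaround is to group the lines by their two-character record type before parsing, so that each group has a single consistent layout. The sketch below uses only the standard library. The widths are the ones quoted above, while the record-type codes "01" and "02" and the function names are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical layouts keyed by the two-character record type. The widths
# come from the comment above; the codes "01" and "02" are made up.
LAYOUTS = {
    "01": [2, 23, 14, 12, 10],
    "02": [2, 10, 11, 10],
}


def split_by_record_type(lines, prefix_len=2):
    """Group raw lines by their leading record-type code."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if line:  # ignore blank lines
            groups[line[:prefix_len]].append(line)
    return dict(groups)


def slice_fields(line, widths):
    """Cut one line into stripped fields according to a list of widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields
```

Each group could then be joined with newlines, wrapped in io.StringIO, and passed to pd.read_fwf with widths=LAYOUTS[code] and header=None, giving one DataFrame per record type.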
I tried the warn_bad_lines and error_bad_lines options - and received a warning that they are being deprecated:
>>> FutureWarning: The warn_bad_lines argument has been deprecated and will be removed in a future version
How can I read in a file (these are log files with set column widths) that may sometimes have badly formatted rows? I'd like to skip/delete these rows (but know that they exist) - because the other thousands of lines must be analyzed.
The warning tells you what to do:
FutureWarning: The warn_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
Thanks. This wasn't output in PyCharm 2021.3.1. I'll learn about on_bad_lines (which isn't documented in the read_fwf documentation).
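For the log-file case above, a way to skip malformed rows while still knowing that they exist, without relying on any parser option, is to check each line's length against the expected total width before parsing. This is only a sketch: the function name is hypothetical, and it assumes that a well-formed row is exactly sum(widths) characters long, which will not hold if trailing whitespace has been stripped from the file.

```python
def partition_by_width(lines, widths):
    """Separate lines whose length equals sum(widths) from those that differ.

    Returns (good, bad). bad holds (line_number, text) pairs, 1-based,
    so the rejected rows can be logged or inspected later.
    """
    expected = sum(widths)
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if len(text) == expected:
            good.append(text)
        else:
            bad.append((number, text))
    return good, bad
```

The good lines could then be joined with newlines, wrapped in io.StringIO, and passed to pd.read_fwf with the same widths and header=None, while the bad list records which rows were skipped.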
I'm importing a fixed width file which has 2 types of records (each with their own definitions).
Problem description
I expected that lines not matching the passed-in spec would result in a 'bad line' error for that line, and those lines could be ignored.
Expected Output
I expected these lines to raise an error, with the error_bad_lines option available to ignore the lines and show warnings instead (which could be turned off with warn_bad_lines).
Output of pd.show_versions()