Open aplavin opened 6 years ago

As of now, `read_fwf` infers the field positions using only the first 100 rows of the file, and this number is not easily modifiable. However, if there is a field with values for only a few rare objects, it will be missed completely! So it would be great if pandas used many more rows by default (100 is quite a small number) - why not something like 10000? Or at least provide a way to increase this number and/or a parameter like `infer_using_whole_file=False`.

If anyone finds this issue and needs an immediate solution - I personally use monkey-patching:
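A minimal sketch of such a patch (an assumption-laden reconstruction, not the author's exact snippet: it presumes a pre-0.24 pandas, before `infer_nrows` existed, where the internal `FixedWidthReader.detect_colspecs` carried the hardcoded 100-row default):

```python
# Sketch only, assuming a pandas from before infer_nrows existed
# (< 0.24), where the internal row-sampling method was roughly
# detect_colspecs(self, n=100, skiprows=None); adjust to your version.
from pandas.io.parsers import FixedWidthReader

_original_detect_colspecs = FixedWidthReader.detect_colspecs

def detect_colspecs(self, n=10000, skiprows=None):
    # Same inference logic as pandas' own, just sampling 10000 rows
    # by default instead of 100.
    return _original_detect_colspecs(self, n=n, skiprows=skiprows)

FixedWidthReader.detect_colspecs = detect_colspecs
```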
Indeed! Would be nice to be able to parameterize how many rows are used to infer field positions.

Yes, this messes things up quite often: columns whose strings get longer towards the end of the data may get truncated.
Hi there,

first things first: Thanks a stack for outlining this important detail, it just saved our implementation ^1. 🌻

> Provide a way to increase this number and/or a parameter like `infer_using_whole_file=False`. Would be nice to parameterize how many rows you need to infer field positions.

On this matter, we wanted to report that the `infer_nrows` argument has apparently been added to `read_fwf`, so, while it still does not satisfy the need for an `infer_using_whole_file` flag, the original gist can now be implemented as `read_fwf(colspecs="infer", infer_nrows=100000)`, which may be more convenient than using the monkey patch, depending on the scenario.

With kind regards,
Andreas.
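For concreteness, a runnable form of that call (a sketch only: the file name is a placeholder, and `infer_nrows` requires pandas >= 0.24):

```python
import pandas as pd

# Infer column boundaries from the first 100000 data rows instead of
# the default 100 (infer_nrows was added in pandas 0.24).
df = pd.read_fwf("data.fwf", colspecs="infer", infer_nrows=100000)
```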
> the need for an `infer_using_whole_file` flag

It looks like using `infer_nrows=np.Infinity` works well; see https://github.com/earthobservations/wetterdienst/commit/a4b15125d.
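That approach in runnable form (again a sketch with a placeholder file name; `np.Infinity` is spelled `np.inf` here, since the `Infinity` alias was removed in NumPy 2.0):

```python
import numpy as np
import pandas as pd

# Any sample size larger than the file's row count makes read_fwf scan
# every row when inferring column boundaries, which is effectively the
# infer_using_whole_file behavior requested in this issue. np.inf is
# the same value as the np.Infinity used in the linked commit.
df = pd.read_fwf("data.fwf", colspecs="infer", infer_nrows=np.inf)
```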