mumrah closed this issue 7 years ago.
Fixed. This only affects the CSV parser when it processes unquoted values. The behavior became inconsistent after the latest optimization in version 2.2.1, but you won't get an OutOfMemoryError: in the worst case, the length of each parsed String is limited to the internal buffer size. Values that exceed the buffer length, or that are only partially stored in the current buffer, are parsed with the original algorithm, which applies the maximum length restriction.
I've just released a 2.2.2-SNAPSHOT version that includes the fix.
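To make the explanation above concrete, here is a minimal Java sketch of a two-path parser. It is a hypothetical model, not univocity-parsers' actual code: the class `TwoPathColumnParser`, the method `parseRecord`, and the flag `enforceLimitOnFastPath` are invented for illustration, and a single String stands in for the reader's character buffer. A value that ends inside the buffer is sliced out directly (so its length can never exceed `bufferSize`), while a value that runs past the buffer falls back to a char-by-char copy that enforces `maxCharsPerColumn`. Setting `enforceLimitOnFastPath` to false mimics the inconsistent 2.2.1 behavior; true mimics the fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behavior described above; NOT univocity-parsers' implementation.
// A single String stands in for the reader's character buffer: positions [0, bufferSize)
// are "in the current buffer", anything beyond is not.
final class TwoPathColumnParser {

    /** Hypothetical stand-in for the parser's "value too long" error. */
    static final class ColumnTooLongException extends RuntimeException {
        ColumnTooLongException(int maxCharsPerColumn) {
            super("Value exceeds maxCharsPerColumn=" + maxCharsPerColumn);
        }
    }

    /**
     * Splits one unquoted, comma-separated record.
     * enforceLimitOnFastPath=false mimics the inconsistent 2.2.1 behavior; true mimics the fix.
     */
    static List<String> parseRecord(String record, int bufferSize, int maxCharsPerColumn,
                                    boolean enforceLimitOnFastPath) {
        List<String> values = new ArrayList<>();
        int start = 0;
        while (start <= record.length()) {
            int end = record.indexOf(',', start);
            if (end < 0) {
                end = record.length();
            }
            if (end <= bufferSize) {
                // Fast path: the whole value is in the buffer, so it is sliced out directly.
                // Its length is at most bufferSize, which is why memory stays bounded.
                if (enforceLimitOnFastPath && end - start > maxCharsPerColumn) {
                    throw new ColumnTooLongException(maxCharsPerColumn);
                }
                values.add(record.substring(start, end));
            } else {
                // Fallback path: the value exceeds or straddles the buffer, so it is copied
                // char by char into an appender that always enforces the limit.
                StringBuilder appender = new StringBuilder();
                for (int i = start; i < end; i++) {
                    if (appender.length() == maxCharsPerColumn) {
                        throw new ColumnTooLongException(maxCharsPerColumn);
                    }
                    appender.append(record.charAt(i));
                }
                values.add(appender.toString());
            }
            start = end + 1; // skip the delimiter
        }
        return values;
    }

    public static void main(String[] args) {
        String record = "x".repeat(150) + ",b";
        // 2.2.1-style: the 150-char value fits in a 4096-char buffer, so the limit is skipped.
        System.out.println(parseRecord(record, 4096, 100, false).get(0).length()); // 150
        try {
            parseRecord(record, 4096, 100, true); // fixed behavior
        } catch (ColumnTooLongException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, even the lenient fast path never returns a value longer than `bufferSize`, which mirrors the point that the inconsistency could not lead to an OutOfMemoryError; it only meant the configured limit was checked on some values and not others.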
I haven't narrowed this down yet, but as of 2.2.1 I'm seeing strange behavior when using maxCharsPerColumn.
Here is a unit test that isolates the problem (along with one of our test files):
This outputs:
I would expect it to fail on the first line.
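The expectation of failing on the first line can be stated as a simple reference check. The sketch below is a hypothetical helper (`ColumnLimitCheck` and `firstOffendingLine` are invented names, not part of the reporter's test or of univocity-parsers). It assumes unquoted, comma-separated values, which is the case the fix concerns, and ignores quoting and escaping entirely. It returns the 1-based number of the first line containing a column longer than the limit, which is where a parser that consistently enforces `maxCharsPerColumn` would be expected to throw.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of univocity-parsers.
final class ColumnLimitCheck {

    /**
     * Returns the 1-based number of the first line containing a column longer than
     * maxCharsPerColumn, or -1 if every column fits. Naive split on ',' only:
     * quoting and escaping are ignored.
     */
    static int firstOffendingLine(List<String> lines, int maxCharsPerColumn) {
        for (int i = 0; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty columns instead of dropping them.
            for (String column : lines.get(i).split(",", -1)) {
                if (column.length() > maxCharsPerColumn) {
                    return i + 1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("id," + "x".repeat(150), "2,short");
        System.out.println(firstOffendingLine(lines, 100)); // 1
    }
}
```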
I traced through the Univocity classes a bit and noticed that AbstractCharInputReader creates an ExpandingCharAppender ("tmp") with a size of 4096. I don't know enough about how things work in there, but it seems possible that characters are being read from a buffer that doesn't have the 100-char limit set in my test.
For my purposes, I'm mostly interested in preventing an OOM when a user misconfigures the parser. Since the parser does eventually use the correct reader and fail properly, I'll just update my test to work around this for now.
Thanks!