uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

[CsvParser] Failed to parse long text in a single table cell EVEN AFTER setting "CsvParserSettings.setMaxCharsPerColumn(-1);" #464

Open AndyBRoswell opened 3 years ago

AndyBRoswell commented 3 years ago

Description

I have a CSV file with some long text in single table cells: https://github.com/AndyBRoswell/xueqiu-discussion-labeler/blob/data-manipulator/save/xueqiu.csv

When I tried to parse this CSV file, the uniVocity CSV parser threw an exception stating "Length of parsed input (4097) exceeds the maximum number of characters defined in your parser settings (4096)" at line 152 (counting from 1), and parsing stopped.

However, after I executed the statement

ParserSettings.setMaxCharsPerColumn(-1);

in advance, I got the same TextParsingException:

Length of parsed input (4097) exceeds the maximum number of characters defined in your parser settings (-1). 
Parser Configuration: CsvParserSettings:
    Auto configuration enabled=false
    Auto-closing enabled=true
    Autodetect column delimiter=false
    Autodetect quotes=false
    Column reordering enabled=true
    Delimiters for detection=null
    Empty value=null
    Escape unquoted values=false
    Header extraction enabled=null
    Headers=null
    Ignore leading whitespaces=true
    Ignore leading whitespaces in quotes=false
    Ignore trailing whitespaces=true
    Ignore trailing whitespaces in quotes=false
    Input buffer size=1048576
    Input reading on separate thread=true
    Keep escape sequences=false
    Keep quotes=false
    Length of content displayed on error=-1
    Line separator detection enabled=false
    Maximum number of characters per column=-1
    Maximum number of columns=512
    Normalize escaped line separators=true
    Null value=null
    Number of records to read=all
    Processor=none
    Restricting data in exceptions=false
    RowProcessor error handler=StorageAccessor$1@693fe6c9
    Selected fields=none
    Skip bits as whitespace=true
    Skip empty lines=true
    Unescaped quote handling=nullFormat configuration:
    CsvFormat:
        Comment character=#
        Field delimiter=`
        Line separator (normalized)=\n
        Line separator sequence=\r\n
        Quote character="
        Quote escape character="
        Quote escape escape character=null
Internal state when error was thrown: line=151, column=0, record=151, charIndex=59056, headers=

(More contents of the exception are omitted here.)

The only difference is that the "Maximum number of characters per column" property now reads -1. The change had no effect: the parser still fails at the same position.

Involved source code: https://github.com/AndyBRoswell/xueqiu-discussion-labeler/blob/data-manipulator/src/StorageAccessor.java (Line 141)

PS

How can I make the parser continue after it throws a TextParsingException?

UPDATE

I have solved this problem by using a local CsvParserSettings object instead of a class attribute. The modification is visible at the same links. Why does the CsvParserSettings object have no effect when it is a class attribute?
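
For readers who hit the same error, here is a minimal sketch of the local-settings approach described in the update. It is not the code from the linked repository: the class name LongCellExample, the method parse, and the sample input are made up for illustration, and the backtick delimiter and \r\n line separator are copied from the exception dump above. The sketch assumes that what matters is applying setMaxCharsPerColumn(-1) before the CsvParser is constructed. The thread does not confirm that this is the root cause.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

// Hypothetical example class; not taken from the linked StorageAccessor.java.
public class LongCellExample {

    // Parses CSV text with a settings object that is local to this call.
    static List<String[]> parse(String csv) {
        CsvParserSettings settings = new CsvParserSettings();
        // -1 removes the per-column character limit (the default is 4096).
        settings.setMaxCharsPerColumn(-1);
        // Format values copied from the exception dump in this issue.
        settings.getFormat().setDelimiter('`');
        settings.getFormat().setLineSeparator("\r\n");

        // Assumption: the parser is created only after the settings are final.
        CsvParser parser = new CsvParser(settings);
        return parser.parseAll(new StringReader(csv));
    }

    public static void main(String[] args) {
        // Build one cell that is longer than the default 4096-character limit.
        StringBuilder longCell = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            longCell.append('x');
        }
        List<String[]> rows = parse("id`" + longCell + "\r\n");
        System.out.println("Parsed cell length: " + rows.get(0)[1].length());
    }
}
```

Creating the settings and the parser together in one method keeps the order of configuration and construction explicit, which is harder to guarantee when both live in class attributes that are initialized elsewhere.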