uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

Method parsedHeaders() is not honoring ignoreLeadingWhitespaces and ignoreTrailingWhitespaces anymore #403

Closed: adutra closed this issue 4 years ago

adutra commented 4 years ago

With 2.7.6 the code below prints true, but with 2.8.4 it prints false. In other words, parsedHeaders() no longer honors ignoreLeadingWhitespaces and ignoreTrailingWhitespaces:

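```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.StringReader;
import java.util.Arrays;

public class ParsedHeadersCheck {
    public static void main(String[] args) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true);
        settings.setIgnoreLeadingWhitespaces(true);
        settings.setIgnoreTrailingWhitespaces(true);
        CsvParser parser = new CsvParser(settings);
        parser.beginParsing(new StringReader(" field1 , field2 \na,b"));
        String[] parsedHeaders = parser.getContext().parsedHeaders();
        String[] headers = parser.getContext().headers();
        System.out.println(Arrays.equals(parsedHeaders, headers)); // 2.7.6: true, 2.8.4: false
    }
}
```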
jbax commented 4 years ago

That's the expected behavior now: parsedHeaders() returns the original content extracted from the input, without any modification. This is required to handle tricky inputs with headers such as "header1"," header1", where the whitespace is the only way to tell one column from the other, but where you still want to trim all the values that come after. Believe it or not, this sort of thing exists in production systems out there, and I have had to deal with it myself.
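A minimal sketch of the point above, using only the JDK rather than univocity-parsers itself. Trimming is approximated with String.trim(), and the names WhitespaceHeaders, trimAll and allDistinct are hypothetical illustrations, not part of the library's API. It shows that the headers "header1" and " header1" are distinct as extracted but collide once trimmed, which is why the untrimmed values still need to be available somewhere.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WhitespaceHeaders {
    // Hypothetical helper: trims each header (null-safe). String.trim() only
    // approximates what ignoreLeadingWhitespaces/ignoreTrailingWhitespaces do.
    public static String[] trimAll(String[] headers) {
        String[] out = new String[headers.length];
        for (int i = 0; i < headers.length; i++) {
            out[i] = headers[i] == null ? null : headers[i].trim();
        }
        return out;
    }

    // Hypothetical helper: true if no two headers are equal.
    public static boolean allDistinct(String[] headers) {
        Set<String> seen = new HashSet<>(Arrays.asList(headers));
        return seen.size() == headers.length;
    }

    public static void main(String[] args) {
        String[] raw = {"header1", " header1"};
        System.out.println(allDistinct(raw));          // true: whitespace tells them apart
        System.out.println(allDistinct(trimAll(raw))); // false: both become "header1"
    }
}
```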

adutra commented 4 years ago

@jbax thanks for the explanation! Maybe you should stress this behavioral change in the javadocs? It took me a while to figure out why our tests were failing after the upgrade to 2.8.
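For tests written against 2.7.x that compare the two arrays, one possible adjustment is to trim the values from parsedHeaders() before comparing them with headers(). The sketch below assumes, as the false result in 2.8.4 suggests, that headers() still returns the trimmed names. HeaderCheck and matchesIgnoringWhitespace are hypothetical names, and String.trim() only approximates the parser's own whitespace handling.

```java
import java.util.Arrays;

public class HeaderCheck {
    // Hypothetical helper: trims each raw header (null-safe) and compares the
    // result with the headers the parser reports.
    public static boolean matchesIgnoringWhitespace(String[] parsedHeaders, String[] headers) {
        if (parsedHeaders == null || headers == null) {
            return parsedHeaders == headers;
        }
        String[] trimmed = Arrays.stream(parsedHeaders)
                .map(h -> h == null ? null : h.trim())
                .toArray(String[]::new);
        return Arrays.equals(trimmed, headers);
    }

    public static void main(String[] args) {
        // Values mirroring the " field1 , field2 " header row from the issue.
        String[] parsed = {" field1 ", " field2 "};
        String[] headers = {"field1", "field2"};
        System.out.println(Arrays.equals(parsed, headers));             // false
        System.out.println(matchesIgnoringWhitespace(parsed, headers)); // true
    }
}
```

In a real test, the two arguments would be parser.getContext().parsedHeaders() and parser.getContext().headers().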