uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

implicit limitation on max column name length? #438

CodingCat closed this issue 3 years ago

CodingCat commented 3 years ago

Hi, we recently noticed a bug in Spark 3.x, which depends on the latest version of the univocity parser. We found that univocity has an implicit limit on column name length (1024 characters by default). If you add a header longer than that limit, you get an NPE (see the detailed analysis in the Spark PR).

In the univocity code base, you can add the following unit test to reproduce the NPE mentioned in the Spark PR:

+       @Test
+       public void testSuperLongHeader() {
+               CsvWriterSettings settings = new CsvWriterSettings();
+               settings.getFormat().setLineSeparator("\n");
+               StringBuffer sb = new StringBuffer();
+               for (int i = 0; i < 1025; i++) {
+                       sb.append("a");
+               }
+               settings.setHeaders(sb.toString());
+               StringWriter out = new StringWriter();
+
+               CsvWriter writer = new CsvWriter(out, settings);
+               writer.writeHeaders();
+               List<String> row = new ArrayList<String>();
+               row.add("value 1");
+               row.add("value 2");
+               writer.writeRow(row);
+               writer.close();
+
+               assertEquals(out.toString(), "value 1,value 2\n");
+       }

NPE:

java.lang.NullPointerException: null
    at com.univocity.parsers.common.AbstractWriter.submitRow(AbstractWriter.java:349)
    at com.univocity.parsers.common.AbstractWriter.writeHeaders(AbstractWriter.java:444)
    at com.univocity.parsers.common.AbstractWriter.writeHeaders(AbstractWriter.java:410)
    at com.univocity.parsers.csv.CsvWriterTest.testSuperLongHeader(CsvWriterTest.java:638)
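
For callers who want to detect this condition before handing headers to the writer, a length check is one option. The following is only a minimal sketch, assuming the 1024-character figure reported in this issue; the class `HeaderLengthCheck`, the methods `findOverlongHeaders` and `headerOfLength`, and the constant `REPORTED_LIMIT` are hypothetical names and are not part of univocity-parsers.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderLengthCheck {
    // Assumption: 1024 is the default limit reported in this issue,
    // not a value read from the library itself.
    public static final int REPORTED_LIMIT = 1024;

    /** Returns the indexes of headers whose length exceeds the given limit. */
    public static List<Integer> findOverlongHeaders(String[] headers, int limit) {
        List<Integer> overlong = new ArrayList<>();
        for (int i = 0; i < headers.length; i++) {
            if (headers[i] != null && headers[i].length() > limit) {
                overlong.add(i);
            }
        }
        return overlong;
    }

    /** Builds a header of the given length, mirroring the loop in the test above. */
    public static String headerOfLength(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] headers = { "id", headerOfLength(1024), headerOfLength(1025) };
        // Only the 1025-character header (index 2) exceeds the reported limit.
        System.out.println(findOverlongHeaders(headers, REPORTED_LIMIT)); // prints [2]
    }
}
```

A header of exactly 1024 characters passes this check and one of 1025 does not, which matches the length used in the reproducing test.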

Our question is: was this limit added intentionally, or is it actually a bug?

cc @HyukjinKwon @viirya

CodingCat commented 3 years ago

cc @jbax

CodingCat commented 3 years ago

ping @jbax

jbax commented 3 years ago

Fixed. I will release a 2.9.1-SNAPSHOT version soon, which you can use to test and confirm it's working. Thank you!

HyukjinKwon commented 3 years ago

Awesome!