uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

Uneven amount of quotes causes missing cells due to escaped tab character #473

Open Bios-Marcel opened 3 years ago

Bios-Marcel commented 3 years ago

These tab-separated rows

Test    Test    Test
Test    "   Test
Test    Test    Test
Test    """ Test
Test    Test    Test
Test    """""   Test

will result in:

[Test,Test,Test]
[Test,TRAILING QUOTES DEPENDING ON AMOUNT OF QUOTES\tTest]

It can be reproduced via the following code:

import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.AbstractRowProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class CSV
{
  public static void main( final String[] args ) throws IOException
  {
    final String stringData = new String( Files.readAllBytes( Paths.get( "data.csv" ) ) );
    final String[][] data = parseCSV( stringData );
    // Print the cell contents; printing the array directly only shows its reference.
    System.out.println( Arrays.deepToString( data ) );
  }

  private static String[][] parseCSV( final String rawData )
  {
    final List<String[]> rows = new ArrayList<>();
    final CsvParserSettings settings = new CsvParserSettings();
    settings.detectFormatAutomatically( '\t', ';', ',' );
    settings.setIgnoreLeadingWhitespaces( false );
    settings.setIgnoreTrailingWhitespaces( false );
    settings.setSkipEmptyLines( false );

    // Otherwise empty rows are null values and cause errors.
    settings.setNullValue( "" );

    settings.setProcessor( new AbstractRowProcessor()
    {
      @Override
      public void rowProcessed( final String[] row, final ParsingContext __ )
      {
        if ( row != null )
        {
          rows.add( row );
        }
      }
    } );

    final CsvParser parser = new CsvParser( settings );
    try ( StringReader reader = new StringReader( rawData ) )
    {
      parser.parse( reader );
    }
    return rows.toArray( new String[rows.size()][] );
  }
}

I am using a self-built version of 2.9.2-SNAPSHOT, because the previous build uploaded to Maven contained a typo. I did not change the code, however.
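
The tab characters in the sample rows above may not survive copy and paste, so here is a minimal sketch that writes the same six rows to data.csv with literal tabs. It assumes each row has exactly three cells separated by a single tab, with the quotes alone in the middle cell. The class name WriteSample and the method sampleData are made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of univocity-parsers.
public class WriteSample
{
  // Builds the six sample rows with real tab delimiters (assumed layout).
  // Every second row has 1, 3 and 5 quote characters in its middle cell.
  static String sampleData()
  {
    final StringBuilder sb = new StringBuilder();
    for ( final int quotes : new int[] { 1, 3, 5 } )
    {
      sb.append( "Test\tTest\tTest\n" );
      sb.append( "Test\t" );
      for ( int i = 0; i < quotes; i++ )
      {
        sb.append( '"' );
      }
      sb.append( "\tTest\n" );
    }
    return sb.toString();
  }

  public static void main( final String[] args ) throws IOException
  {
    // Writes the file that the reproduction code reads.
    Files.write( Paths.get( "data.csv" ), sampleData().getBytes( StandardCharsets.UTF_8 ) );
  }
}
```

Running WriteSample once before the CSV class should give it an input file with the uneven quote counts described above.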