uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

Is it possible that the openCSV parser (version 3.8) is faster than the univocity CSV parser (version 2.8.2)? #331

Open ArsenyClean opened 5 years ago

ArsenyClean commented 5 years ago

Hi! I wrote a small test in which the openCSV parser parses a 200 MB file in 5.5 seconds, while the univocity CSV parser takes about 24.2 seconds to parse the same file. Can that be right?

This is the test:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.junit.Assert;
import org.junit.Test;

import com.opencsv.CSVParser;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class CsvCompare {

    @Test
    public void compareTest() {
        // Read the whole file into memory, one String per physical line.
        List<String> lines = null;
        try {
            lines = IOUtils.readLines(getClass().getClassLoader().getResourceAsStream("test.csv"));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

        // openCSV: parse one pre-split line at a time.
        Parser openCsvParser = new Parser() {
            CSVParser csvParser = new CSVParser(';');
            public String[] parseLine(String line) {
                try {
                    return csvParser.parseLine(line);
                } catch (Exception e) {
                    e.printStackTrace();
                    errorLinesCounter++;
                    return new String[0];
                }
            }
        };

        final CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setDelimiter(';');

        // univocity: parse one pre-split line at a time.
        Parser univocityParser = new Parser() {
            CsvParser univocityParser = new CsvParser(settings);
            public String[] parseLine(String line) {
                try {
                    return univocityParser.parseLine(line);
                } catch (Exception e) {
                    e.printStackTrace();
                    errorLinesCounter++;
                    return new String[0];
                }
            }
        };

        Long openCsvTime = calcResultTime(openCsvParser, lines);

        Long univocityTime = calcResultTime(univocityParser, lines);

        Assert.assertEquals(0, openCsvParser.errorLinesCounter);
        Assert.assertEquals(0, univocityParser.errorLinesCounter);
        // Expectation under test: univocity should be the faster parser.
        Assert.assertTrue(openCsvTime > univocityTime);
        System.out.println("Ok");
    }

    // Returns the elapsed wall-clock time in milliseconds for parsing all lines.
    private Long calcResultTime(Parser parser, List<String> lines) {
        List<String[]> resultRaws = new ArrayList<String[]>();
        long time = System.currentTimeMillis();
        for (String line : lines) {
            resultRaws.add(parser.parseLine(line));
        }
        return System.currentTimeMillis() - time;
    }

    public abstract class Parser {
        public int errorLinesCounter = 0;
        public abstract String[] parseLine(String line);
    }

}
```

This test fails.

I can't explain to myself how this can be.

Maybe I'm missing some special settings?

jbax commented 5 years ago

You are reading lines into memory for no reason with `IOUtils.readLines(getClass().getClassLoader().getResourceAsStream("test.csv"));`. Let the parser extract individual rows for you. Also, this will break if your test.csv file has fields that contain one or more line separators.
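The point about embedded line separators can be illustrated with a small, library-free sketch. It assumes RFC 4180-style quoting with `"` as the quote character. The class and method names (`LineSplitPitfall`, `countPhysicalLines`, `countRecords`) are hypothetical and are not part of univocity-parsers or openCSV, and the quote tracking is a simplification, not how either library parses.

```java
// Hypothetical illustration: physical lines and CSV records are not the same thing.
class LineSplitPitfall {

    // Naive approach: treat every physical line as one record.
    static int countPhysicalLines(String csv) {
        return csv.split("\r?\n").length;
    }

    // Quote-aware count: a line break ends a record only outside quotes.
    static int countRecords(String csv) {
        boolean inQuotes = false;
        boolean sawContent = false;
        int records = 0;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
                sawContent = true;
            } else if (c == '\n' && !inQuotes) {
                if (sawContent) {
                    records++;
                }
                sawContent = false;
            } else if (c != '\r' || inQuotes) {
                sawContent = true;
            }
        }
        // Count a final record that has no trailing line break.
        if (sawContent) {
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        // The second record has a quoted field that spans two physical lines.
        String csv = "id;comment\n1;\"first line\nsecond line\"\n2;plain\n";
        System.out.println("physical lines: " + countPhysicalLines(csv));
        System.out.println("records:        " + countRecords(csv));
    }
}
```

For this sample input the naive split yields 4 lines while the input holds only 3 records, so a per-line call would receive two broken fragments. With univocity, handing the whole input to the parser, for example through `CsvParser.parseAll(Reader)` or `beginParsing(Reader)` followed by repeated `parseNext()` calls, leaves record splitting to the parser and avoids the intermediate list of lines.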

I never ran a performance analysis on `parseLine(String)`, but I'll take a look. Thank you for letting me know it's not performing as fast as it should.