osiegmar / FastCSV

CSV library for Java that is fast, RFC-compliant and dependency-free.
https://fastcsv.org/
MIT License
542 stars 93 forks source link

Accept the Appendable interface for output. #64

Closed shollander closed 2 years ago

shollander commented 2 years ago

Proposed Changes

Accept the Appendable interface for output. This allows direct use of StringBuilder. For building in-memory CSV data, this simplifies code and avoids the synchronization penalty of StringWriter.

Original public method of CsvWriterBuilder.build(Writer) is retained to ensure binary compatibility with previous versions.

Performance

Performance seems to have improved, possibly due to the use of CharBuffer.

Before:

Benchmark                     Mode  Cnt        Score        Error  Units
FastCsvReadBenchmark.read    thrpt   10  1465555.200 ±  50337.767  ops/s
FastCsvWriteBenchmark.write  thrpt   10  1804108.276 ± 160022.167  ops/s

After:

Benchmark                     Mode  Cnt        Score       Error  Units
FastCsvReadBenchmark.read    thrpt   10  1448338.547 ± 28802.631  ops/s
FastCsvWriteBenchmark.write  thrpt   10  1149119.333 ± 44167.523  ops/s
osiegmar commented 2 years ago

Thanks for this proposal!

Performance seems to have improved, possibly due to the use of CharBuffer.

Your benchmarks shows the opposite – the performance decreased by around 60% (score 1,8 M to 1,1 M). This is not a surprise, as CharBuffer only wraps a char[] and method calls (CharBuffer#position and CharBuffer#put) are slower than direct array access.

shollander commented 2 years ago

My apologies, I misread the benchmark. I thought that the score was a duration. I rewrote the CachingWriter to use the original char[] buffer. However, I was unable to achieve the original performance. It appears that the Appendable interface is less efficient than the Writer interface. I even attempted to use two different CachingWriter implementations, depending on whether the target implements Writer or not. Unfortunately, I think my original idea was misguided. It is still faster to use a Writer, such as StringWriter, than to use an Appendable, such as the unsynchronized StringBuilder. It appears that Writer is more efficient at consuming input than Appendable. Therefore, I am going to drop this proposal.

As an aside, I did notice that this project has issues building on systems where the default character encoding is not UTF-8 (i.e. Windows). In order to get this to build on Windows I needed to add a gradle.properties file with org.gradle.jvmargs=-Dfile.encoding=UTF-8. You might want to consider adding that to the project to make it easy for all users to contribute.