Open inejc opened 5 years ago
@inejc feel free to pick routines and tricks from the jsoniter-scala-core module.
Here are benchmark results for estimating the possible throughput and allocations.
@plokhotnyuk thanks for the pointers! I will look at your solution. Are you perhaps aware of any existing, efficient CSV loading libraries for the JVM?
There are a lot of solutions for Java: https://github.com/uniVocity/csv-parsers-comparison
But a custom codec based on jsoniter-scala-core outperforms them greatly when numbers and strings are represented as JSON values. That requires wrapping all string values in " characters, using UTF-8 encoding or hexadecimal escaping for non-ASCII characters, and not using numbers with leading zeroes.
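To make the format constraint concrete, here is a minimal sketch (not the jsoniter-scala-core codec itself). It uses Python's standard `json` module only to illustrate the idea: when every field is a valid JSON value, a row wrapped in brackets is a JSON array. The helper names `parse_row` and `is_json_row` are hypothetical.

```python
import json


def parse_row(line):
    """Parse one CSV row whose fields are all valid JSON values."""
    # Wrapping the row in brackets turns it into a JSON array, so the
    # JSON parser handles quoting, escapes, and number syntax for us.
    return json.loads("[" + line + "]")


def is_json_row(line):
    """Return True if the row satisfies the JSON-value constraint."""
    try:
        parse_row(line)
        return True
    except json.JSONDecodeError:
        return False


# Strings are quoted, non-ASCII characters are escaped, numbers are plain.
row = parse_row('"Ljubljana",46.05,"caf\\u00e9",3')

# Unquoted strings and numbers with leading zeroes are rejected.
bad_rows = ['Ljubljana,46.05', '"Ljubljana",007']
rejected = [not is_json_row(r) for r in bad_rows]
```

Note that a general JSON parser also accepts values such as `true`, `null`, or nested arrays, so a dedicated codec would likely restrict the accepted field types further.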
If an implementation that is locked to the JSON representation of strings and numbers is not acceptable, you can fork it and adapt it to other rules and encoding formats using the same approaches and hacks.
I merged https://github.com/picnicml/doddle-model/pull/106 but am keeping this issue open, as we want to improve the current solution, preferably by looking into the examples given by @plokhotnyuk.
The current implementation is very slow. I think a better approach would be to implement a custom solution rather than use a third-party library.