voltrondata-labs / arrowbench

R package for benchmarking

[ENG-3951] Add a json reading benchmark #90

Closed alistaire47 closed 2 years ago

alistaire47 commented 2 years ago

Closes #81. Adds a benchmark that times arrow::read_json_arrow() and comparable ndjson readers from other packages (jsonlite, ndjson, RcppSimdJson, and jsonify, though I disabled jsonify for now due to instability). Also adds a writer to ensure_format() that uses jsonlite::stream_out(). It works but is very slow (writing fanniemae_2016Q4 takes hours), so we should replace it with an arrow writer if we write one.
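For reviewers who want to poke at the round trip outside of arrowbench, here is a minimal sketch of the idea: write a data frame as ndjson with jsonlite::stream_out(), then time reading it back. The toy data frame, the temp file, and the variable names are assumptions for illustration and are not the benchmark code in this PR. The arrow read is guarded so the snippet still runs if arrow is not installed.

```r
library(jsonlite)

# Toy data standing in for a real source such as fanniemae_2016Q4 (assumption)
set.seed(1)
df <- data.frame(
  id = 1:1000,
  value = rnorm(1000),
  label = sample(letters, 1000, replace = TRUE)
)

# Write newline-delimited JSON, one record per line
path <- tempfile(fileext = ".ndjson")
con <- file(path, open = "w")
stream_out(df, con, verbose = FALSE)
close(con)

# Time the jsonlite reader; stream_in() opens and closes the connection itself
time_jsonlite <- system.time(
  df_jsonlite <- stream_in(file(path), verbose = FALSE)
)

# Time the arrow reader if arrow is available
if (requireNamespace("arrow", quietly = TRUE)) {
  time_arrow <- system.time(
    tbl_arrow <- arrow::read_json_arrow(path, as_data_frame = FALSE)
  )
  print(time_arrow[["elapsed"]])
}

print(time_jsonlite[["elapsed"]])
```

On a file this small the timings are mostly noise; the point is only the shape of the write-then-read loop that the benchmark wraps.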

Tests are wimpy because this is all quite slow.

Also renames the output param in read_csv (which this benchmark is heavily based upon) to output_format, because I realized its value was getting overridden in the JSON by the captured output.
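To illustrate the kind of name collision behind the rename, here is a hypothetical sketch. It is not arrowbench's actual serialization code: the list names, the values, and the use of modifyList() are all assumptions chosen to show how a parameter called output can be clobbered when a result that also carries an output field is merged in before writing JSON.

```r
library(jsonlite)

# Hypothetical benchmark parameters (values are illustrative)
params <- list(source = "fanniemae_2016Q4", output = "arrow_table")

# Hypothetical result record that also carries captured console output
captured <- list(real = 1.23, output = "captured console text")

# Merging by name lets the captured output replace the parameter value
merged <- modifyList(params, captured)
print(toJSON(merged, auto_unbox = TRUE))

# With the parameter renamed, both pieces of information survive the merge
params_renamed <- list(source = "fanniemae_2016Q4", output_format = "arrow_table")
merged_ok <- modifyList(params_renamed, captured)
print(toJSON(merged_ok, auto_unbox = TRUE))
```

In the first merge the parameter value is lost; in the second, output_format and the captured output sit side by side.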