pydantic / jiter

Fast iterable JSON parser.
https://crates.io/crates/jiter
MIT License
182 stars 11 forks

Next skip #84

Closed samuelcolvin closed 4 months ago

samuelcolvin commented 4 months ago

Support skipping values with Jiter. This is specifically for https://github.com/apache/datafusion/issues/7845.

Performance looks good: the benchmarks show that skipping values is significantly faster than parsing them.
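
To illustrate why skipping can be cheaper than parsing, here is a minimal std-only sketch of a top-level key check that steps over values without building them. It is not Jiter's implementation: `contains_key`, `skip_value`, `skip_string` and `skip_ws` are hypothetical names, and the sketch assumes well-formed input, does not validate the values it skips, and compares keys as raw bytes without decoding escapes.

```rust
/// Hypothetical helper: step over JSON whitespace starting at `pos`.
fn skip_ws(data: &[u8], mut pos: usize) -> usize {
    while pos < data.len() && matches!(data[pos], b' ' | b'\n' | b'\t' | b'\r') {
        pos += 1;
    }
    pos
}

/// Hypothetical helper: `pos` points at an opening quote; return the index
/// just past the closing quote. Escapes are stepped over, never decoded.
fn skip_string(data: &[u8], mut pos: usize) -> Option<usize> {
    pos += 1; // opening quote
    while pos < data.len() {
        match data[pos] {
            b'\\' => pos += 2, // backslash plus the escaped byte
            b'"' => return Some(pos + 1),
            _ => pos += 1,
        }
    }
    None // unterminated string
}

/// Hypothetical helper: return the index just past one JSON value.
/// Nothing is allocated and nothing is validated; it only finds the end.
fn skip_value(data: &[u8], pos: usize) -> Option<usize> {
    let mut pos = skip_ws(data, pos);
    match *data.get(pos)? {
        b'"' => skip_string(data, pos),
        b'{' | b'[' => {
            // Track nesting depth, stepping over strings so that brackets
            // inside string values are not counted.
            let mut depth = 0usize;
            while pos < data.len() {
                match data[pos] {
                    b'"' => {
                        pos = skip_string(data, pos)?;
                        continue;
                    }
                    b'{' | b'[' => depth += 1,
                    b'}' | b']' => {
                        depth -= 1;
                        if depth == 0 {
                            return Some(pos + 1);
                        }
                    }
                    _ => {}
                }
                pos += 1;
            }
            None // unbalanced container
        }
        _ => {
            // Number, true, false or null: runs until the next delimiter.
            let start = pos;
            while pos < data.len()
                && !matches!(data[pos], b',' | b'}' | b']' | b' ' | b'\n' | b'\t' | b'\r')
            {
                pos += 1;
            }
            if pos > start { Some(pos) } else { None }
        }
    }
}

/// Does the top-level object contain `key`? Returns `None` when the input
/// is truncated or malformed in a way this sketch notices.
fn contains_key(data: &[u8], key: &str) -> Option<bool> {
    let mut pos = skip_ws(data, 0);
    if *data.get(pos)? != b'{' {
        return Some(false); // not an object, so it has no keys
    }
    pos = skip_ws(data, pos + 1);
    if *data.get(pos)? == b'}' {
        return Some(false); // empty object
    }
    loop {
        pos = skip_ws(data, pos);
        if *data.get(pos)? != b'"' {
            return None;
        }
        let key_end = skip_string(data, pos)?;
        // Raw byte comparison of the text between the quotes.
        if &data[pos + 1..key_end - 1] == key.as_bytes() {
            return Some(true);
        }
        pos = skip_ws(data, key_end);
        if *data.get(pos)? != b':' {
            return None;
        }
        // The value is skipped, not parsed.
        pos = skip_ws(data, skip_value(data, pos + 1)?);
        match *data.get(pos)? {
            b',' => pos += 1,
            b'}' => return Some(false),
            _ => return None,
        }
    }
}
```

Only top-level keys are considered, so a `size` key nested inside another object's value is stepped over rather than matched.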

Running the query from https://github.com/apache/datafusion/issues/7845#issuecomment-2061121332 gives:

-- datafusion with next_value
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 17.2s

-- datafusion with next_skip
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 14.3s

-- datafusion using serde-json allocating keys to a Vec<String> then checking for the key
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 21.7s

-- datafusion returning false, i.e. the fastest possible case for a UDF
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 11.8s

-- datafusion like
SELECT count(*) FROM records where attributes like '%"size":%';
  -- 6165586 in 14.5s

-- duckdb
SELECT count(*) FROM read_parquet('file.parquet') where list_contains(json_keys(attributes), 'size')
  -- 6165586 in 14.2s
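
One way to read the datafusion timings above is to subtract the "return false" run, on the assumption (not stated in the runs themselves) that it approximates the fixed cost of scanning the data and calling the UDF. The sketch below does that arithmetic; `overhead` and `overheads` are hypothetical names.

```rust
/// Seconds spent above a baseline run.
fn overhead(total_s: f64, baseline_s: f64) -> f64 {
    total_s - baseline_s
}

/// Overhead of each json_contains variant above the "return false" UDF,
/// using the timings reported in the queries above.
fn overheads() -> Vec<(&'static str, f64)> {
    let baseline_s = 11.8; // UDF that always returns false
    [("next_value", 17.2), ("next_skip", 14.3), ("serde-json", 21.7)]
        .iter()
        .map(|&(name, total_s)| (name, overhead(total_s, baseline_s)))
        .collect()
}
```

Under that assumption the JSON work itself costs roughly 5.4 s with next_value, 2.5 s with next_skip and 9.9 s with serde-json.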

codspeed-hq[bot] commented 4 months ago

CodSpeed Performance Report

Merging #84 will not alter performance

Comparing next_skip (bf9c6d1) with main (0ab4dd4)

Summary

✅ 59 untouched benchmarks

🆕 14 new benchmarks

Benchmarks breakdown

   Benchmark                        main   next_skip   Change
🆕 big_jiter_skip                   N/A    107.8 ms    N/A
🆕 bigints_array_jiter_skip         N/A    500.4 µs    N/A
🆕 floats_array_jiter_skip          N/A    573.2 µs    N/A
🆕 massive_ints_array_jiter_skip    N/A    1.2 ms      N/A
🆕 medium_response_jiter_skip       N/A    73.4 µs     N/A
🆕 pass1_jiter_skip                 N/A    53.4 µs     N/A
🆕 pass2_jiter_skip                 N/A    5.4 µs      N/A
🆕 sentence_jiter_skip              N/A    6.5 µs      N/A
🆕 short_numbers_jiter_skip         N/A    332.7 µs    N/A
🆕 string_array_jiter_skip          N/A    37.1 µs     N/A
🆕 true_array_jiter_skip            N/A    22.8 µs     N/A
🆕 true_object_jiter_skip           N/A    56.2 µs     N/A
🆕 unicode_jiter_skip               N/A    6.7 µs      N/A
🆕 x100_jiter_skip                  N/A    2.4 µs      N/A