jameslittle230 / stork

🔎 Impossibly fast web search, made for static sites.
https://stork-search.net
Apache License 2.0
2.73k stars 56 forks

Internally compress index bytes #280

Closed: jameslittle230 closed this 1 year ago

jameslittle230 commented 2 years ago

This PR creates a new index serialization format that all new indexes are built with. A new configuration option lets users specify that they want the index data to be compressed (here using bzip2) when written to the output file.

This shrank the Federalist Papers index by a factor of 5.
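
To illustrate the idea of an opt-in compressed serialization format, here is a minimal sketch of an index file that carries a one-byte flag telling the reader whether the payload is compressed. Everything in it is an assumption made for illustration: the `Compression` enum, the flag values, and the `write_index` / `read_index` helpers are hypothetical and do not come from the Stork codebase. Because the Rust standard library has no bzip2 implementation, a trivial run-length codec stands in for bzip2 here; a real build would presumably call a bzip2 library at that point instead.

```rust
// Hypothetical sketch: none of these names come from the Stork codebase.

/// Hypothetical configuration option: should the index payload be compressed?
#[derive(Clone, Copy, Debug, PartialEq)]
enum Compression {
    None,
    /// Stand-in for bzip2: a byte-level run-length codec.
    RunLength,
}

// Hypothetical flag values stored in the first byte of the output.
const FLAG_RAW: u8 = 0;
const FLAG_COMPRESSED: u8 = 1;

/// Encode the input as (count, byte) pairs, with count in 1..=255.
fn rle_encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1usize;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Decode (count, byte) pairs; returns None if the data is malformed.
fn rle_decode(input: &[u8]) -> Option<Vec<u8>> {
    if input.len() % 2 != 0 {
        return None;
    }
    let mut out = Vec::new();
    for pair in input.chunks(2) {
        let (count, byte) = (pair[0], pair[1]);
        if count == 0 {
            return None;
        }
        out.extend(std::iter::repeat(byte).take(count as usize));
    }
    Some(out)
}

/// Serialize an index payload: one flag byte followed by the body.
fn write_index(payload: &[u8], compression: Compression) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 1);
    match compression {
        Compression::None => {
            out.push(FLAG_RAW);
            out.extend_from_slice(payload);
        }
        Compression::RunLength => {
            out.push(FLAG_COMPRESSED);
            out.extend(rle_encode(payload));
        }
    }
    out
}

/// Read an index back, decompressing only when the flag says so.
fn read_index(bytes: &[u8]) -> Option<Vec<u8>> {
    let (flag, body) = bytes.split_first()?;
    match *flag {
        FLAG_RAW => Some(body.to_vec()),
        FLAG_COMPRESSED => rle_decode(body),
        _ => None,
    }
}

fn main() {
    let payload = b"aaaaaaaabbbbbbbbcccccccc".to_vec();
    let raw = write_index(&payload, Compression::None);
    let packed = write_index(&payload, Compression::RunLength);
    // Both forms must round-trip to the same payload.
    assert_eq!(read_index(&raw), Some(payload.clone()));
    assert_eq!(read_index(&packed), Some(payload.clone()));
    println!("raw: {} bytes, packed: {} bytes", raw.len(), packed.len());
}
```

Putting the flag in the file itself, rather than relying on the reader's configuration, means a reader can open both compressed and uncompressed indexes without being told which one it has.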

Reasons this shouldn't get merged yet:

codecov[bot] commented 2 years ago

Codecov Report

Merging #280 (d42905f) into master (eeaca67) will decrease coverage by 10.67%. The diff coverage is n/a.

@@             Coverage Diff             @@
##           master     #280       +/-   ##
===========================================
- Coverage   72.44%   61.77%   -10.68%     
===========================================
  Files          53       15       -38     
  Lines        2174      518     -1656     
  Branches      104      104               
===========================================
- Hits         1575      320     -1255     
+ Misses        598      197      -401     
  Partials        1        1               
Impacted Files Coverage Δ
stork-cli/src/clap.rs
stork-cli/src/main.rs
stork-lib/src/config/mod.rs
stork-lib/src/index_v2/mod.rs
stork-lib/src/index_v3/build/errors.rs
stork-lib/src/index_v3/build/fill_containers.rs
...rc/index_v3/build/fill_intermediate_entries/mod.rs
stork-lib/src/index_v3/build/fill_stems.rs
stork-lib/src/index_v3/build/mod.rs
stork-lib/src/index_v3/mod.rs
... and 28 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update c1463e3...d42905f.

github-actions[bot] commented 2 years ago

Benchmarks

| Benchmark | Baseline | Contender | Comparison |
|---|---|---|---|
| build/federalist | 229.5537 | 213.1282 | 0.93× 🎉 |
| federalist.st | 1125.456 | 271.075 | 0.24× 🎉 |
| search/federalist/liberty | 1.9478 | 1.9842 | 1.02× |
| stork.js | 21.961 | 21.88 | 1.0× |
| stork.wasm | 356.537 | 651.157 | 1.83× ⚠️ |

Baseline: de70fb01688725b7955aa8a48b4fda7ef8be7993; Comparison: d42905f4b278949c40a002508c550e5c7719e2dd
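
For reading the benchmark numbers, the sketch below recomputes the Comparison column, assuming it is simply the contender value divided by the baseline value. That assumption is not stated in the thread, but it agrees with every row shown. The `comparison` function is a hypothetical helper, not part of the benchmark tooling. Under that reading, a value below 1.0 means the contender is smaller or faster, so `federalist.st` at 0.24× is roughly a fourfold reduction, while `stork.wasm` at 1.83× grew.

```rust
/// Hypothetical helper: ratio of the contender measurement to the baseline.
/// Values below 1.0 mean the contender is smaller or faster than the baseline.
fn comparison(baseline: f64, contender: f64) -> f64 {
    contender / baseline
}

fn main() {
    // (benchmark name, baseline, contender), copied from the table above.
    let rows = [
        ("build/federalist", 229.5537, 213.1282),
        ("federalist.st", 1125.456, 271.075),
        ("search/federalist/liberty", 1.9478, 1.9842),
        ("stork.js", 21.961, 21.88),
        ("stork.wasm", 356.537, 651.157),
    ];
    for (name, baseline, contender) in rows {
        println!("{}: {:.2}x", name, comparison(baseline, contender));
    }
}
```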

jameslittle230 commented 1 year ago

Closing due to staleness