sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

Optimization: use minimal-whitespace JSON separators in search index #13061

Closed jayaddison closed 3 weeks ago

jayaddison commented 1 month ago

Is your feature request related to a problem? Please describe.

While trying to find out about something mostly unrelated in the Python JSON encoding documentation, I realized: we don't use the minimal whitespace representation when dumping our search index to JSON.

Describe the solution you'd like

Configure the documented minimal-whitespace separators, separators=(',', ':'), when serializing the JSON searchindex.
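For illustration, a minimal sketch of what the separators change does, using only the standard-library `json` module. The `index` dictionary is a hypothetical stand-in and does not reflect the real shape of Sphinx's search index.

```python
import json

# Hypothetical stand-in for a search index (not the real Sphinx structure).
index = {"docnames": ["index", "usage"], "terms": {"json": [0, 1], "search": 0}}

# Default separators are (", ", ": "), which add a space after each comma and colon.
default_text = json.dumps(index)

# Minimal-whitespace separators, as described in the Python json documentation.
compact_text = json.dumps(index, separators=(",", ":"))

print(default_text)
print(compact_text)
print(len(default_text), len(compact_text))
```

The saving is one byte per comma and per colon outside of strings, so it scales with the number of items in the index rather than with the length of the strings it contains.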

For the self-built Sphinx documentation, this achieves a disk-space reduction for the search index from 574K to 500K.

Describe alternatives you've considered

N/A

Additional context

The Sphinx codebase does sometimes load from the existing searchindex.js file, in order to support adjustments to the search index during incremental builds. So we do need to retain round-trip ability here; but that's OK, because whitespace around separators in JSON should be insignificant (parser-wise).
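A rough sketch of the round-trip property mentioned above, again using only the standard-library `json` module. The helper name `to_compact` and the sample text are hypothetical, and this does not touch Sphinx's own loading code.

```python
import json


def to_compact(text: str) -> str:
    """Re-serialize JSON text using minimal-whitespace separators."""
    return json.dumps(json.loads(text), separators=(",", ":"))


# JSON text with generous whitespace around separators (hypothetical sample).
spaced = '{ "docnames" : [ "index" , "usage" ] ,\n  "terms" : { "json" : [ 0 , 1 ] } }'
compact = to_compact(spaced)

# Both texts parse to the same data, so the whitespace carried no meaning.
assert json.loads(compact) == json.loads(spaced)

# Whitespace inside string values is data and is left untouched.
titled = to_compact('{ "title" : "a, b: c" }')

print(compact)
print(titled)
```

Only whitespace between tokens is dropped; anything inside a quoted string survives unchanged, which is why the parsed result is identical either way.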

jayaddison commented 1 month ago

I suppose it's possible here that on-the-wire compression may eliminate the on-disk space saving gains. Sigh. I'll confirm that soon before opening any pull request.

jayaddison commented 1 month ago

Search index size reduction for searchindex.js from the Sphinx self-built documentation, ordered by net difference, decreasing:

| Compression Method | Baseline size (bytes) | Minimal-separator size (bytes) | Reduction |
| --- | --- | --- | --- |
| none | 587136 | 511787 | 13% |
| zstd 1.5.6 (level 3) | 142378 | 133931 | 6% |
| gzip 1.12 (level 6) | 132225 | 126116 | 5% |
| gzip 1.12 (level 9) | 128730 | 125003 | 3% |
| brotli 1.1.0 (level 11) | 103727 | 101480 | 2% |
| zstd 1.5.6 (level 19) | 108235 | 106181 | 2% |

Edit 1: clarify that the reduction is focused on the search index file.
Edit 2: add brotli stats.
Edit 3: re-sort table by net size difference.
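A comparison along these lines could be sketched as follows. This is not the procedure used for the figures above: it uses Python's standard-library `gzip` module instead of the gzip, zstd and brotli command-line tools, and the `data` dictionary is synthetic, so the sizes it prints will not match the table. The function name `reduction_table` is hypothetical.

```python
import gzip
import json


def reduction_table(data, levels=(6, 9)):
    """Return (method, baseline_bytes, compact_bytes) rows for the given data."""
    baseline = json.dumps(data).encode("utf-8")
    compact = json.dumps(data, separators=(",", ":")).encode("utf-8")
    rows = [("none", len(baseline), len(compact))]
    for level in levels:
        rows.append((
            f"gzip (level {level})",
            len(gzip.compress(baseline, compresslevel=level)),
            len(gzip.compress(compact, compresslevel=level)),
        ))
    return rows


# Synthetic data for illustration only; not a real search index.
data = {"terms": {f"term{i}": [i, i + 1, i + 2] for i in range(2000)}}

rows = reduction_table(data)
for method, before, after in rows:
    percent = 100 * (before - after) / before
    print(f"{method}: {before} -> {after} ({percent:.0f}%)")
```

To check a real build, the same function could be given the parsed contents of an actual search index in place of the synthetic `data`.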

jayaddison commented 1 month ago

It almost seems like minimal-whitespace separators may have been accidentally removed in commit 0830a04bbf83f3da75c3ab95cee27bba0e721c46 -- they were in use before that (c4b660c5e07236a0f923f8a93a0779a0013f7099).

kartben commented 1 month ago

> Search index size reduction for searchindex.js from the Sphinx self-built documentation, ordered by net difference, decreasing:
>
> | Compression Method | Baseline size (bytes) | Minimal-separator size (bytes) | Reduction |
> | --- | --- | --- | --- |
> | none | 587136 | 511787 | 13% |
> | zstd 1.5.6 (level 3) | 142378 | 133931 | 6% |
> | gzip 1.12 (level 6) | 132225 | 126116 | 5% |
> | gzip 1.12 (level 9) | 128730 | 125003 | 3% |
> | zstd 1.5.6 (level 19) | 108235 | 106181 | 2% |

Would be curious about brotli numbers, if you ever find a minute to give it a try. Anyway, this is great!

jayaddison commented 1 month ago

> Would be curious about brotli numbers, if you ever find a minute to give it a try. Anyway, this is great!

Sure thing; they are:

| Compression Method | Baseline size (bytes) | Minimal-separator size (bytes) | Reduction |
| --- | --- | --- | --- |
| brotli 1.1.0 (level 11) | 103727 | 101480 | 2% |

(I'll go back and add another amendment to the previous message with that, too)

jayaddison commented 3 weeks ago

Implemented in #13062.