Closed jayaddison closed 3 weeks ago
I suppose it's possible that on-the-wire compression will eliminate most of the on-disk space savings. Sigh. I'll check that before opening any pull request.
Search index size reduction for `searchindex.js` from the Sphinx self-built documentation, ordered by net size difference, decreasing:
Compression Method | Baseline size (bytes) | Minimal-separator size (bytes) | Reduction |
---|---|---|---|
none | 587136 | 511787 | 13% |
zstd 1.5.6 (level 3) | 142378 | 133931 | 6% |
gzip 1.12 (level 6) | 132225 | 126116 | 5% |
gzip 1.12 (level 9) | 128730 | 125003 | 3% |
brotli 1.1.0 (level 11) | 103727 | 101480 | 2% |
zstd 1.5.6 (level 19) | 108235 | 106181 | 2% |
Edit 1: clarify that the reduction is focused on the search index file.

Edit 2: add brotli stats.

Edit 3: re-sort table by net size difference.
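For anyone wanting to reproduce this kind of comparison, here is a rough sketch using only the Python standard library. It is not the script used for the table above: the `index` data is a synthetic, hypothetical stand-in for the real search index, the `sizes` helper is a name made up for illustration, and only gzip at level 6 is covered, so the numbers it prints will differ from the measured ones.

```python
import gzip
import json

# Hypothetical, synthetic stand-in for a search index (not real Sphinx data).
index = {
    "docnames": [f"page{i}" for i in range(500)],
    "terms": {f"term{i}": [i % 500, (i * 7) % 500] for i in range(5000)},
}


def sizes(text: str) -> tuple[int, int]:
    """Return (uncompressed bytes, gzip level 6 bytes) for the given text."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw, compresslevel=6))


# Default separators are (', ', ': '); the minimal form drops the spaces.
baseline = sizes(json.dumps(index))
minimal = sizes(json.dumps(index, separators=(",", ":")))

for label, base, mini in zip(("none", "gzip (level 6)"), baseline, minimal):
    print(f"{label}: {base} -> {mini} bytes ({(base - mini) / base:.0%} reduction)")
```

The same pattern should extend to zstd or brotli through their third-party bindings, which are left out here to keep the sketch dependency-free.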
It almost seems like minimal-whitespace separators may have been accidentally removed during 0830a04bbf83f3da75c3ab95cee27bba0e721c46 -- they were in use before that (c4b660c5e07236a0f923f8a93a0779a0013f7099).
Would be curious about brotli numbers, if you ever find a minute to give it a try. Anyway, this is great!
Sure thing; they are:
Compression Method | Baseline size (bytes) | Minimal-separator size (bytes) | Reduction |
---|---|---|---|
brotli 1.1.0 (level 11) | 103727 | 101480 | 2% |
(I'll go back and add another amendment to the previous message with that, too)
Implemented in #13062.
Is your feature request related to a problem? Please describe.

While trying to find out about something mostly unrelated in the Python JSON encoding documentation, I realized that we don't use the minimal-whitespace representation when dumping our search index to JSON.
Describe the solution you'd like

Configure the documented minimal-whitespace separators when serializing the JSON search index.
For the self-built Sphinx documentation, this achieves a disk-space reduction for the search index from 574K to 500K.
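As a minimal sketch of what that setting looks like with the standard-library `json` module: the `index` dictionary below is a tiny hypothetical stand-in, not the real search index structure, and this is not the actual Sphinx serialization code.

```python
import json

# Hypothetical stand-in for the search index; the real one is far larger.
index = {"docnames": ["index", "usage"], "terms": {"sphinx": [0, 1], "json": 1}}

# With indent=None, json.dumps defaults to separators=(', ', ': ').
default_text = json.dumps(index)

# The json documentation suggests (',', ':') for the most compact output.
compact_text = json.dumps(index, separators=(",", ":"))

print(len(default_text), len(compact_text))
```

Each item and key separator loses one space, so the saving grows with the number of entries in the index.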
Describe alternatives you've considered

N/A
Additional context

The Sphinx codebase does sometimes load from the existing `searchindex.js` file, in order to support adjustments to the search index during incremental builds. So we do need to retain round-trip ability here; but that's OK, because whitespace around JSON separators is insignificant to parsers.
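A quick way to convince yourself of the round-trip property, again as a sketch with made-up data rather than the real index (plain JSON only; any wrapper around the JSON payload in the actual file is out of scope here):

```python
import json

# Hypothetical sample data; note the space inside the "Using Sphinx" string.
index = {
    "docnames": ["index", "usage"],
    "titles": ["Welcome", "Using Sphinx"],
    "terms": {"sphinx": [0, 1]},
}

spaced = json.dumps(index)                         # default separators
compact = json.dumps(index, separators=(",", ":"))  # minimal separators

# Both texts parse back to the same object; only whitespace between
# tokens differs, while whitespace inside string values is preserved.
restored_spaced = json.loads(spaced)
restored_compact = json.loads(compact)
print(restored_spaced == restored_compact == index)
```

The serialized texts differ, but the parsed objects are equal, which is what matters for loading an existing index during an incremental build.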