rust-lang / mdBook

Create book from markdown files. Like Gitbook but implemented in Rust
https://rust-lang.github.io/mdBook/
Mozilla Public License 2.0
18.47k stars 1.65k forks

Warn when generated search index is huge #704

Closed killercup closed 6 years ago

killercup commented 6 years ago

cf. https://github.com/rust-lang/rfcs/pull/2469

TimNN commented 6 years ago

Some more thoughts from when I was investigating this:

mattico commented 6 years ago

Few quick thoughts:

TimNN commented 6 years ago

  • It seems we don't have a config option to disable search...

That can be worked around by setting copy-js = false, although I don't know if that was the intended use of this option.

  • One reason the index is so big is that it includes a Document Store, which contains the full processed text of each document in the index and is used for the "teasers" in search results. This is optional, but I notice we don't provide a config option to disable it.

I don't think that is the main reason: removing ~all body content [0] reduces the RFC index from 38MB to 35MB.

  • Modifying the index format is somewhat tricky since we're using a JS library (elasticlunr.js) which expects a specific format. We could work around this, of course.

I'm not familiar with elasticlunr.js, but I would assume that it supports user-defined keys for the documents it indexes. In that case, just replacing the current rfc-0000-foo.html#heading keys with integers could prove a huge win.

There are currently ~4500 unique URLs referenced in the index; the most common one occurs almost 1000 times and the least common ones occur 6 times. Not counting quotation marks, these URLs alone make up about 21MB.
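To illustrate the integer-key idea, here is a minimal sketch in Python. It assumes a toy index that maps each term to a list of document URLs; this is not the actual elasticlunr.js format, and the function name `intern_refs` and the second URL are made up for the example.

```python
import json


def intern_refs(postings):
    """Replace string document refs with small integers.

    `postings` maps each term to a list of document refs (URL strings).
    Returns (urls, compact), where urls[i] is the URL for integer i.
    This toy layout is an assumption, not the elasticlunr.js index format.
    """
    ids = {}    # URL -> integer ID, assigned in first-seen order
    urls = []   # integer ID -> URL, each URL stored exactly once
    compact = {}
    for term, refs in postings.items():
        out = []
        for ref in refs:
            if ref not in ids:
                ids[ref] = len(urls)
                urls.append(ref)
            out.append(ids[ref])
        compact[term] = out
    return urls, compact


# Hypothetical sample data; only the first URL shape comes from the thread.
postings = {
    "index": ["rfc-0000-foo.html#heading", "rfc-0001-bar.html#summary"],
    "search": ["rfc-0000-foo.html#heading"],
    "size": ["rfc-0000-foo.html#heading", "rfc-0001-bar.html#summary"],
}

urls, compact = intern_refs(postings)

# Compare serialized sizes with and without interning.
before = len(json.dumps(postings, separators=(",", ":")))
after = len(json.dumps({"urls": urls, "index": compact}, separators=(",", ":")))
print(f"{before} bytes with URL keys, {after} bytes with integer keys")
```

The saving grows with how often each URL repeats, which is why an index where one URL occurs almost 1000 times would likely benefit.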

[0] perl -pi -e 's/"body":".*?","breadcrumbs":/"body":"...","breadcrumbs":/g' index.js
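For anyone without Perl at hand, roughly the same substitution can be sketched in Python. The regular expression is copied from the one-liner above; the function name `strip_bodies` and the sample record are hypothetical, and the sketch works on a string rather than editing index.js in place.

```python
import re

# Same pattern as the Perl one-liner: a non-greedy match from the start of a
# "body" value up to the following "breadcrumbs" key.
BODY_RE = re.compile(r'"body":".*?","breadcrumbs":')


def strip_bodies(index_text: str) -> str:
    """Replace every document body in the serialized index with "..."."""
    return BODY_RE.sub('"body":"...","breadcrumbs":', index_text)


# Hypothetical record; only the "body" and "breadcrumbs" keys come from the
# one-liner above.
sample = '{"body":"lots of processed text","breadcrumbs":"Intro","id":"0"}'
stripped = strip_bodies(sample)
print(stripped)
print(f"{len(sample)} -> {len(stripped)} characters")
```

Comparing the length of the text before and after gives the same kind of measurement as the 38MB to 35MB figure quoted above.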

mattico commented 6 years ago

Here's the effect of the changes in #707 on page load performance for the rfcs book. The page load stats were recorded using the Firefox dev tools in "Good 3G" throttling and simple-http-server with GZip.

| | v0.1.7 | 6ca68e4d |
|---|---|---|
| searchindex.js{,on} | 38,659KB | 18,697KB |
| searchindex.js{,on}.gz | 6,726KB | 2,915KB |
| Page Load (empty cache) | 7.44s | 633ms |
| Page Load (full cache) | 346ms | 233ms |
| Search Index Load (empty cache) | N/A* | 849ms |
| Search Index Load (full cache) | N/A* | 388ms |

* searchindex.js was loaded before a few page resources, so it's included in the Page Load numbers

Don't know how much parse time would be spent on the JSON on a phone, but I think the above numbers are acceptable if imperfect.
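For reference, the relative improvements implied by the numbers above can be worked out with a few lines of Python. The figures are taken from the table; the helper name `reduction` is made up for this sketch. Note that the v0.1.7 Page Load figure includes loading the search index while the new one does not, so the speedup is not a strictly like-for-like comparison.

```python
def reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return 1 - after / before


raw = reduction(38_659, 18_697)   # searchindex.js{,on}, in KB
gz = reduction(6_726, 2_915)      # searchindex.js{,on}.gz, in KB
speedup = 7.44 / 0.633            # Page Load (empty cache), in seconds

print(f"raw index: {raw:.1%} smaller")                      # 51.6% smaller
print(f"gzipped index: {gz:.1%} smaller")                   # 56.7% smaller
print(f"empty-cache page load: {speedup:.1f}x faster")      # 11.8x faster
```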

TimNN commented 6 years ago

While iOS still crashes, this looks much better already, thanks a lot! The performance analysis tools offered by Chrome are also much happier :D