The original goal was to improve the ngram search deployment. After adding new facts in 2020, the number of search results for "election" in previous years decreased, which makes no sense. It turns out that Lucene was set up to return at most 100 results, and if there were more than that it just picked a random 100. Luckily, "election" has 200+ results, so it triggered this bug.
Our precached searches turn out to be a nice way to test our whole search infrastructure, so I took the opportunity to also make some long-outstanding search improvements:
we now allow up to 1,000 results
if there are more than that, we return 0 results (#423 is an open issue to show an error for this case)
a search for "election" will now return both "elections" and "election's"
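The behavior listed above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual Lucene configuration: `normalize`, `search`, and `MAX_RESULTS` are hypothetical names, the suffix stripping is a crude stand-in for whatever analyzer the index really uses, and "return 0" is modeled as an empty list.

```python
import re

MAX_RESULTS = 1000  # the new cap described above


def normalize(token: str) -> str:
    """Crude normalizer (an assumption, not the project's real analyzer)."""
    token = token.lower()
    token = re.sub(r"['’]s$", "", token)  # "election's" -> "election"
    if token.endswith("s") and len(token) > 3:
        token = token[:-1]  # "elections" -> "election"
    return token


def search(query: str, facts: list[str], max_results: int = MAX_RESULTS) -> list[str]:
    """Return every fact containing a variant of `query`, or nothing if over the cap."""
    target = normalize(query)
    hits = [
        fact
        for fact in facts
        if any(normalize(word) == target for word in re.findall(r"[\w'’]+", fact))
    ]
    # Over the cap we return no results at all instead of an arbitrary subset,
    # so the result count for a query stays stable as new facts are added.
    if len(hits) > max_results:
        return []
    return hits
```

Returning nothing when the cap is exceeded is deliberately blunt, but unlike a random subset of 100 it cannot silently change the counts for earlier years.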