jayaddison opened 2 weeks ago
Would this work as an alternative?
// in Search.setIndex()
const frozen_index = JSON.parse(index, (key, value) =>
  typeof value === "object" && value !== null
    ? Object.freeze(value)
    : value
);
If we change `Search.setIndex()` to take a JSON string, it seems we could use the 'reviver' function to freeze arrays and objects?
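A quick standalone sketch of what such a reviver does (the sample JSON here is invented for illustration; this is not the Sphinx code):

```javascript
// The reviver runs bottom-up, so every nested object/array is frozen
// before its parent; the returned root is frozen as well.
const deepFrozen = JSON.parse(
  '{"terms": {"__proto__": 0}, "docs": [1, 2]}',
  (key, value) =>
    typeof value === "object" && value !== null ? Object.freeze(value) : value
);

console.log(Object.isFrozen(deepFrozen));      // true
console.log(Object.isFrozen(deepFrozen.docs)); // true

try {
  deepFrozen.docs.push(3); // mutating a frozen array throws a TypeError
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```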
@AA-Turner although I'd recommend some performance testing, that could be useful for #13098, yep.
In fact, no, my mistake: `JSON.parse` may in fact be necessary to solve this robustly. The test appears only to be passing due to the fact that `__proto__` is found in the `objects` part of the `searchindex.js` file. The `terms` entry for `__proto__` is not being handled correctly.
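For context on why an object-literal index mishandles this particular term, a standalone sketch (none of this is the actual searchtools code):

```javascript
// In object-literal syntax, a non-computed "__proto__" key sets the
// prototype rather than creating an own property, so the stored value
// is silently lost:
const literalTerms = { "__proto__": 0 };
console.log(Object.hasOwn(literalTerms, "__proto__")); // false

// JSON.parse always creates an ordinary own property instead:
const parsedTerms = JSON.parse('{"__proto__": 0}');
console.log(Object.hasOwn(parsedTerms, "__proto__")); // true
console.log(parsedTerms["__proto__"]); // 0

// A Map avoids prototype-key collisions entirely:
const mapTerms = new Map([["__proto__", 0]]);
console.log(mapTerms.get("__proto__")); // 0
```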
I've pushed the change to use `JSON.parse` and a reviver -- a note of caution, though: when I was testing that against a JSON-ized version of the Python documentation `searchindex.js` file yesterday, I was encountering ~200ms duration within the `setIndex` function in my browser.
Personally, I'd be inclined to merge the previous version of this and address that (if at all) in a followup. Trying to "freeze" the index here feels like scope creep to me and it would be nice to get this fix in.
That's reasonable, yep, and I've no strong opinions on whether we include or omit the `Object.freeze` aspect. However, without 462c8595937f2034f86322c8650e3f76f81de905, I don't believe this branch would genuinely fix the reported bug.
Without the reviver function, I recall the `setIndex` + `JSON.parse` performance overhead being on the order of ~60ms on my local machine. I'll confirm those stats in some more detail within the next day or so. I get the feeling that some of this will be coupled to the browser+implementation (Firefox) details, so I may try to compare to Chrom[e|ium] also.
> In fact, no, my mistake: `JSON.parse` may in fact be necessary to solve this robustly. The test appears only to be passing due to the fact that `__proto__` is found in the `objects` part of the `searchindex.js` file. The `terms` entry for `__proto__` is not being handled correctly.

Sorry, I missed this part. Do you know why this is the case? The solution looked correct to me?
The test case in 0fff170a5193741f53b56c0304e1864b754fa81a was inaccurate -- it demonstrated that an object `Object.__proto__` could be found based on a user query for `__proto__`.

However, there was still a bug: a document text term `__proto__` -- represented by the `"terms":{"__proto__":0` fragment of the `searchindex.js` -- was not searchable as intended.
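One way to make such a term searchable on a plain-object index is an own-property guard; a hypothetical sketch (`lookupTerm` is an illustrative name, not the actual Sphinx function):

```javascript
// Index as produced by JSON.parse, so "__proto__" is an ordinary own property.
const terms = JSON.parse('{"__proto__": 0, "sphinx": 1}');

function lookupTerm(index, word) {
  // Guard with hasOwnProperty so inherited names ("constructor",
  // "toString", ...) are never mistaken for index entries.
  return Object.prototype.hasOwnProperty.call(index, word)
    ? index[word]
    : undefined;
}

console.log(lookupTerm(terms, "__proto__")); // 0
console.log(lookupTerm(terms, "toString")); // undefined
```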
> Without the reviver function, I recall the `setIndex` + `JSON.parse` performance overhead being on the order of ~60ms on my local machine. I'll confirm those stats in some more detail within the next day or so. I get the feeling that some of this will be coupled to the browser+implementation (Firefox) details, so I may try to compare to Chrom[e|ium] also.
I haven't gotten around to this comprehensive performance testing yet, but want to share some notes:
Initially the native `Map` approach (up to fae81a0a92e81618c8fc70b2db1f3f93bc0dccaa) seems similar in runtime performance to the existing/baseline approach (object literals), based on my testing using Sphinx's self-built documentation. The page-load searchindex-evaluation time on my machine for a query for `phin` (~700 results) is ~30ms in Firefox, compared to a ~20ms baseline. Those are both relatively small compared to a ~0.5s duration for display/rendering of the search results, but arguably that is a pessimal query (a more precise match would reduce the display duration).
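For reference, this kind of measurement can be roughly reproduced with synthetic data (a toy payload, not the real searchindex.js; absolute numbers will vary by machine and browser):

```javascript
// Build a roughly 0.5 MB synthetic "terms"-like payload.
const synthetic = JSON.stringify(
  Object.fromEntries(
    Array.from({ length: 20000 }, (_, i) => [`term${i}`, [i, i + 1]])
  )
);

const freeze = (key, value) =>
  typeof value === "object" && value !== null ? Object.freeze(value) : value;

let t = performance.now();
JSON.parse(synthetic);
console.log("plain parse:", (performance.now() - t).toFixed(1), "ms");

t = performance.now();
JSON.parse(synthetic, freeze);
console.log("parse + freezing reviver:", (performance.now() - t).toFixed(1), "ms");
```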
There is a small amount of index size bloat -- not so much from the `new Map(` and `)` text overhead, but more from the additional `Array` literals (e.g. square brackets) used to format their entries instead of object literals. This is approximately 5% overhead on the test dataset: 541857 bytes compared to 516691 baseline.
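The source of the overhead can be illustrated with toy data (the ~5% figure above is from Sphinx's own docs, not this snippet):

```javascript
const entries = Array.from({ length: 1000 }, (_, i) => [`term${i}`, i]);

// Object-literal serialization: {"term0":0,"term1":1,...}
const objectForm = JSON.stringify(Object.fromEntries(entries));

// A Map has no direct JSON form, so the index would ship as source text
// like new Map([["term0",0],...]), paying two extra bracket characters
// per entry.
const mapForm = `new Map(${JSON.stringify(entries)})`;

console.log(objectForm.length, mapForm.length);
```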
There seem to be multiple factors to balance: Python complexity, JavaScript complexity, code safety (both datastructure immutability and prototype access), runtime performance, and index filesize.

I'm not sure how much to explore and how to present my thoughts/findings. Generally my feeling is that the `Map` approach here (again, up to fae81a0a92e81618c8fc70b2db1f3f93bc0dccaa) seems good on most criteria except for Python complexity (particularly the unit tests, but also the application serialization code is more complex than it was) and immutability (which it does not address, in my opinion). The TC39 Records and Tuples proposal, I think, would be near-optimal on all factors -- but we could be years away from that reaching widespread adoption.
> There is a small amount of index size bloat -- not so much from the `new Map(` and `)` text overhead, but more from the additional `Array` literals (e.g. square brackets) used to format their entries instead of object literals. This is approximately 5% overhead on the test dataset: 541857 bytes compared to 516691 baseline.
My guess is that most of this difference (which is relatively small) would disappear with compression.
I should have thought to evaluate that at the same time -- from a quick check with `brotli`, indeed, the difference becomes fairly insignificant (102833 vs 102137, less than 1%). Perhaps it helps compression that the character set becomes slightly more predictable.
Feature or Bugfix

Purpose

- …the `basic` theme.

Detail

- Ensure that the user input query term is found as a named property of the relevant index objects before accessing them, using `Object.hasOwnProperty`.
- `Map` instances instead of `Object` literals.

Relates

- May resolve #13098 (conditional on inclusion of `JSON.parse` reviver / `Object.freeze` logic)

Edit: update description to reflect JavaScript Map-based approach.