Closed SamWheating closed 1 year ago
Thanks for taking a crack at this.
- Is the additional complexity of this change worth the ~2% performance improvement?
I'm not convinced that it is. It's a 2% improvement on a benchmark that exercises only the optimized operation, which itself accounts for (I guess) at most a few percent of total running time in a realistic workload. (It's unfortunate that we don't have realistic large-scale benchmarks in this repo.)
Can't insertGrow be expressed as a boolean parameter to the existing insert method that causes it to skip the Equal check (assuming it to be false)? That may be a reasonable compromise. The existing hash could also be passed in to avoid recomputing it; this might be best expressed by factoring out the common tails of insert and insertGrow (essentially from "retry:" onwards).
Can't insertGrow be expressed as a boolean parameter to the existing insert method that causes it to skip the Equal check (assuming it to be false)? That may be a reasonable compromise.
Yeah this is a good suggestion, I'll see if I can refactor this to reduce the code duplication.
Hey @adonovan,
I spent a bit of time refactoring this this morning, and any changes I made pretty much cancelled out the marginal performance gains (I guess due to the overhead of the `if grow {}` checks, additional control flow, etc). Trying to bundle both code paths into a single method also introduces a similar level of complexity to the codebase as the changes proposed in this PR.
With that in mind, I think that we should either continue with this approach, understanding that insert() and growInsert() are significantly different methods, or just close this ticket because the opportunity for optimization isn't all that significant - either approach is totally fine by me.
Thoughts?
I spent a bit of time refactoring this this morning, and any changes I made pretty much cancelled out the marginal performance gains
Thanks for investigating this, Sam. I think the gains are probably too slim to justify the additional logic, but rather than put this PR to waste, let's record what you've found by replacing the old TODO comment with this:
// Double the number of buckets and rehash.
//
// Even though this makes reentrant calls to ht.insert,
// calls Equals unnecessarily (since there can't be duplicate keys),
// and recomputes the hash unnecessarily, the gains from
// avoiding these steps were found to be too small to justify
// the extra logic: -2% on hashtable benchmark.
You could remove the "pool" comment too, since it's rather fiddly to implement. (sync.Pool is good only for uniform objects, but these bucket arrays vary in size.)
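To unpack that aside: sync.Pool's Get returns *some* previously freed object with no way to request "one of at least size N", so it works best when every pooled object has the same shape. A minimal sketch of the idiom it does fit, with an illustrative fixed buffer size:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool recycles fixed-size byte buffers. Because every object is the
// same size, any buffer Get returns is usable as-is. Pooling the
// hashtable's variably sized bucket arrays would instead require either
// several size-bucketed pools or checking (and discarding) wrong-sized
// arrays on every Get, which is the "fiddly" part.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 4096) // every pooled object has the same size
		return &b
	},
}

func main() {
	p := bufPool.Get().(*[]byte)
	fmt.Println(len(*p)) // 4096
	bufPool.Put(p) // return the buffer for reuse
}
```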
let's record what you've found by replacing the old TODO comment with this.
Done ✅
Are there any other parts of this codebase which you think might be good candidates for performance improvements? This was an interesting exercise and I've got some time set aside this week for similar work.
Are there any other parts of this codebase which you think might be good candidates for performance improvements? This was an interesting exercise and I've got some time set aside this week for similar work.
Not really. When I developed the package I went over it quite carefully trying to squeeze out the most efficiency within the current design, and then I profiled and optimized a large and performance-sensitive application (similar to Bazel) on realistic workloads (which unfortunately can't easily be turned into benchmarks---and in any case that particular project was cancelled). So I think most of the easy wins that actually matter have been claimed already. I'm reluctant to suggest that we optimize anything for its own sake, in the absence of a realistic benchmark that some user actually cares about, because it's a waste of effort that could be spent elsewhere, and it makes the code more complicated than it needs to be.
But I appreciate your offer, and this PR. Replacing a TODO with an informed statement of fact is actual progress.
Thanks for the background - I wasn't sure how much profiling work had already been done and I totally agree that there's no use in optimization for the sake of optimizing.
Update: Decided that this is not a worthwhile optimization due to the modest performance improvements; updating the TODO and not making any other changes. Leaving my original notes below for reference.
Addressing this TODO I found in the hashtable implementation because it seems like a clear performance win and an interesting investigation.
tl;dr: instead of using the hashtable.insert method to copy elements into the new bucket, we can write a specialized hashtable.growInsert method which bypasses some of the deduplication and checks which only apply when inserting into a pre-existing hashtable.

Optimizations include:
- grow()

Let me know what you think or if you have any other suggested optimizations.
Benchmark results
Overall these results are less significant than I expected, but I suspect that the performance gap could be widened in a purpose-built hashtable insertion benchmark (for example, keys with more compute-intensive hash functions, or benchmarks involving higher write volumes).
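One hypothetical shape for such a purpose-built benchmark: insertion-heavy, with a deliberately expensive hash (SHA-256 here, purely to exaggerate hashing cost), so that skipping hash recomputation during grow would stand out. The names and structure are illustrative, not the repo's actual benchmark suite; testing.Benchmark lets the sketch run outside a _test.go file.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

type entry struct {
	key   string
	value int
}

// expensiveHash stands in for a compute-intensive key hash; it is far
// costlier than a typical string hash, so hashing dominates insert cost.
func expensiveHash(s string) uint32 {
	sum := sha256.Sum256([]byte(s))
	return uint32(sum[0]) | uint32(sum[1])<<8 | uint32(sum[2])<<16 | uint32(sum[3])<<24
}

// benchmarkInserts measures a high-write-volume insertion loop into a
// simplified bucket array (no grow logic, to keep the sketch short).
func benchmarkInserts(b *testing.B) {
	for n := 0; n < b.N; n++ {
		buckets := make([][]entry, 64)
		for i := 0; i < 1000; i++ {
			key := fmt.Sprintf("key-%d", i)
			h := expensiveHash(key)
			idx := h % uint32(len(buckets))
			buckets[idx] = append(buckets[idx], entry{key, i})
		}
	}
}

func main() {
	res := testing.Benchmark(benchmarkInserts)
	fmt.Println("ns/op:", res.NsPerOp())
}
```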
Before:
After:
Open Questions