Open alanjds opened 6 years ago
Comment by trotterdylan Monday Feb 20, 2017 at 18:40 GMT
This is really cool. Thanks for preparing this PR. It's a bit involved, so I may take a few days to get through it.
At a high level, I wanted to point out that the immutability of dictEntry objects is an important invariant of the current design. I can't remember all the details offhand, but I think it ultimately comes down to not being able to update both the key and the value atomically. Can you comment on why this is no longer important in this design?
Comment by nairb774 Tuesday Feb 21, 2017 at 01:36 GMT
Sorry about the size. It wouldn't surprise me if it takes some time; the nice part is that this is a performance change, not new functionality, so time is on our side.
As an overview, the implementation blends the table layout of the Python dict implementation with a concurrency scheme borrowed from https://github.com/golang/sync/commit/54b13b0. Writes to the table are serialized by a mutex, while reads are performed from an immutable table (assuming the map has transitioned to fast reads). This means that for read-heavy dicts (think globals), most reads cost only a single atomic load. For dicts with heavy reads and writes, the cost is mostly the overhead of locking the recursiveMutex.
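To make the read path concrete, here is a minimal sketch of the sync.Map-style scheme described above, under simplifying assumptions: the type readMostlyDict, its methods, and the string/int key and value types are all hypothetical and are not Grumpy's Dict. It uses a plain sync.Mutex rather than the recursiveMutex, and every write simply drops the published snapshot. Reads first consult an immutable snapshot with one atomic load; after enough slow-path misses, the authoritative map is copied and published as the new snapshot, which is the "transition to fast reads".

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// readOnly is an immutable snapshot: its map is never written after it is
// published, so readers may use it without holding the lock.
type readOnly struct {
	m        map[string]int
	complete bool // true if m holds every key currently in the dict
}

// readMostlyDict is a hypothetical, simplified illustration; not Grumpy's Dict.
type readMostlyDict struct {
	mu     sync.Mutex               // serializes all writes
	read   atomic.Pointer[readOnly] // published snapshot for lock-free reads
	dirty  map[string]int           // authoritative contents, guarded by mu
	misses int                      // slow-path lookups since the last promotion
}

func newReadMostlyDict() *readMostlyDict {
	d := &readMostlyDict{dirty: map[string]int{}}
	d.read.Store(&readOnly{m: map[string]int{}, complete: true})
	return d
}

// Load costs a single atomic load when the snapshot can answer the query.
func (d *readMostlyDict) Load(key string) (int, bool) {
	r := d.read.Load()
	if v, ok := r.m[key]; ok || r.complete {
		return v, ok
	}
	// Slow path: the snapshot is stale or incomplete, so consult dirty.
	d.mu.Lock()
	defer d.mu.Unlock()
	v, ok := d.dirty[key]
	d.misses++
	if d.misses >= len(d.dirty) {
		d.promoteLocked()
	}
	return v, ok
}

// Store serializes the write and drops the now-stale snapshot.
func (d *readMostlyDict) Store(key string, value int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.dirty[key] = value
	d.read.Store(&readOnly{}) // empty and incomplete: reads fall back to the lock
	d.misses = 0
}

// promoteLocked publishes a copy of dirty as the new read-only snapshot.
// The caller must hold d.mu.
func (d *readMostlyDict) promoteLocked() {
	snap := make(map[string]int, len(d.dirty))
	for k, v := range d.dirty {
		snap[k] = v
	}
	d.read.Store(&readOnly{m: snap, complete: true})
	d.misses = 0
}

func main() {
	d := newReadMostlyDict()
	d.Store("x", 1)
	fmt.Println(d.Load("x")) // slow path, which then promotes the snapshot
	fmt.Println(d.Load("x")) // fast path: one atomic load, no lock
	fmt.Println(d.Load("y")) // complete snapshot reports "absent" without locking
}
```

Because a published snapshot is never mutated, a reader never observes a key paired with the wrong value; that is one way to sidestep the "update key and value atomically" concern without making each entry an immutable heap object. The real golang/sync code is more refined (it can update existing keys without discarding the snapshot); this sketch trades that away for brevity.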
If anything I write here would make more sense in a code or commit comment, let me know. I'd also be happy to add or modify benchmarks if you think that would help confirm this is as big an improvement as I think it is. My main goal has been to get the CallSimple benchmark closer to the speed at which CPython executes it (12ms).
@nairb774 I pulled your PR, but had to make some changes to fit the updated overwrite option that you later added in another PR. I am not sure it is still correct. Can you please take a look at #103?
Thanks.
google/grumpy#259 opened on Feb 19, 2017 by @nairb774
This improves the speed of the previous dict implementation by reducing the number of atomic loads in the read path (at most one when the dict is read-only; think globals) and the number of allocations needed in the write path.
Overall, performance improves by about 30%.
Some of the major changes are as follows:
The internal table layout was changed from []*dictEntry to []dictEntry, removing a level of memory indirection and, hopefully, speeding up slot probing in insertAbsentEntry and lookupEntry.
Many iteration operations that previously needed to grab a relatively expensive lock can now run without locking when the dict is in read-only mode.
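The reason iteration can skip the lock is that a published table is never mutated. The sketch below illustrates only that property, under an explicit simplification: the hypothetical cowDict copies its whole table on every write (plain copy-on-write, a swapped-in technique, not necessarily how the PR publishes tables). An iterator takes one atomic load and then walks a snapshot that a concurrent writer cannot disturb.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// cowDict is a hypothetical copy-on-write dict used only to illustrate
// lock-free iteration; it is a simplification, not Grumpy's design.
type cowDict struct {
	mu    sync.Mutex                     // serializes writers
	table atomic.Pointer[map[string]int] // published table, never mutated
}

func newCowDict() *cowDict {
	d := &cowDict{}
	empty := map[string]int{}
	d.table.Store(&empty)
	return d
}

// Set copies the current table, applies the write, and publishes the copy.
func (d *cowDict) Set(key string, value int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	old := *d.table.Load()
	next := make(map[string]int, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = value
	d.table.Store(&next)
}

// SortedKeys iterates one snapshot with a single atomic load and no lock;
// a concurrent Set publishes a new table and cannot disturb this loop.
func (d *cowDict) SortedKeys() []string {
	snap := *d.table.Load()
	keys := make([]string, 0, len(snap))
	for k := range snap {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	d := newCowDict()
	d.Set("b", 2)
	d.Set("a", 1)
	fmt.Println(d.SortedKeys()) // [a b]
}
```

Copying on every write costs O(n) per write, which only pays off for read-mostly dicts such as module globals; a write-heavy dict would fall back to locking instead.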
The sizeof(Dict) increased somewhat, as a few variables (used and fill) were moved from the dictTable to Dict itself. The addition of the write and misses values makes the overall memory usage of Dict generally larger. This is offset by the type change of dictTable and the removal of additional pointers there. An empty dict weighs in at 304 bytes, compared to 176 previously. At 4 elements, both the old and new implementations use 304 bytes of memory. From that point on, the new implementation actually uses less memory.
Benchmark data (unchanged/statistically insignificant results removed):