Closed aldanor closed 1 year ago
Patch coverage: 58.82% and project coverage change: -0.43% :warning:
Comparison is base (ba6a882) 83.34% compared to head (9291210) 82.92%. Report is 1 commits behind head on main.
Clippy errors: these can be fixed in a later PR along with some deps updates (they're unrelated to this PR, just some deprecated chrono stuff).
Miri errors: I'm not sure we can prove to Miri here that this isn't UB.
Great PR. This could improve the performance of grouping by low-cardinality columns in other DBMSs.
@sundy-li you were too fast. I also wanted to review this one. Especially given that MIRI isn't happy. :/
@aldanor I think we can achieve the same with a Hashmap<K, usize> where the usize gets the value in the MutableArray.
IAC this will save a lot of (unsafe) code, and make the values of the hashmap smaller, as you don't have two pointers (the ref and the index) but only a single index.
We can use it something like this:
    self.inner_map
        .raw_entry_mut()
        .from_hash(hash, |hash_map_key| {
            // Compare against the value actually stored at this index.
            let index = hash_map_key.index;
            self.get_value(index) == insert_value
        })
I think we should go in this direction.
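To make the index-only idea concrete, here is a minimal std-only sketch. It is not the code from this PR and does not use hashbrown's raw entry API: since std has no stable `raw_entry_mut`, it swaps in a plain `HashMap<u64, Vec<usize>>` that buckets indices by hash. The names `DictEncoder`, `buckets`, and `get_or_insert` are hypothetical. What it shows is the key property discussed here: the map holds only indices, and equality is always checked against the values actually stored, so a hash collision cannot alias two distinct values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical dictionary encoder (illustration only).
/// Values live only in `values`; the map stores indices into it.
struct DictEncoder<T> {
    values: Vec<T>,
    /// hash -> indices into `values` of the entries with that hash
    buckets: HashMap<u64, Vec<usize>>,
}

impl<T: Hash + Eq> DictEncoder<T> {
    fn new() -> Self {
        Self { values: Vec::new(), buckets: HashMap::new() }
    }

    fn hash_of(value: &T) -> u64 {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the dictionary key for `value`, pushing it if unseen.
    fn get_or_insert(&mut self, value: T) -> usize {
        let hash = Self::hash_of(&value);
        // Split the borrow so the map and the values can be used together.
        let Self { values, buckets } = self;
        let bucket = buckets.entry(hash).or_default();
        // Compare against the stored values, not just the hashes.
        if let Some(&index) = bucket.iter().find(|&&i| values[i] == value) {
            return index;
        }
        let index = values.len();
        values.push(value);
        bucket.push(index);
        index
    }
}
```

Pushing the same value twice returns the same key and leaves `values` unchanged the second time. With hashbrown, the per-hash `Vec` would presumably go away, since the raw entry lookup takes the equality closure directly.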
@ritchie46 Sure, sounds like a plan. Tbh I didn't even think of the raw hashmap API since I got too used to the fact that it's nightly-only (but not for hashbrown!).
All the "Indexable" stuff is unfortunately still needed since we have no clean way of connecting "stuff we're pushing" to "stuff we're storing" elsewise.
I'll take a look at the raw hashmap approach now and report back.
(I'll try to look into it asap so maybe don't merge the revert PR first, so we don't end up with the same code flipped back and forth 3 times...)
@ritchie46 I think you meant HashMap<usize, K> (as opposed to HashMap<K, usize>) 🙂
But anyways, it does seem to work, I'll push PR shortly.
@ritchie46 @jorgecarleitao Here's one way to fix the incorrect current behaviour of MutableDictionaryArray: only rely on values actually stored in the values array and don't rely on hash-hash maps (due to potential hash collisions). This is almost a rewrite of the whole thing, outer API aside, and gets pretty evil (self-referential) at the lower level, but I believe it's pretty sound.
There are probably bits and pieces that need cleaning up, types and methods to rename, and possibly some docstrings to add (comments welcome), but I figured I'd push the current version as soon as it was working to find out whether something like this would be acceptable.
Bench-wise, current main (10k dict insertions):
This branch:
Fixes #1485 Fixes #1554