modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo/manual/

[Feature Request] Add benchmark for `Dict` and explore swiss table algorithm with SIMD probing #3254

Open jon-chuang opened 2 months ago

jon-chuang commented 2 months ago


What is your request?

While optimizing for small dictionaries is fine, I think it would also be nice to support workloads involving large dicts.

For good perf on these, the Swiss table is a decent choice; it is the design behind Rust's HashMap and abseil's flat_hash_map. It uses SIMD to generate a bitmask from a metadata array of 7-bit hash prefixes, narrowing down which slots need a full 64-bit key comparison during the linear scan of a bucket.

This reduces the probing cost by roughly 128x: the metadata probe itself costs ~1% thanks to SIMD, and the metadata array is also hotter in the CPU cache due to read locality and its smaller size (hence less cache churn).
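To make the mechanism concrete, here is a minimal sketch in plain Python (no real SIMD) of the "group" probing step described above: each slot stores a 7-bit control byte of the hash, and one comparison over a 16-slot group yields a bitmask of candidate slots. All names here are illustrative; real implementations (abseil, Rust's hashbrown) do the mask step in roughly one instruction with SSE2/NEON.

```python
# Sketch of Swiss-table-style group probing, simulated in plain Python.
# A SIMD version replaces match_mask's loop with movemask(cmpeq(...)).

GROUP = 16  # slots scanned per probe step

def h1(h: int) -> int:
    """Slot index: the high bits of the 64-bit hash."""
    return h >> 7

def h2(h: int) -> int:
    """Control byte: the low 7 bits of the hash."""
    return h & 0x7F

def match_mask(ctrl: bytes, tag: int) -> int:
    """Bitmask of group slots whose control byte equals `tag`."""
    mask = 0
    for i, c in enumerate(ctrl):
        if c == tag:
            mask |= 1 << i
    return mask

# Example: a group whose control bytes contain the tag 0x2A twice.
ctrl = bytes([0x2A, 0x11, 0x7F, 0x2A] + [0x00] * 12)
m = match_mask(ctrl, 0x2A)

# Full 64-bit key comparisons happen only for the candidate slots.
candidates = []
while m:
    slot = (m & -m).bit_length() - 1  # index of lowest set bit
    candidates.append(slot)
    m &= m - 1  # clear lowest set bit
print(candidates)  # [0, 3]
```

Because a 7-bit tag matches a random non-equal key with probability ~1/128, almost every full-key comparison the mask lets through is the real match.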

What is your motivation for this change?

perf

Any other details?

Currently, no such metadata is maintained and an expensive linear probe is carried out: https://github.com/modularml/mojo/blob/8bd1dbdf26c70c634768bfd4c014537f6fdb0fb2/stdlib/src/collections/dict.mojo#L907

Let's see some benchmark numbers first

Other Dict Perf Work: https://github.com/modularml/mojo/pull/3128 https://github.com/modularml/mojo/pull/3133

martinvuyk commented 2 months ago

Hi, interesting algorithm. But wouldn't it be simpler to just keep a B-tree index (over the first/last 7 hash bits) if the data becomes really big, as is often done in DBs? Though this would introduce much more memory overhead, and inserts would be kinda expensive.

Maybe we could build it as a separate experimental type first and try it out.

@mzaks @bethebunny you guys have been working with Dict a lot (I think), thoughts?

jon-chuang commented 2 months ago

We could definitely switch to this scheme only for bigger dicts; that would depend on perf numbers.

To my understanding, only the remaining 57 bits of the 64-bit hash are stored co-located with the values, so there is no memory penalty.

As for a B-tree index over the metadata, there is no reason for it. One simply hashes to a metadata offset in the backing array the same way one hashes to a hash-slot offset.

One then does a linear probe of the metadata array with SIMD, the same way one would have probed the full key array.

Here, we allocate backing arrays with reservation sizes that are fixed powers of 1.3 (or 2); the base chosen is yet another design choice.
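The addressing point above can be sketched in a few lines: the control/metadata array is parallel to the slot array, so the same hash-derived index addresses both, and no secondary index (B-tree or otherwise) is needed. This is a hypothetical illustration, not Mojo's actual `Dict` layout.

```python
# Sketch: one hash-derived offset addresses both the 1-byte-per-slot
# control array and the full key/value slot array. Names are illustrative.

def probe_start(h: int, capacity: int) -> int:
    # capacity is assumed to be a power of two, so masking
    # is equivalent to modulo; the low 7 bits are reserved
    # for the control byte, so index from the high bits.
    return (h >> 7) & (capacity - 1)

capacity = 64
ctrl = bytearray(capacity)     # 1 byte of metadata per slot
slots = [None] * capacity      # full key/value entry per slot

h = 0xDEADBEEF
i = probe_start(h, capacity)
# ctrl[i] and slots[i] live at the same logical offset;
# the probe walks both arrays in lockstep, checking ctrl first.
```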

JoeLoser commented 2 months ago

Benchmarking dict and understanding the tradeoffs of performance, memory usage, and various internal metadata-tracking algorithms would all be good things to do. Feel free to work on this! We're not actively working in this area at the moment.

mzaks commented 2 months ago

The current implementation of the dict in the Mojo standard library is based on Python's. It uses a pseudo-random probing algorithm and stores key-value pairs together, which doesn't allow SIMD comparison. But I think this still isn't the main bottleneck; the main bottleneck is the hash function. Hence I wrote a proposal https://github.com/modularml/mojo/blob/main/proposals/improved-hash-module.md, and there are a few PRs to address it, which are currently blocked by some compiler issues; see https://github.com/modularml/mojo/pull/2891.
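For reference, CPython's pseudo-random probing (which the comment above says Mojo's dict follows) looks roughly like this sketch: each collision mixes more hash bits in via a `perturb` value, which scatters probes across the table. That scattering breaks up clustering but is exactly what prevents contiguous SIMD group scans.

```python
# Sketch of CPython-style pseudo-random (perturb) probing.
# Constants follow CPython's dictobject.c; this is an illustration,
# not Mojo's actual Dict code.

PERTURB_SHIFT = 5

def probe_sequence(h: int, mask: int, n: int):
    """Yield the first n slot indices probed for hash h
    in a table of size mask + 1 (a power of two)."""
    perturb = h & 0xFFFFFFFFFFFFFFFF  # treat the hash as unsigned 64-bit
    i = h & mask
    for _ in range(n):
        yield i
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask

seq = list(probe_sequence(0x12345678, 0b111, 4))
print(seq)  # first probe is h & mask; later probes jump pseudo-randomly
```

Compare this with the Swiss-table scheme, where probes advance linearly one contiguous group at a time, so a single SIMD load covers the next 16 candidates.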

In the meantime we can implement a Swiss-table-like dict. You can also have a look at https://github.com/mzaks/compact-dict; I did a short presentation about it here: https://youtu.be/3FKSlhZNdL0?si=fEshjEgpNvVUAdBu (I am the second presenter, at 13:20). I would be happy if someone wants to take the time to add SIMD-based probing to it. Currently it works with simple linear probing.

soraros commented 2 months ago

There was already #1762. But since the discussion is happening here, we can probably close that one.

bethebunny commented 2 months ago

I'm actively trying to shed responsibility for Dict to get more time to focus on other things 😅 We'd love to build out some more formal governance for the community. In the meantime there's lots of good thoughts here, I think y'all have some power to pave the way a bit with driving a concrete proposal. I'd be more than happy to jettison anything that currently exists for something more well motivated!