qlibs / mph

C++20 [Minimal] Static Perfect Hash library
176 stars 9 forks source link

Is it possible to support a 16-byte string as a key? #2

Closed liziyi75 closed 6 months ago

liziyi75 commented 6 months ago

Many times, 8 bytes are too short for some occasions. Thank you.

kris-jusiak commented 6 months ago

That's defo possible to support and require very little change, however, bmi2 - pext instruction supports u32 and u64 only ATM , so max 8 bytes, with 2 bytes per character, that would be 4 letters fully hardware accelerated with a single pext instruction and single comparison. That's not a lot of characters so it depends on the use how valuable that would be? There is a case to extend mph for multiple lookups after splitting keys into buckets but that's a bit bigger feature and the lookup will be a bit slower due to additional lookup but still much faster than any other common solution so it might be worth it :thinking:

liziyi75 commented 6 months ago

I see. If we can expand the length of the "str key", it will be very helpful in mapping program static objects such as from names to enum values, struct members, function pointers, static configuration values, and so on. In addition, it is expected to support using enum value as key

liziyi75 commented 6 months ago

The following may be an option, the hash seed can be changed to avoid duplicate keys https://godbolt.org/z/zGPdocnG5

kris-jusiak commented 6 months ago

Thanks for sharing (BTW I've just realized the question was about 16 byte string and not 16 bits chars, which I somehow confused, so sorry if my initial answer could have been a bit out of context). Hashing the input is an interesting idea. I think that would also be required on the back-end when looking for mask as there is potential of collisions. I didn't dig into the produced assembly, the fastest likely would be to bitcast 16 bytes assuming 16 bytes input but there are some ways around it. Just to make it clear, I'm all up for extending 8 bytes limitation as long as we can make it efficient and not slowing down the <= 8 bytes case. I need to dig in a bit more to understand the trade-offs but it's defo a neat approach. Thanks again for sharing the idea/implementation.

liziyi75 commented 6 months ago

Because the value range of visible characters is 32-126, it is ensured that the first bit of a byte is 0. Therefore, by performing a bitwise OR operation with 0x8080808080808080 on the last 8 bytes of >8 bytes and <=16 bytes, and performing a bitwise OR operation with 0x80808080808080 on the last 8 bytes of >16 bytes and <=24 bytes, and so on, it can ensure that the key value is unique. https://godbolt.org/z/zEhbPrbfv

kris-jusiak commented 6 months ago

Thanks again. I think that's good and really clever so let's make it happen! Converting string to integral during the construction instead of using fixed_string is actually better, IMHO and it will allow for extension. The case for >8 might be done without overhead if the size is known at compile-time so no issues here. If the size is not known aka string_view we have a few options to either pay the cmp or inform that it's less or greater than 8. either way can be done so let's do it! I'll change the fixed_string as the first step and we can go from there. Thanks again.

liziyi75 commented 6 months ago

fixup: https://godbolt.org/z/94s3s79oT

kris-jusiak commented 6 months ago

I'm planning to work on it likely tomorrow or over the weekend but I wanna make sure you will get all the credit for the idea. So could you then maybe create a MR with additional function which won't change the existing ones for example to64 or similar (like in https://godbolt.org/z/94s3s79oT), name doesn't matter with a static assert test then we can add example and more testing to https://godbolt.org/z/hsjzo4x8v and then I'll follow-up with some deeper integration to the hash call and extend benchmarks. Then we can release the new version. Does that sound okay? I'm happy following any other path too if something else work better for you? Thanks.

liziyi75 commented 6 months ago

Your willingness to extend this functionality is the best reward for this idea, thank you!

kris-jusiak commented 6 months ago

Sure, that's refreshing, anyway credit will still go to you. So, I made the first part of refactoring, there is no more mph::fixed_string and we just have to and only operate on integral types now, kinda needed to be done for faster compilation times for a lot strings, also added policy split which can handle thousands of tickers with a single additional lookup and can compile that in around 10 seconds for 5000 tickers (still optimizing it). Next step is to add the changes to support 8+ string names. Not sure whether I'll be able to pull it off today but defo this week.

kris-jusiak commented 6 months ago

Support for up to 16 keys has been merged. Thanks for again for all your suggestions and code examples. As it stand __int128 or u32 [[vector_size(16)) has been chosen for 8-16 characters due to performance and ability to extend up to 64 characters when avx512 is available (which hasn't been done yet).

The following is working now - https://godbolt.org/z/o5zc3oYWv.

There is some overhead of using int128 on the storage size but only if there are keys longer than 8. with sse2/avx simd usually simd registers can be used.

Thanks again!