Open FranciscoThiesen opened 2 months ago
Hi @FranciscoThiesen ,
Yeah, there is a limit of string lengths and different optimizations are used based on the max-length string found in the set. So there is a difference in generated assembly for <= 4 maxlen, <=8 maxlen where the mov and pext can be optimized. For maxlen >8 the generation is more likely to fail as well and that will be dependent on the input set uniqueness when trying to find the mask. There is also storage implications based on the size and the cache utilization will differ as pairs of {maxlen, type required to store ids} is used. So for example, for strings <=4 it's just u32 for <=8 it's u64, for <=16 u128, for u32 m256 (avx2), for <=64 m512 (avx512). There are options to do second lookup table to limit the storage etc. as well as having a fallback. Lookup and get are SFINE friendly so in case the solution can be used that can be checked with requires
or similar techniques. lookup and find themselfs have different backups based on the input strings. So fro example, lookup will try to do magic_lut (which stores the whole lookup table in a single register) but that's rarely possible so then it's going for pext solution with masking (which is likely to find a solution). There is also a way to simply write backup policies based on specific use case - example here - https://github.com/qlibs/mph/blob/main/mph#L1152.
There is no slowdown associated by not providing ids, they will be automatically added and the same optimizations will be triggered.
constexpr auto dict = std::array<std::string_view, 3>{
"make"sv,
"model"sv,
"year"sv
};
is equivalent to writing
constexpr auto dict = std::array<std::string_view, 3>{
pair{"make"sv, 0}
pair{"model"sv, 1}
pair{"year"sv, 2}
};
Thanks for the fast response, @krzysztof-jusiak
Is there an example of how can I provide a fallback mechanism to handle with strings with size > 64?
I saw on the other open issue that taking a segment of the string is a possibility, but I can't guarantee the uniqueness of a certain segment of size <= 64 for the strings on my set.
Hi, @krzysztof-jusiak
This is a really cool library! When reading the code I was left with 2 questions:
int main() { static_assert(mph::find("make") == true);
static_assert(mph::find("model") == true);
static_assert(mph::find("year") == true);
static_assert(mph::find("unknown") == false);
}