troydhanson / uthash

C macros for hash tables and more
Other
4.18k stars 926 forks source link

[feature request] case-less finding #149

Closed silvioprog closed 6 years ago

silvioprog commented 6 years ago

Hello.

Consider the following example:

#include <stdio.h>
#include "uthash.h"

struct pair {
    char *name, *val;
    UT_hash_handle hh;
};

struct pair *map = NULL;

void strmap_add(const char *name, const char *val) {
    struct pair *pair = malloc(sizeof(struct pair));
    pair->name = strdup(name);
    pair->val = strdup(val);
    HASH_ADD_STR(map, name, pair);
}

void strmap_cleanup() {
    struct pair *pair, *tmp;

    HASH_ITER(hh, map, pair, tmp) {
        HASH_DEL(map, pair);
        free(pair->name);
        free(pair->val);
        free(pair);
    }
}

int main() {
    struct pair *pair;

    strmap_add("Content-Type", "text/html");
    strmap_add("Accept-Encoding", "deflate,gzip");

    HASH_FIND_STR(map, "accept-encoding", pair);
    if (pair)
        printf("%s: %s\n", pair->name, pair->val);
    else
        printf("None");

    strmap_cleanup();
    return 0;
}

The output will be None, so it would be nice if uthash could allow to find items by case-less key, something like HASH_FIND_STR_CL() or HASH_FIND_STR_CASE_LESS().

Thank you!

Quuxplusone commented 6 years ago

You can use HASH_FIND with strcasecmp (or stricmp if that's what your platform calls it).

silvioprog commented 6 years ago

I took a look at uthash sources, but I'm not sure how to do that. I suspect it can be done by replacing uthash_memcmp macro to use some strc*mp() instead of memcmp() ... šŸ¤”

silvioprog commented 6 years ago

I tried by replacing uthash_memcmp, but, no success. šŸ˜•

...
#include <string.h>

int my_strncasecmp(const char *s1, const char *s2, size_t n) {
    printf("Aww yeah! ...\n");
    return strncasecmp(s1, s2, n);
}

#define uthash_memcmp my_strncasecmp

#include "uthash.h"

struct pair {
...

it seems the my_strncasecmp() is never called.

Quuxplusone commented 6 years ago

Ah, yes. You can't use "the case-insensitive form of the key" as a hash key, because then you'd need to design a case-insensitive hash function (so that "foo" and "FOO" not only compared equal, but also hashed into the same bucket in the first place). None of the hash functions provided by uthash.h have that property.

And then yes, you'd either have to redefine uthash_memcmp, or else transform your keys prior to inserting them in the hashtable. That is,

struct Element {
    char *original_key;  // "HellO"
    char *lowercase_key;    // "hello"
};

and then use the existing HASH_FIND_STR with the key field lowercase_key instead of with original_key.

You could also probably do it by redefining uthash_memcmp, but that seems like a comparatively bad idea.

I'm not sure there's anything that uthash.h can do to make your life easier here.

silvioprog commented 6 years ago

Thank you very much for answering! I've used this same design b4r_hs.h#L34. My key is the lowercase_key as you showed. :-) I'm rewriting my library and trying to find the best way to use all the dependencies (uthash, utstring etc.), if there was this feature I would use it.

Anyway, this is only a detail, because uthash saves my life everyday. šŸ˜…

silvioprog commented 6 years ago

Hey dudes, I'm going to close this issue. I fixed the problem using an "extra" field as @Quuxplusone suggested (thanks again!). It is awesome and avoids to grows uthash unnecessary. :smiley: