rizinorg / rizin

UNIX-like reverse engineering framework and command-line toolset.
https://rizin.re
GNU Lesser General Public License v3.0
2.69k stars 361 forks source link

`HtPx` hash tables cannot be used with arbitrary pointers as keys. #4261

Closed Rot127 closed 4 months ago

Rot127 commented 8 months ago

Work environment

Questions Answers
OS/arch/bits (mandatory) Debian x86_64
File format of the file you reverse (mandatory) -
Architecture/bits of the file (mandatory) -
rizin -v full output, not truncated (mandatory) rizin 0.7.0 @ linux-x86-64

commit: dc753bba5822378bc1b871638ed232cfdb3815

Expected behavior

Every unique pointer is a unique key.

Actual behavior

Pointers never added are found in the hastable.

Steps to reproduce the behavior

See: https://github.com/rizinorg/rizin/pull/4260

Additional Logs, screenshots, source code, configuration dump, ...

This breaks completely SetP. But it isn't used anywhere anyways It is used in agraph.c.

Rot127 commented 8 months ago

As an indicator where problems can occur: The list with usages of zero initialized pointer as keys hash tables:

> grep -rnIE "ht_p[up]_new0\(" librz/ 

librz/util/set.c:9: return ht_pp_new0();
librz/util/lib.c:33:    lib->opened_dirs = ht_pu_new0();
librz/analysis/analysis.c:95:   analysis->ht_name_fun = ht_pp_new0();
librz/analysis/class.c:1287:    HtPP /*<char *name, RzGraphNode *node>*/ *hashmap = ht_pp_new0();
librz/type/type.c:1130: HtPP *used_types = ht_pp_new0(); // use a hash table to keep track of unfolded types
librz/type/parser/c_cpp_parser.c:36:        state->types = ht_pp_new0();
librz/type/parser/c_cpp_parser.c:41:        state->callables = ht_pp_new0();
librz/type/parser/c_cpp_parser.c:46:    state->forward = ht_pp_new0();
librz/type/serialize_functions.c:136:   HtPP *type_str_cache = ht_pp_new0(); // cache from a known C type extr to its RzType representation for skipping the parser if possible
librz/core/cmd/cmd_analysis.c:4270: HtPU *ht = ht_pu_new0();
librz/core/cmd/cmd_analysis.c:4297: HtPU *keys_set = ht_pu_new0();
librz/core/cmd/cmd_analysis.c:4306:     HtPU *db = ht_pu_new0();
librz/core/cmd/cmd_api.c:228:   cmd->ht_cmds = ht_pp_new0();
librz/core/cmd/cmd_info.c:87:       HtPP *files = ht_pp_new0();
librz/core/cmd/cmd_eval.c:142:  HtPU *themes = ht_pu_new0();
librz/config/config.c:495:  cfg->ht = ht_pp_new0();
librz/config/serialize_config.c:52:     ctx.exclude = ht_pp_new0();
librz/debug/trace.c:21: t->ht = ht_pp_new0();
librz/debug/trace.c:304:    t->ht = ht_pp_new0();
librz/reg/profile.c:356:        reg->regset[t].ht_regs = ht_pp_new0();
librz/include/rz_util/rz_serialize.h:49:    return ht_pp_new0();
librz/bin/format/mach0/dyldcache.c:567: HtPU *path_to_idx = ht_pu_new0();
librz/bin/format/mach0/mach0.c:2524:    HtPP *hash = ht_pp_new0();
librz/bin/format/elf/elf_symbols.c:404: HtPP *name_set = ht_pp_new0();
librz/bin/bobj_process_class.c:88:  o->name_to_class_object = ht_pp_new0();
librz/bin/bobj_process_class.c:89:  o->glue_to_class_method = ht_pp_new0();
librz/bin/bobj_process_class.c:90:  o->glue_to_class_field = ht_pp_new0();
librz/bin/bobj_process_section.c:36:    HtPP *filter_db = bin->filter ? ht_pp_new0() : NULL;
librz/bin/bobj_process_symbol.c:98: o->import_name_symbols = ht_pp_new0();
Rot127 commented 8 months ago

HtPx is supposed to be used with strings as keys. As @wargio suggested, the proper fix would be in this case to introduce HtSx (S for strings). While HtPx should be a HtUx which simply casts the key pointers to ut64.

pelijah commented 8 months ago

IIRC I used zeroized options for SetP in agraph.c so pointers are treated as integers Yeah ugly hack, I know.

Rot127 commented 8 months ago

Ah nice. That is why it worked :D As long as we don't use SetP until it's refactored, it's fine (with the one exception of yours of course).

ret2libc commented 8 months ago

Wait, there are options to modify how the key should be compared. I think the doc is confusing, but it should be possible.

typedef struct Ht_(options_t) {
    HT_(ListComparator)
    cmp; // Function for comparing values. Returns 0 if eq.

In ht_inc.c:

static inline bool is_kv_equal(HtName_(Ht) * ht, const KEY_TYPE key, const ut32 key_len, const HT_(Kv) * kv) {
    if (key_len != kv->key_len) {
        return false;
    }

    bool res = key == kv->key;
    if (!res && ht->opt.cmp) {
        res = !ht->opt.cmp(key, kv->key);
    }
    return res;
}

ht_pp.c uses:

static HtName_(Ht) * internal_ht_default_new(ut32 size, ut32 prime_idx, HT_(DupValue) valdup, HT_(KvFreeFunc) pair_free, HT_(CalcSizeV) calcsizeV) {
    HT_(Options)
    opt = {
        .cmp = (HT_(ListComparator))strcmp,

so the keys are compared with strcmp, that's probably why you get some weird results. If you want to just compare the raw values (as int), don't use HtP, but use HtUx. Otherwise, create a custom Ht with .cmp set to something that makes sense for you.

wargio commented 8 months ago

yes, but the current implementation is very hard to use. i would make a proper implementation and clear definition of their usages. also HtUx does not implement any hash function therefore the table is handled as an array.

Rot127 commented 8 months ago

@wargio Interestingly not, I had a case when it used the buckets. Push the test for it in a second. Which just strengthens your point to make the functioning of Ht clear to the user.

Rot127 commented 8 months ago
bool test_htuu_uses_buckets(void) {
       HtUU *ht = ht_uu_new0();
       ht_uu_insert(ht, 0, 1);
       ht_uu_insert(ht, 1, 1);
       ht_uu_insert(ht, 2, 1);
       ht_uu_insert(ht, 3, 1);
       printf("\nht->count = %d\n", ht->count);
       ht_uu_insert(ht, 10000000, 1);
       printf("ht->count = %d\n", ht->count);
       for (int i = 0; i < ht->size; ++i) {
               printf("ht->table[%d].count = %d => [ ", i, ht->table[i].count);
               for (ut32 j = 0; j < ht->table[i].count; ++j) {
                       printf("%llu ", ht->table[i].arr[j].key);
               }
               printf("]\n");
       }

       // Output:
       // ht->count = 4
       // ht->count = 5
       // ht->table[0].count = 1 => [ 0 ]
       // ht->table[1].count = 1 => [ 1 ]
       // ht->table[2].count = 1 => [ 2 ]
       // ht->table[3].count = 2 => [ 3 10000000 ]
       // ht->table[4].count = 0 => [ ]
       // ht->table[5].count = 0 => [ ]
       // ht->table[6].count = 0 => [ ]

       mu_end;
}
ret2libc commented 8 months ago

HtUU should definitely use the bucketing.

I can improve the doc for Ht if needed as I wrote most of it. The hash function is just used to improve the way the keys are spread into buckets, so that (optimally) each bucket has mostly the same number of items. If the hash function is not provided we just use the % operator IIRC.

Rot127 commented 8 months ago

Improved Ht docs would be awesome!

XVilka commented 4 months ago

The situation is better now, with the introduction of HtSP, HtSS, HtSU, SetS.

Rot127 commented 4 months ago

Yes! Also the _new functions enforce to define the behavior if necessary. So I would consider this as closed.