radareorg / ideas


Implement RHash type-subtype to select checksum variants #212

Open radare opened 6 years ago

radare commented 6 years ago

related to https://github.com/radare/radare2/pull/9122

vertur commented 6 years ago

My thoughts:

1. Type-Subtype subdivision:

Type = a family of algorithms that share the same context structure. Subtype = a preset (parameterization) of that algorithm, if any.

2. Unified algorithm description like this:

struct RHashDesc {
    const char * name;
    const char * subname;
    size_t digestsize;
    void * presets;
    void (* init)(struct ctx *, const void * presets);
    void (* update)(struct ctx *, const void * buf, size_t size);
    struct digest * (* fini)(struct ctx *);
    void (*fprint)(FILE *, struct digest *);
};
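
A minimal sketch of how one type with several subtypes might be described, assuming a reflected CRC-32 family as the type and two well-known parameter sets as subtypes. The names `crc32_presets`, `crc32_ctx`, `hash_desc`, `hash_desc_by_index` and `hash_calc` are hypothetical and not part of radare2. The descriptor is simplified from the struct above: the digest is returned as a plain `uint32_t` and there is no `fprint` member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subtype parameters for a reflected CRC-32 family. */
struct crc32_presets {
    uint32_t poly;   /* reflected polynomial */
    uint32_t init;   /* initial register value */
    uint32_t xorout; /* value XORed into the final register */
};

/* One context layout shared by every subtype of the type. */
struct crc32_ctx {
    uint32_t poly;
    uint32_t crc;
    uint32_t xorout;
};

static void crc32_init(struct crc32_ctx *ctx, const void *presets) {
    const struct crc32_presets *p = presets;
    assert(ctx && p);
    ctx->poly = p->poly;
    ctx->crc = p->init;
    ctx->xorout = p->xorout;
}

/* Bitwise (table-free) update, least significant bit first. */
static void crc32_update(struct crc32_ctx *ctx, const void *buf, size_t size) {
    const uint8_t *b = buf;
    assert(ctx && (b || size == 0));
    for (size_t i = 0; i < size; i++) {
        ctx->crc ^= b[i];
        for (int k = 0; k < 8; k++) {
            ctx->crc = (ctx->crc >> 1) ^ ((ctx->crc & 1u) ? ctx->poly : 0u);
        }
    }
}

static uint32_t crc32_fini(struct crc32_ctx *ctx) {
    assert(ctx);
    return ctx->crc ^ ctx->xorout;
}

/* Simplified descriptor: same callbacks, different presets per subtype. */
struct hash_desc {
    const char *name;
    const char *subname;
    size_t digestsize;
    const void *presets;
    void (*init)(struct crc32_ctx *, const void *presets);
    void (*update)(struct crc32_ctx *, const void *buf, size_t size);
    uint32_t (*fini)(struct crc32_ctx *);
};

static const struct crc32_presets preset_ieee = { 0xEDB88320u, 0xFFFFFFFFu, 0xFFFFFFFFu };
static const struct crc32_presets preset_castagnoli = { 0x82F63B78u, 0xFFFFFFFFu, 0xFFFFFFFFu };

static const struct hash_desc hash_descs[] = {
    { "crc32", "ieee", 4, &preset_ieee, crc32_init, crc32_update, crc32_fini },
    { "crc32", "castagnoli", 4, &preset_castagnoli, crc32_init, crc32_update, crc32_fini },
};

/* Returns NULL past the end of the table, so callers can loop until NULL. */
static const struct hash_desc *hash_desc_by_index(int index) {
    int count = (int)(sizeof(hash_descs) / sizeof(hash_descs[0]));
    return (index >= 0 && index < count) ? &hash_descs[index] : NULL;
}

/* init, update, fini in one call for a single buffer. */
static uint32_t hash_calc(const struct hash_desc *d, const void *buf, size_t size) {
    struct crc32_ctx ctx;
    assert(d);
    d->init(&ctx, d->presets);
    d->update(&ctx, buf, size);
    return d->fini(&ctx);
}
```

Both table entries reuse the same three callbacks and differ only in the `presets` pointer, which is the point of the type/subtype split.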

3. Enumerate all subtypes and perform the hash computation sequence:

struct RHashDesc * r_hash_desc_by_index(int index);

for (i = 0; (rhash = r_hash_desc_by_index(i)) != NULL; ++i) {
    printf("%s/%s\n", rhash->name, rhash->subname);
    r_hash_enable(ctx, i);
    ...
    r_hash_calc(ctx, rhash, buf, size);
    r_hash_disable(ctx, i);
}

void r_hash_calc(struct ctx * ctx, struct RHashDesc * rhash, const void * buf, size_t size) {
    rhash->init(ctx, rhash->presets);
    rhash->update(ctx, buf, size);
    ...
    rhash->update(ctx, buf2, size2);
    ...
    rhash->fprint(stdout, rhash->fini(ctx));
}

4. The limited bitset space of integral types:

flags = r_hash_name_to_bits(comma_names);
r_hash_new(..., ut64 flags);
r_hash_do_begin(struct ctx *, ut64 flags);
r_hash_do_end(struct ctx *, ut64 flags);

ut64 gives 64 bits, so at most 64 hash algorithms. Switching to ut128 does not solve the problem; it only raises the limit to 128.

To solve it, we can implement a bitset array inside the context itself:

void r_hash_enable(struct ctx *, int index);
void r_hash_disable(struct ctx *, int index);
bool r_hash_is_enabled(const struct ctx *, int index);

struct ctx * ctx = r_hash_new(/* no bit flags*/);
while (*comma_names != 0) {
    for (i = 0; (rhash = r_hash_desc_by_index(i)) != NULL; ++i) {
        if ((eat = match_name(comma_names, rhash->name, rhash->subname)) > 0)  {
            r_hash_enable(ctx, i);
            comma_names += eat;
        }
    }
}
...
for (i = 0; (rhash = r_hash_desc_by_index(i)) != NULL; ++i) {
    if (r_hash_is_enabled(ctx, i)) {
        r_hash_calc(ctx, rhash, buf, size);
    }
}
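
A minimal sketch of such a bitset stored inside the context, assuming a fixed upper bound on the number of algorithms. `HASH_MAX_ALGOS`, `struct hash_ctx` and the `hash_*` function names are hypothetical stand-ins for the `r_hash_*` functions proposed above, not existing radare2 API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define HASH_MAX_ALGOS 200 /* hypothetical bound, deliberately above 64 */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define HASH_WORDS ((HASH_MAX_ALGOS + WORD_BITS - 1) / WORD_BITS)

struct hash_ctx {
    unsigned long enabled[HASH_WORDS]; /* one bit per algorithm index */
};

/* calloc zeroes the bitset, so every algorithm starts disabled. */
static struct hash_ctx *hash_new(void) {
    return calloc(1, sizeof(struct hash_ctx));
}

static void hash_free(struct hash_ctx *ctx) {
    free(ctx);
}

static void hash_enable(struct hash_ctx *ctx, int index) {
    assert(ctx && index >= 0 && index < HASH_MAX_ALGOS);
    ctx->enabled[index / WORD_BITS] |= 1UL << (index % WORD_BITS);
}

static void hash_disable(struct hash_ctx *ctx, int index) {
    assert(ctx && index >= 0 && index < HASH_MAX_ALGOS);
    ctx->enabled[index / WORD_BITS] &= ~(1UL << (index % WORD_BITS));
}

/* Out-of-range indexes are reported as disabled instead of asserting. */
static bool hash_is_enabled(const struct hash_ctx *ctx, int index) {
    assert(ctx);
    if (index < 0 || index >= HASH_MAX_ALGOS) {
        return false;
    }
    return (ctx->enabled[index / WORD_BITS] >> (index % WORD_BITS)) & 1UL;
}
```

Because callers only ever pass an index, the bound can later be raised, or the array made dynamic, without changing any function signature.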

5. Exploiting the built-in bitset to enumerate types only:

for (i = 0; (rhash = r_hash_desc_by_index(i)) != NULL; ++i) {
    if (!r_hash_is_enabled(ctx, i)) {
        printf("%s\n", rhash->name);
        r_hash_enable(ctx, i);
    }
}
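
One possible reading of the loop above, as a sketch: the enabled bit doubles as a "seen" marker, and once a type name is reported, every entry sharing that name is marked so that it is reported only once. The table contents, `struct hash_name` and `hash_list_types` are hypothetical, and a local `seen` array stands in for the bitset in the context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct hash_name {
    const char *name;    /* type */
    const char *subname; /* subtype, empty if the type has none */
};

/* Hypothetical table of known algorithms. */
static const struct hash_name hash_names[] = {
    { "md5", "" },
    { "crc32", "ieee" },
    { "crc32", "castagnoli" },
    { "sha2", "256" },
    { "sha2", "512" },
};
#define HASH_COUNT (sizeof(hash_names) / sizeof(hash_names[0]))

/* Stores each distinct type name once, in table order; returns the count. */
static size_t hash_list_types(const char **out, size_t max) {
    bool seen[HASH_COUNT] = { false };
    size_t n = 0;
    assert(out);
    for (size_t i = 0; i < HASH_COUNT && n < max; i++) {
        if (seen[i]) {
            continue; /* a subtype of an already reported type */
        }
        out[n++] = hash_names[i].name;
        /* Mark this entry and every later entry of the same type. */
        for (size_t j = i; j < HASH_COUNT; j++) {
            if (strcmp(hash_names[j].name, hash_names[i].name) == 0) {
                seen[j] = true;
            }
        }
    }
    return n;
}
```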

vertur commented 6 years ago

I would like to propose a new hashing API:

/* algorithm description by index */
struct RHashDesc * r_hash_desc_by_index(int index);

/* common context constructor/destructor */
struct RHash * r_hash_new(void);
void r_hash_free(struct RHash *);

/* algorithm selection */
void r_hash_enable_by_index(struct RHash *, int index);
void r_hash_enable_by_names(struct RHash *, const char * comma_list_name_patterns);
bool r_hash_is_enabled_by_index(struct RHash *, int index);
void r_hash_disable_by_index(struct RHash *, int index);

/* hash computation sequence for all enabled algorithms */
void r_hash_init(struct RHash *);
void r_hash_update(struct RHash *, const void * buf, size_t size);
void r_hash_fini(struct RHash *);

/* querying the resulting digest */
size_t r_hash_digest_by_index(struct RHash *, int index, void * digest);
void r_hash_fprint(struct RHash *, FILE * file);

This way all hashing internals are hidden from the user. It is still possible to hash with several algorithms simultaneously, selecting them one by one by a known index or name, or all at once with a comma-separated list of names. For more flexibility, name patterns can be used instead of plain names.

/* select md5 and every known algorithm of the crc32 family */
r_hash_enable_by_names(rhash, "md5,crc32*"); /* rhash: a struct RHash * from r_hash_new() */

Enumerating all known algorithms is trivial, but it presents the user with a lot of names. The remaining problems are "enumerate algorithm types only" and "enumerate all subtypes of a given type". The first can be solved with the r_hash_enable + r_hash_is_enabled API functions; the second is not hard either: compare the type name with the requested name while enumerating all known algorithms.
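
A minimal sketch of the name-pattern matching that `r_hash_enable_by_names` would need, assuming the only wildcard is a single trailing `*` (prefix match) and that patterns are separated by commas. The helpers `match_one` and `match_list` are hypothetical; the proposal does not specify the pattern syntax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Matches `name` against one pattern of `len` characters.
 * A trailing '*' means "any name starting with the rest of the pattern". */
static bool match_one(const char *pat, size_t len, const char *name) {
    if (len > 0 && pat[len - 1] == '*') {
        return strncmp(pat, name, len - 1) == 0;
    }
    return strlen(name) == len && strncmp(pat, name, len) == 0;
}

/* Returns true if `name` matches any pattern in a comma-separated list. */
static bool match_list(const char *patterns, const char *name) {
    assert(patterns && name);
    const char *p = patterns;
    while (*p) {
        size_t len = strcspn(p, ","); /* length of the current pattern */
        if (len > 0 && match_one(p, len, name)) {
            return true;
        }
        p += len;
        if (*p == ',') {
            p++; /* skip the separator; empty items are ignored */
        }
    }
    return false;
}
```

An implementation could call `match_list` on each descriptor name while walking the table by index and enable the indexes that match. Passing a pattern such as `crc32*` in the same walk would also enumerate the subtypes of a given type.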

radare commented 6 years ago

I think the proposal is too verbose. Let's move the discussion to:

https://hackmd.io/IwYwhgbArMBmDsBaADAE2EgLMztEE4BTADgCZERND8QRSRUIAjYoA===

ret2libc commented 4 years ago

This issue has been moved from radareorg/radare2 to radareorg/ideas as we are trying to clean our backlog and this issue has probably been created a long while ago. This is an effort to help contributors understand what are the actionable items they can work on, prioritize issues better and help users find active/duplicated issues more easily. If this is not an enhancement/improvement/general idea but a bug, feel free to ask for re-transfer to main repo. Thanks for your understanding and contribution with this issue.