intel / isa-l_crypto

Cannot get multi-buffer hash produce valid output with HASH_FIRST/HASH_LAST #118

Closed by gh-andre 1 year ago

gh-andre commented 1 year ago

Thank you for the library. Could you point me in the right direction on what I'm doing wrong in the code below? It yields a correct hash only when I use a single context (i.e. max_ctx is set to 1).

I did look at the tests, but they use multiple buffers to compute different hashes. My understanding was that this library can collect and prepare multiple buffers via HASH_FIRST/HASH_UPDATE in order to compute a single hash, and that it generates the final hash using some parallelization magic behind the scenes when either HASH_LAST is submitted or the final flush loop runs.

Here's the trimmed-down code I experimented with. I removed all error handling for brevity; the full version checks the context's error field every time the context manager hands a context back.

size_t argc_ = 7; 
const char *argv_[] = { 
    "cmd-line", 
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", 
    "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB", 
    "C", 
    "1", 
    "2", 
    "3" 
}; 

SHA256_HASH_CTX_MGR ctx_mgr = {}; 
sha256_ctx_mgr_init(&ctx_mgr); 

constexpr size_t max_ctx = 4; 
SHA256_HASH_CTX mb_ctx[max_ctx], *mb_ctx_ptr = nullptr;
size_t ctx_in_use = 0; 

for(size_t i = 0; i < max_ctx; i++) { 
    // set status in each to HASH_CTX_STS_COMPLETE 
    hash_ctx_init(&mb_ctx[i]); 
} 

for(size_t i = 1; i < argc_; i++) { 
    if(mb_ctx_ptr) 
        // runs through this with 1 2 3, mb_ctx[2] state is IDLE
        mb_ctx_ptr = sha256_ctx_mgr_submit(&ctx_mgr, mb_ctx_ptr, argv_[i], static_cast<uint32_t>(strlen(argv_[i])), HASH_UPDATE); 
    else if(ctx_in_use < max_ctx) 
        // runs through this 3 times for As, Bs and the C, mb_ctx[0]/mb_ctx[1] state is PROCESSING, mb_ctx[2] starts as IDLE
        mb_ctx_ptr = sha256_ctx_mgr_submit(&ctx_mgr, &mb_ctx[ctx_in_use++], argv_[i], static_cast<uint32_t>(strlen(argv_[i])), HASH_FIRST); 
    else { 
        // never runs this block for 4 contexts and this input, but produces valid hash for 1 context, same input
        if((mb_ctx_ptr = sha256_ctx_mgr_flush(&ctx_mgr)) == nullptr) 
            throw std::runtime_error("sha256_ctx_mgr_flush failed"); 

        if(mb_ctx_ptr->error != HASH_CTX_ERROR_NONE) 
            throw std::runtime_error(std::to_string(mb_ctx_ptr->error) + ": flush returned a bad context"); 

        mb_ctx_ptr = sha256_ctx_mgr_submit(&ctx_mgr, mb_ctx_ptr, argv_[i], static_cast<uint32_t>(strlen(argv_[i])), HASH_UPDATE); 
    } 
} 

// gets the continuation context mb_ctx[2] after HASH_UPDATE and ends the input with HASH_LAST
if(mb_ctx_ptr) { 
    if((mb_ctx_ptr = sha256_ctx_mgr_submit(&ctx_mgr, mb_ctx_ptr, nullptr, 0, HASH_LAST)) != nullptr) { 
        // docs say that if the context is returned here, it contains a valid hash (doesn't run in this case)
        print_hash("Multi-buffer", mb_ctx_ptr->job.result_digest); 
    } 
} 

if(!mb_ctx_ptr) { 
    // mb_ctx[0] and mb_ctx[1] become IDLE
    while((mb_ctx_ptr = sha256_ctx_mgr_flush(&ctx_mgr)) != nullptr) { 
        if(mb_ctx_ptr->status == HASH_CTX_STS_COMPLETE) 
            // prints valid hash for C123, mb_ctx[0] and mb_ctx[1] are not factored in
            print_hash("Multi-buffer", mb_ctx_ptr->job.result_digest);
        //else
        //    mb_ctx_ptr = sha256_ctx_mgr_submit(&ctx_mgr, mb_ctx_ptr, nullptr, 0, HASH_LAST)
    } 
} 

I experimented with ending the input (HASH_LAST) for each context on which I had called HASH_FIRST, along with a number of other configurations, but I cannot get it to work for anything but a single context, which in my mind defeats the whole purpose of a multi-buffer hash. Those variants also produce multiple completed contexts (as with the commented-out lines at the end), each of which prints a hash for its own context.

Any pointers on what's missing from this code would be greatly appreciated.

Also, unrelated to the above (I initially considered it as an alternative): am I correct in understanding that hashes produced by the multi-hash functions (mh_sha256.h) are not valid hashes for the algorithm they are named after, such as SHA-256, because in the end they compute hashes of hashes to facilitate parallelism? If that's incorrect, I will post another question to keep it separate.

gh-andre commented 1 year ago

OK, after wading through the code, it appears that I had misinterpreted where the parallelism is applied: the library processes multiple independent hashes in an optimized way, rather than parallelizing the computation of a single hash.
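To make the distinction concrete, here is a toy model of the semantics as I now understand them. It is only a sketch: `ToyCtx`, `hash_as_one_stream` and `hash_as_separate_streams` are hypothetical names, and FNV-1a is used as a tiny streamable stand-in hash, so this is neither SHA-256 nor the isa-l_crypto API. It only shows that each context is its own independent stream, so a digest covers just the data submitted to that one context.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one hash context. FNV-1a (64-bit) is used only
// because it is small and streamable; it is NOT SHA-256 and NOT the library.
struct ToyCtx {
    std::uint64_t state = 14695981039346656037ULL;  // FNV-1a offset basis

    // Feed one chunk into this context's running state.
    void update(const std::string& chunk) {
        for (unsigned char c : chunk) {
            state ^= c;
            state *= 1099511628211ULL;  // FNV-1a prime
        }
    }

    std::uint64_t digest() const { return state; }
};

// One context receives every chunk in order. This models submitting
// HASH_FIRST, then HASH_UPDATE for each later chunk, then HASH_LAST,
// all on the SAME context.
std::uint64_t hash_as_one_stream(const std::vector<std::string>& chunks) {
    ToyCtx ctx;
    for (const auto& chunk : chunks) {
        ctx.update(chunk);
    }
    return ctx.digest();
}

// Each chunk starts its own context. This models submitting HASH_FIRST on a
// fresh context per chunk: the result is one independent digest per chunk.
std::vector<std::uint64_t> hash_as_separate_streams(
        const std::vector<std::string>& chunks) {
    std::vector<std::uint64_t> digests;
    for (const auto& chunk : chunks) {
        ToyCtx ctx;
        ctx.update(chunk);
        digests.push_back(ctx.digest());
    }
    return digests;
}
```

In this model, `hash_as_one_stream({"C", "1", "2", "3"})` equals `hash_as_one_stream({"C123"})`, which matches what my code printed: the As and Bs each went into their own context via HASH_FIRST, so only the context that received C, 1, 2 and 3 contributed to the hash I was looking at.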

I would still appreciate your insights about multi-hashes: is it possible to produce an actual SHA-256 value with a multi-hash, or does it only provide the same cryptographic strength, with a value that differs from the SHA-256 value a conventional hash function would generate?

Thank you.