usnistgov / ACVP

Industry Working Group on Automated Cryptographic Algorithm Validation
https://csrc.nist.gov/projects/cryptographic-algorithm-validation-program
162 stars 65 forks source link

SHA1/SHA2 Alternate MCT Test Pseudo Code #1482

Closed AlexThurston closed 9 months ago

AlexThurston commented 9 months ago

This is in reference to the pseudocode provided for the alternate MCT test

https://pages.nist.gov/ACVP/draft-celi-acvp-sha.html#section-6.2-5

Perhaps this is widely known and a result of in-experience, but the code provided describing the alternative MCT test is incorrect or misleading.

What's provided is:

For j = 0 to 99
    A = B = C = SEED
    For i = 0 to 999
        MSG = A || B || C
        if LEN(MSG) >= LEN(SEED):
            MSG = leftmost LEN(SEED) bits of MSG
        else:
            MSG = MSG || LEN(SEED) - LEN(MSG) 0 bits
        MD = SHA(MSG)
        A = B
        B = C
        C = MD
    Output MD
    SEED = MD

In the final line, the SEED value for the next iteration is being set to the resultant MD after then 1000 iterations of the internal loop. However, this isn't correct. While the test results expects that full MD within the JSON, the SEED value is actually the just MD up to the length of the original SEED, ie. the final line should read something like SEED = MD[:len(SEED)]

jbrock24 commented 9 months ago

Hi @AlexThurston,

I'm a bit confused as to the problem, could you please explain it to me if after reading below is still incorrect.

At the beginning of loop j we assign A, B, and C to the SEED. On the first iteration it's going to be whatever SEED was going into the method, but on each iteration of loop j afterward, it's assigned to the MD of the SHA's hashed return MSG. At this point there's no need to assign the message (SEED) to the digest's data up to the length, i.e., SEED = MD[:len(SEED)], because we do the process again at the beginning of loop j where the new SEED is assigned to A, B, and C (though C was assigned at the end of loop I. At this point, A, B, and C are now MD, and we combine those for the new MSG. We then check the length of the MSG and either shrink or grow it to the size of the length of the initial SEED.

If this isn't correct, please help me understand so I can fix the issue, thanks!

-Joel

AlexThurston commented 9 months ago

Hey @jbrock24

Let me see if I can describe what I was noticing with actual examples and maybe that'll clear things up.

In the below example, I had registered SHA1. The test session is 464085 and the vector set is 2017905. This is from test group 2 and tcId 513

My MCT test starts with a value for msg of E72EA5FF94DF7338945D9ABCFD9C994D. This is 16 bytes long and that becomes the value of SEED. I enter the first loop's iteration. I then enter the second loop of 1000 for the first time which results in a final MD. This MD, if done correctly is B32F751AA6CB861DDD478A4B1B56A4420EE9E2FA which is 20 bytes long. Just prior to starting the second iteration of the outer loop, the pseudocode does SEED = MD (or SEED = B32F751AA6CB861DDD478A4B1B56A4420EE9E2FA), so now the SEED value is 20 bytes long instead of 16. If we do this, the subsequent calculated MD values of the inner loop which go into the resultsArray will not match those in the expected responses. In order for my response to match those of the expected responses, at the end of the outer array and the final line of the pseudocode (SEED=MD), SEED has to be set to the MD but only the first 16 bytes, not the entire 20 (or specifically SEED = B32F751AA6CB861DDD478A4B1B56A442). If this is done, then my calculated responses match those in the expected responses.

I have no idea if this actually clears anything up, or really just makes things more complicated.

AlexThurston commented 9 months ago

OOoo. I think I see where/why the pseudocode is incorrect compared to the actual implementation. I was looking at the actual server code. The seed length is being calculated at the begging and used here. This is calculated once but used each and every time. The the documented pseudocode here. That isn't the case. It is using len(SEED). That shouldn't be. The len(SEED) should be done at the beginning outside of the outer loop. I think it should look like this:

INITIAL_SEED_LENGTH = LEN(SEED)
For j = 0 to 99
    A = B = C = SEED
    For i = 0 to 999
        MSG = A || B || C
        if LEN(MSG) >= INITIAL_SEED_LENGTH:
            MSG = leftmost INITIAL_SEED_LENGTH bits of MSG
        else:
            MSG = MSG || INITIAL_SEED_LENGTH - LEN(MSG) 0 bits
        MD = SHA(MSG)
        A = B
        B = C
        C = MD
    Output MD
    SEED = MD
jbrock24 commented 9 months ago

Thanks for pointing this out! It was definitely an issue, fixed it, thanks again!