ML-DSA keyGen reuses Shake256 output

scottc4 commented 3 months ago

environment Demo

testSessionId 500178

vsId 2254620

Algorithm registration [ { "algorithm": "ML-DSA", "mode": "keyGen", "revision": "FIPS204", "parameterSets": [ "ML-DSA-44", "ML-DSA-65", "ML-DSA-87" ] } ]

Endpoint in which the error is experienced Demo

Expected behavior Test Case 41 for ML-DSA-65 specifies a seed value of 9d20abd2a2d3b9313ee70aabb7c8bd80d262cedd6c8b5bfba0cbcae1cbd31d70 then byte order reversal of the bytes yields value to use of 701DD3CBE1CACBA0FB5B8B6CDDCE62D280BDC8B7AB0AE73E31B9D3A2D2AB209D.

However, running this test case I get the following differences in the generated private key value: Got 0x08 instead of 0x51 at offset 1406 Got 0x18 instead of 0x35 at offset 1407 Got 0xC7 instead of 0x57 at offset 3609 Got 0xFE instead of 0x08 at offset 3611 Got 0x09 instead of 0x4A at offset 3612 Got 0xBC instead of 0xBB at offset 3613 Got 0xA6 instead of 0xB6 at offset 3614

The first two differences are in the 5th of the 6 128 byte vectors of s2: 513508820…185525282755135 Since this repeats the first 2 bytes of the vector, it looks like the output of the Shake256 digest is reused instead of getting the next N bytes.

celic commented 3 months ago

I see #320 references needing to reverse the bytes to get testing working.

Would you like me to provide our computed s2 value? Would that help diagnose the issue?

scottc4 commented 3 months ago

No, if I don't reverse the seed bytes then the entire private key value is different. Reversing the bytes yields a private key value that matches except for the 7 bytes as mentioned above. If I change my code to reuse the Shake256 data instead of requesting the next buffer size of output, then the generated private key value matches the expected value provided by the ACVP server.

eay commented 3 months ago

I can provide more information. For this particular test set, the issue is the generation of the 5of6 vector for s2. Algorithm 27 from FIPS204, line 5. This is then calling Algorithm 25, and due to the rejection sampling, we need a variable number of input bytes (from shake256). This is H(p)[[c]] (line 4 of Algorithm 25). For this particular test case, and only this one in the test data set, c increases to a value greater than 256. If we change our code, so that we only generate 256 bytes from H(p), then reuse them once c gets larger that 256 (c % 256) we get the same result as the test case. If we keep our current code, where we continue to generate new bytes from H(p) (shake256), we get a different result. As a cross check to our supposition that the H(c) function is reusing it's first 256 bytes, the output vector value in the test data set, has the same last 2 bytes (that we disagree on) as the first 2 bytes, which is consistent with the H(p) values being recycled after 256 bytes.

celic commented 2 months ago

Interesting. That's very helpful. Obviously that's not the intention of the algorithm to re-use bytes from the XOF.

I think my mistake is here: https://github.com/usnistgov/ACVP-Server/blob/master/gen-val/src/crypto/src/NIST.CVP.ACVTS.Libraries.Crypto/Dilithium/Dilithium.cs#L945

            // If we miss enough checks, we need to squeeze more bits, more efficient to do in a batch than per-byte
            if (c == 256)
            {
                c = 0;
                squeezeFactor++;
                var tempZ = new byte[256 * squeezeFactor];

                _h.Squeeze(tempZ, 256 * 8 * squeezeFactor);
                zCandidates = tempZ[..256];
            }

The goal was to increment the squeezeFactor whenever we passed the amount of bytes stored, so we don't re-run the squeeze operation for every single byte.

Here tempZ is the output of H(p)[[c]], and we are taking the last 256-bytes. Or so I thought... This actually takes the first 256 bytes. "Get me indexes of the array until you reach index 256," is what this notation means. We want the last 256-bytes because we have already used the rest of the squeezed output earlier.

Thank you @eay for the pointer to Algorithm 25. This error was an immediate find because of that. I'll fix this.

celic commented 2 months ago

To complete the loop, the correct line is zCandidates = tempZ[(256 * (squeezeFactor - 1))..]; to get the last 256-bytes of the squeezed output.

livebe01 commented 2 months ago

The fix for this issue is on Demo as of this afternoon's v1.1.0.34 hotfix deployment.

usnistgov / ACVP-Server

ML-DSA keyGen reuses Shake256 output #321