libsnark_sha256(0) != python_sha256(0)

fleupold commented 5 years ago

If I calculate sha256(0) in libsnark, I get:

0xda5698be17b9b46962335799779fbeca8ce5d491c0d26243bafef9ea1837a9d8

In python I get different values:

>>> pypy_sha256.sha256().hexdigest()
'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
>>> pypy_sha256.sha256([0]*64).hexdigest()
'f5a5fd42d16a20302798ef6ed309979b43003d2320d9f0e8ea9831a92759fb4b'

I found the code that is testing the libsnark gadget with the python library. It looks like we use the pypy_sha256 in a very manual way to generate the test case:

https://github.com/scipr-lab/libsnark/blob/f7c87b88744ecfd008126d415494d9b34c4c1b20/libsnark/gadgetlib1/gadgets/hashes/sha256/tests/generate_sha256_gadget_tests.py#L39-L41

We set data manually on the state and don't call sha_final, as is the case in the digest implementation of the sha256 class.

https://github.com/scipr-lab/libsnark/blob/f7c87b88744ecfd008126d415494d9b34c4c1b20/libsnark/gadgetlib1/gadgets/hashes/sha256/tests/pypy_sha256.py#L227-L228

Why do we chose to use this manual construction instead of calling pypy_sha256.sha256(x).digest() directly? What exactly is the difference between calculating sha256 they way libsnark does and the way other libraries do?

madars commented 5 years ago

Yes, indeed! The sha256_compression_function_gadget only implements the SHA256 compression function. In many applications inputs are fixed-size and fit in one SHA256 block. For such applications, not doing the padding/finalization steps gives roughly 2x improvement in constraint system size (and thus prover runtime/memory usage). One could always chain multiple sha256_compression_function_gadget's to obtain the full SHA256; we just haven't had an application for that yet.

fleupold commented 5 years ago

Thanks for the explanation.

In my use case I have a very large number of public input parameters for a computation that I want to verify on the Ethereum blockchain. Since Ethereum transactions are very limited in size, I have to move the public input parameters into the private witness and add an extra computation step that checks the hash of private inputs matches my new short public input.

So instead of calculating f(u, w), where u is very large, I calculate f'(H, (u,w)) := f(u, w) ^ sha256(u) == H where H is very small compared to u.

For this to work I have the make sure I can calculate SHA256, the same way as it is done on the Ethereum blockchain (H will be a Merkle root store in a smart contract and needs to support inclusion proofs via a smart contracts).

On ethereum the following code:

contract SHA256 {
    event Hash(bytes32);
    function SHA256() {
        bytes32 input = 0x0;
        emit Hash(sha256(input, input));
    }
}

emits 0xf5a5fd42d16a20302798ef6ed309979b43003d2320d9f0e8ea9831a92759fb4b

Is there a way to use the existing libsnark gadget (or a slight modification) to produce that value when I call it with 512 zero bits?

It currently returns 0xda5698be17b9b46962335799779fbeca8ce5d491c0d26243bafef9ea1837a9d8

fleupold commented 5 years ago

Here is another use case that is currently using a third party gadget for calculating a finalized sha256 hash: https://github.com/barryWhiteHat/roll_up

barryWhiteHat commented 5 years ago

https://github.com/kobigurk/sha256_ethereum will match the EVM for 512 bits input. If you want to add more bits you can, you just need to edit the padding bits https://github.com/kobigurk/sha256_ethereum/blob/master/sha256_ethereum.cpp#L54

Also if you want to use this with the merkle tree gadget check out my changes to this https://github.com/barryWhiteHat/roll_up/blob/master/src/sha256/sha256_ethereum.cpp

scipr-lab / libsnark

libsnark_sha256(0) != python_sha256(0) #132