Implement Keccak transcript

arnaucube commented 1 year ago

Implement the Transcript for Keccak. For the use cases in which we don't need to replicate the transcript in-circuit, we can use the KeccakTranscript which will be more performant than the PoseidonTranscript.

grandchildrice commented 1 year ago

Hi @arnaucube I'm 0xvon.

I'm interested in this work.

Do you have any idea if we should use the library or it is better to implement it on our own on keccak? Poseidon is implemented from the arkworks' library.

arnaucube commented 1 year ago

That's great! ^^

For keccak, I guess that two candidates to be used would be tiny-keccak and sha3's keccak256 (to make it compatible with Ethereum's keccak256 version for future potential compatibilities).

@CPerezz and @han0110 have more experience with this for sure, so they will able to provide better guidance on this.

CPerezz commented 1 year ago

Hi @arnaucube I'm 0xvon.

I'm interested in this work.

Do you have any idea if we should use the library or it is better to implement it on our own on keccak? Poseidon is implemented from the arkworks' library.

Hey thanks for offering to do this!

The first thing I'd do is to check whether there are good options within the arkworks ecosystem already. Once this is done, I'd check how feasible is to depend on it directly, or if it requires vendoring or whatever.

Finally, if there are no good options, then we can implement our own based on the libs @arnaucube mentioned as they're indeed ETH compatible. And then, you'd just need to kinda follow the same impl that @arnaucube did for Poseidon but for the circuit it would need to be you who codes it (I think then it would be better to get Circom's keccak and transpile it to here as otherwise this can be a really tough task..)

arnaucube commented 1 year ago

Note that the idea for the Keccak transcript is to implement only the non-circuit one (the 'native' implementation), to use it for the cases where we don't need to replicate it in-circuit. The in-circuit keccak in R1CS would be around 150k constraints for each keccak, which would be too much for our folding iterations. That's why for the cases in which we need in-circuit transcript we will use the Poseidon transcript which contains both in-circuit and native implementations.

For example, the microsoft/Nova impl uses the Keccak transcript to generate the challenges as that logic is not needed in-circuit, but for our HyperNova impl we would use the Poseidon transcript since we would need to replicate the challenges generation in-circuit at least for the SumCheck verification.

So the task of this issue would be implementing the Transcript trait for the Keccak256 hash function (as done for the PoseidonTranscript (notice that this is not the PoseidonTranscriptVar (in-circuit), but just the 'native' one)).

grandchildrice commented 1 year ago

@CPerezz

Thanks for the reply. I understand what we should do at first. Now I'll wait for the results of your survey to see if we should implement the option. Please feel free to let me know if there is anything I can do to help you with your survey.

@arnaucube

Thank you for the supplemental information. I understand that I will only scope the implementation of the Transcript Trait in this issue.

grandchildrice commented 1 year ago

I am currently researching several keccak libraries. I would like to ask you a few questions in the course of my research.

First, I would like to share my perception of the requirements for Transcript. Is it necessary to have a Sponge Hash construct in the transcript? PoseidonTranscript is implemented based on sponge, should keccak do the same?

grandchildrice commented 1 year ago

Let me share some findings about the survey.

1. Survey of keccak implements in the arkwork ecosystem

As I looked around arkwork ecosystem, I didn't found that there are any keccak library. ark-circom depends on tiny-keccak.

2. Survey of other keccak libraries

I found that tiny-keccak and SHA3 are the major keccak libraries (as @arnaucube mentioned)

2-1. tiny-keccak

I wrote the example with tiny-keccak as follows. tiny-keccak is so simple. We only need to use Keccak struct. The Keccak has only two functions, update() and finalize(). According to the comments, update() is absorb() and finalize() is squeeze() and pad().

As you can see in the example implementation, it is necessary to convert the arkworks type in the transcript implementation. Also, as a concern, the following functions may not be supported.

fn get_challenge_nbits(&mut self, nbits: usize) -> Vec<bool>;
fn get_challenges(&mut self, n: usize) -> Vec<C::ScalarField>;

Example of Keccak Transcript with tiny-keccak (without test)

use std::marker::PhantomData;
use tiny_keccak::{Keccak, Hasher};
use ark_ec::{AffineRepr, CurveGroup};
use ark_ff::{BigInteger, Field, PrimeField};

use crate::transcript::Transcript;

/// KecccakTranscript implements the Transcript trait using the Keccak hash
pub struct KeccakTranscript<C: CurveGroup> {
    sponge: Keccak,
    phantom: PhantomData<C>,
}

#[derive(Debug)]
pub struct KeccakConfig {}

impl<C: CurveGroup> Transcript<C> for KeccakTranscript<C> {
    type TranscriptConfig = KeccakConfig;
    fn new(config: &Self::TranscriptConfig) -> Self {
        let _ = config;
        let sponge = Keccak::v256();
        Self {
            sponge,
            phantom: PhantomData,
        }
    }

    fn absorb(&mut self, v: &C::ScalarField) {
        self.sponge.update(&(v.into_bigint().to_bytes_le()));
    }
    fn absorb_vec(&mut self, v: &[C::ScalarField]) {
        // TODO
    }
    fn absorb_point(&mut self, p: &C) {
        self.sponge.update(&prepare_point(p))
    }

    fn get_challenge(&mut self) -> C::ScalarField {
        let mut output = [0u8; 32];
        self.sponge.clone().finalize(&mut output);
        self.sponge.update(&[output[0]]);
        C::ScalarField::from_le_bytes_mod_order(&[output[0]])
    }
    fn get_challenge_nbits(&mut self, nbits: usize) -> Vec<bool> {
        // TODO
        vec![]
    }
    fn get_challenges(&mut self, n: usize) -> Vec<C::ScalarField> {
        let mut output = [0u8; 32];
        self.sponge.clone().finalize(&mut output);
        self.sponge.update(&[output[0]]);

        let c = output
            .iter()
            .map(|c| C::ScalarField::from_le_bytes_mod_order(&[*c]))
            .collect();
        c
    }
}

// Returns the point coordinates in Fr, so it can be absrobed by the transcript. It does not work
// over bytes in order to have a logic that can be reproduced in-circuit.
fn prepare_point<C: CurveGroup>(p: &C) -> Vec<u8> {
    let binding = p.into_affine();
    let p_coords = &binding.xy().unwrap();
    let x_bi = p_coords
        .0
        .to_base_prime_field_elements()
        .next()
        .expect("a")
        .into_bigint()
        .to_bytes_le();
    let mut y_bi = p_coords
        .1
        .to_base_prime_field_elements()
        .next()
        .expect("a")
        .into_bigint()
        .to_bytes_le();

    y_bi.extend(x_bi);
    y_bi
}

#[cfg(test)]
pub mod tests {
    use super::*;
    use ark_pallas::{
        // constraints::GVar,
        Fr, Projective
    };

    /// WARNING the method poseidon_test_config is for tests only
    #[cfg(test)]
    pub fn keccak_test_config<F: PrimeField>() -> KeccakConfig {
        KeccakConfig {}
    }

    #[test]
    fn test_transcript_and_transcriptvar_get_challenge() {
        // use 'native' transcript
        let config = keccak_test_config::<Fr>();
        let mut tr = KeccakTranscript::<Projective>::new(&config);
        tr.absorb(&Fr::from(42_u32));
        let c = tr.get_challenge();

        // TODO
        // assert_eq!();
    }
}

2-2. SHA3

SHA3 has Shake256, which calculates variable length hash. We can you it to call absorb() and squeeze() as well as tiny-keccak.

Example of Keccak Transcript with SHA3 (without test)

use std::marker::PhantomData;
use sha3::{Shake256, digest::*};
use ark_ec::{AffineRepr, CurveGroup};
use ark_ff::{BigInteger, Field, PrimeField};

use crate::transcript::Transcript;

/// KecccakTranscript implements the Transcript trait using the Keccak hash
pub struct SHA3Transcript<C: CurveGroup> {
    sponge: Shake256,
    phantom: PhantomData<C>,
}

#[derive(Debug)]
pub struct SHA3Config {}

impl<C: CurveGroup> Transcript<C> for SHA3Transcript<C> {
    type TranscriptConfig = SHA3Config;
    fn new(config: &Self::TranscriptConfig) -> Self {
        let _ = config;
        let sponge = Shake256::default();
        Self {
            sponge,
            phantom: PhantomData,
        }
    }

    fn absorb(&mut self, v: &C::ScalarField) {
        self.sponge.update(&(v.into_bigint().to_bytes_le()));
    }
    fn absorb_vec(&mut self, v: &[C::ScalarField]) {
        for _v in v {
            self.sponge.update(&(_v.into_bigint().to_bytes_le()));
        }
    }
    fn absorb_point(&mut self, p: &C) {
        self.sponge.update(&prepare_point(p))
    }

    fn get_challenge(&mut self) -> C::ScalarField {
        let output = self.sponge.clone().finalize_boxed(200);
        self.sponge.update(&[output[0]]);
        C::ScalarField::from_le_bytes_mod_order(&[output[0]])
    }
    fn get_challenge_nbits(&mut self, nbits: usize) -> Vec<bool> {
        // TODO
        // should call finalize() then slice the output to n bit challenge
        vec![]
    }
    fn get_challenges(&mut self, n: usize) -> Vec<C::ScalarField> {
        let output = self.sponge.clone().finalize_boxed(n);
        self.sponge.update(&[output[0]]);

        let c = output
            .iter()
            .map(|c| C::ScalarField::from_le_bytes_mod_order(&[*c]))
            .collect();
        c
    }
}

// Returns the point coordinates in Fr, so it can be absrobed by the transcript. It does not work
// over bytes in order to have a logic that can be reproduced in-circuit.
fn prepare_point<C: CurveGroup>(p: &C) -> Vec<u8> {
    let binding = p.into_affine();
    let p_coords = &binding.xy().unwrap();
    let x_bi = p_coords
        .0
        .to_base_prime_field_elements()
        .next()
        .expect("a")
        .into_bigint()
        .to_bytes_le();
    let mut y_bi = p_coords
        .1
        .to_base_prime_field_elements()
        .next()
        .expect("a")
        .into_bigint()
        .to_bytes_le();

    y_bi.extend(x_bi);
    y_bi
}

#[cfg(test)]
pub mod tests {
    use super::*;
    use ark_pallas::{
        // constraints::GVar,
        Fr, Projective
    };

    /// WARNING the method poseidon_test_config is for tests only
    #[cfg(test)]
    pub fn keccak_test_config<F: PrimeField>() -> SHA3Config {
        SHA3Config {}
    }

    #[test]
    fn test_transcript_and_transcriptvar_get_challenge() {
        // use 'native' transcript
        let config = keccak_test_config::<Fr>();
        let mut tr = SHA3Transcript::<Projective>::new(&config);
        tr.absorb(&Fr::from(42_u32));
        let c = tr.get_challenge();

        // TODO
        // assert_eq!();
    }
}

grandchildrice commented 1 year ago

Survey updated.

As conclusion,

arkworks doesn't have keccak implementation.
tiny-keccak/sha3 has minimal functions for keccak hashes.
- we can use those libs with some additional codings.

grandchildrice commented 1 year ago

@CPerezz @arnaucube

Hi. I implemented two transcripts and created a PR. Currently both transcripts work on Pedersen Circuit's test. Please check it and let's discuss which to use (or use other options). And then let me know what should I do next!

https://github.com/privacy-scaling-explorations/folding-schemes/pull/27

CPerezz commented 1 year ago

Will take a look to this on Monday/Tuesday.

So far, looks really cool! Thanks a lot for taking the lead on this effort! Will try to review ASAP as said! <3

privacy-scaling-explorations / sonobe