ingonyama-zk / iciclegnark


Wrong proof generated by latest iciclegnark #7

Open hussein-aitlahcen opened 1 year ago

hussein-aitlahcen commented 1 year ago

Hi,

I tried using your latest gnark fork together with the latest icicle. When I switch from CPU to GPU proving via the gpu build tag when building our binary, the result is:

groth16.Verify on bn254 fails for the GPU-generated proof (gpu tag): pairing doesn't match

groth16.Verify on bn254 succeeds for the CPU-generated proof: {"level":"debug","curve":"bn254","backend":"groth16","took":0.826976,"time":"2023-09-23T13:18:30+02:00","message":"verifier done"}

This is for the exact same circuit with the exact same inputs.

Maybe I shouldn't be using latest commits?

GPU: (screenshot attached)

ImmanuelSegol commented 1 year ago

@hussein-aitlahcen Thanks for opening the issue.

I will investigate.

ImmanuelSegol commented 1 year ago

@hussein-aitlahcen I'm sorry, I didn't ask you: what circuit are you using?

hussein-aitlahcen commented 1 year ago

> @hussein-aitlahcen I'm sorry, I didn't ask you: what circuit are you using?

It's a circuit that is not open source yet... Perhaps I can discuss with the team about adding you to the repo if you want to reproduce the issue.

hussein-aitlahcen commented 1 year ago

@ImmanuelSegol I tried to investigate and found that when computing krs, we don't wait for ar and bs1 to be computed. In the original gnark code, computeKRS waits for ar and bs1 to be computed; was it supposed to be done this way here? In any case, I forked and reordered the computation to wait, but that didn't solve my problem.

I also added some logs after the MSM computation of ar, bs1, bs and krs, and there is a discrepancy in the output:

i.e.

        // prove.go
        if _, err := bs1.MultiExp(pk.G1.B, wireValuesB, ecc.MultiExpConfig{NbTasks: n / 2}); err != nil {
            chBs1Done <- err
            close(chBs1Done)
            return
        }
        fmt.Println("bs1: ", bs1)

...
        // prove_gpu.go
        bs1, _, _ = iciclegnark.MsmOnDevice(wireValuesBDevice.P, pk.G1Device.B, wireValuesBDevice.Size, true)
        fmt.Println("bs1: ", bs1)
...
gpu

bs: {{[0 0 0 0] [0 0 0 0]} {[0 0 0 0] [0 0 0 0]} {[0 0 0 0] [0 0 0 0]}}
ar:  {[15230403791020821917 754611498739239741 7381016538464732716 1011752739694698287] [15230403791020821917 754611498739239741 7381016538464732716 1011752739694698287] [0 0 0 0]}
bs1:  {[15230403791020821917 754611498739239741 7381016538464732716 1011752739694698287] [15230403791020821917 754611498739239741 7381016538464732716 1011752739694698287] [0 0 0 0]}
krs:  {[15230403791020821917 754611498739239741 7381016538464732716 1011752739694698287] [15230403791020821917 754611498739239741 7381016538464732716 1011752739694698287] [0 0 0 0]}

ar, bs1 and krs are all equal to 1 in Montgomery form???

non gpu

Bs:  {{[4636322392879697240 2384471692377456582 4556942363646472051 813635155070355686] [14626696712275888559 11080686283781174876 14467499977088519560 284481842883952647]} {[637172356924027360 16850950472009441986 11891712156479364010 2192150005802278260] [5751625923871861710 17730865525814601000 12803040783397251468 1691390020363999973]} {[7920156426300269433 5149651697609590255 9906526044991732446 3314290308708089264] [12279958031118287131 17661061699100988233 991386651093401781 333610601336327577]}}
bs1:  {[15403469495131510827 1175663692734111092 15688015184333491738 703810843658371249] [7964450159908730923 446238346755840542 3405801049435662475 2171517101667273338] [3575578639912884561 11501828061521654759 6382638011729568991 645800270268746503]}
ar:  {[5836583453540608800 11989822530383293069 2958706776637200154 2993799242673407613] [12235819207099940230 17748403818490228955 12165302879220095557 2885306535149423449] [389139827242955288 16545878758785633611 4447694843524427253 1808335461326242003]}
krs:  {[3213847375034341348 15167574436140310312 12611616496990658474 1685307844261831545] [14282052022244955669 14890175442922758155 9668945922477019892 937689974697692563] [3646646787137154949 9148282517108681545 4704083787257830644 835441358773014708]}

I'm compiling the latest icicle using Nix and gcc 11 + CUDA >= 12 with:

          cuda = pkgs-unstable.cudaPackages_12.cudatoolkit.overrideAttrs (old: old // { meta = null; });
          cudart = pkgs-unstable.cudaPackages_12.cuda_cudart.overrideAttrs (old: old // { meta = null; });
          pkgs.gcc11Stdenv.mkDerivation {
            pname = "icicle";
            version = "0.0.1";
            src = pkgs.fetchFromGitHub {
              owner = "ingonyama-zk";
              repo = "icicle";
              rev = "04e5ff5d1af4";
              hash = "sha256-flqfyD/r614gJPN+w/I+PksJ5gnbltLMXdMq7Vh7ziY=";
            };
            buildPhase = ''
              ${cuda}/bin/nvcc -DG2_DEFINED -Xcompiler -fPIC -std=c++17 -shared -L${cudart}/lib \
                  icicle/curves/bn254/lde.cu \
                  icicle/curves/bn254/msm.cu \
                  icicle/curves/bn254/projective.cu \
                  icicle/curves/bn254/ve_mod_mult.cu \
                  -o libbn254.so
            '';
            installPhase = ''
              mkdir -p $out/lib
              mv libbn254.so $out/lib
            '';
            enableParallelBuilding = true;
            doCheck = true;
          };

EDIT:

Looks like we also ignore the return code of the commit here: https://github.com/ingonyama-zk/iciclegnark/blob/7c37237e87aa12759dd8a2fab076ad2bfff45249/curves/bn254/gnark.go#L43

ImmanuelSegol commented 1 year ago

> I tried to investigate and found that when computing krs, we don't wait for ar and bs1 to be computed [...]

@hussein-aitlahcen Is there any way you can perhaps send me either the scalars and points that are input into the prover, or the circuits, so I can run them locally?

If you run some other circuit, such as the examples at https://github.com/ingonyama-zk/gnark/tree/clean-full-gpu/examples, it passes.

So I'm suspecting this is related to some sort of anomaly.

hussein-aitlahcen commented 1 year ago

> Is there any way you can perhaps send me either the scalars and points that are input into the prover, or the circuits, so I can run them locally?

I discussed with the team; we can add you to the repo. Do you have a Matrix handle?

jeremyfelder commented 1 year ago

@hussein-aitlahcen Thanks for finding the KRS not waiting on AR and BS1 bug! 💪🏻 🚀.

Just so you know, the setup phase immediately moves all of the points, plus some additional data that's needed, to the GPU. I see that the GPU you are using has 6141MiB of memory; depending on the circuit size, it's possible there isn't enough, so when one of the functions tries to use an invalid pointer, the GPU errors and can't recover.

You should be able to check whether this is the case by running nvidia-smi with -lms (and other options) to print memory usage, or by seeing whether the total prover time is extremely low (less than 500ms).
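
For example, a monitoring invocation along these lines (a command fragment; adjust the polling interval as needed):

```shell
# Poll GPU memory usage every 500 ms while the prover runs.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -lms 500
```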