privacy-scaling-explorations / acceleration-program

Accelerate Early Stage Programmable Cryptography Talents

Proposal: ZK-friendly ML model explorations #16

Closed saeyoon17 closed 5 months ago

saeyoon17 commented 11 months ago

General Grant Proposal

Project Overview :page_facing_up:

Overview

This project explores several ZK-applicable machine learning techniques and compares them.

Project Details

Throughout the project, we explore different ZK-applicable machine learning algorithms that can make predictions on the Heart Failure Prediction Dataset.

Specifically, we aim to explore

I plan to compare the following:

Team :busts_in_silhouette:

Team members

Team Website

Team's experience

Team Code Repos

Development Roadmap :nut_and_bolt:

Overview

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to train a neural network on the heart failure dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and it will allow one to prove that a prediction was made using the correct circuit.

  1. Functionality: Train/Test/Inference pipeline using a neural network. The model architecture is to be determined; I plan to start with a simple MLP and expand from there.
  2. Functionality: Converting neural network model to ZK circuits using Circom or EZKL.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
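One possible shape for the time/memory utility mentioned in item 3 is sketched below. The names `measure` and `fake_prove` are hypothetical and not part of the proposal, and `fake_prove` is only a stand-in for a real proof-generation call. Note that `tracemalloc` tracks Python heap allocations only, so memory used by a native prover or a subprocess would have to be measured some other way.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once; return its result plus wall-clock time and peak Python heap usage."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {"seconds": elapsed, "peak_bytes": peak}


def fake_prove(n):
    # Hypothetical stand-in for a proof-generation step; it only burns some CPU.
    return sum(i * i for i in range(n))


result, stats = measure(fake_prove, 1000)
print(f"took {stats['seconds']:.6f}s, peak {stats['peak_bytes']} bytes")
```

The same wrapper could be applied to the verification call, so that proving and verification costs are reported in one format across all four models.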

Milestone 2️⃣: Training/Proof generation using Linear Regression

  • Estimated Duration: 2 weeks
  • FTE: 0.5

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to perform classification with linear regression on the given dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and it will allow one to prove that a prediction was made using the correct circuit.

  1. Functionality: Train/Test/Inference pipeline using linear regression.
  2. Functionality: Converting linear regression model to ZK circuits using Circom or EZKL.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
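Because arithmetic circuits work over integers rather than floats, a linear model usually has to be quantized before it can be expressed as a circuit. The sketch below shows, in plain Python, the kind of integer-only computation such a circuit might check. The scale factor, weights, and inputs are all made-up assumptions for illustration and are not taken from the proposal or the dataset.

```python
SCALE = 2 ** 16  # assumed fixed-point scale, chosen only for illustration


def quantize(x, scale=SCALE):
    """Map a float to the nearest integer multiple of 1/scale."""
    return round(x * scale)


def linear_predict_fixed(weights_q, bias_q, features_q):
    """Integer-only linear classifier: label 1 iff w.x + b >= 0.

    weights_q and features_q are each scaled by SCALE, so their products
    carry SCALE**2; bias_q must therefore be scaled by SCALE**2 as well.
    """
    acc = bias_q
    for w, x in zip(weights_q, features_q):
        acc += w * x
    return 1 if acc >= 0 else 0


# Made-up model parameters and input sample.
weights = [0.5, -1.25]
bias = 0.1
sample = [1.0, 0.2]

weights_q = [quantize(w) for w in weights]
bias_q = quantize(bias, SCALE * SCALE)
sample_q = [quantize(x) for x in sample]

label = linear_predict_fixed(weights_q, bias_q, sample_q)
print(label)
```

Comparing the quantized predictions against the floating-point ones on the test split would show how much accuracy the chosen scale costs.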

Milestone 3️⃣: Training/Proof generation using Decision Tree

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to perform classification with a decision tree on the given dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and it will allow one to prove that a prediction was made using the correct circuit.

  1. Functionality: Train/Test/Inference pipeline using decision tree.
  2. Functionality: Converting decision tree to ZK circuits using Circom/EZKL/zkML.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
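A circuit cannot branch on private data, so one common way to express a decision tree is to evaluate every comparison and then select the leaf arithmetically. The following is a minimal sketch of that idea for a complete binary tree, written in plain Python rather than Circom. The function name, tree layout, thresholds, and feature meanings are hypothetical.

```python
def tree_predict_branchless(nodes, leaves, x):
    """Evaluate a complete binary decision tree without data-dependent branching.

    nodes:  (feature_index, threshold) pairs in breadth-first order.
    leaves: labels listed left to right; the length must be a power of two.
    """
    # One comparison bit per internal node: 1 means "go right".
    bits = [1 if x[f] > t else 0 for f, t in nodes]
    depth = len(leaves).bit_length() - 1
    out = 0
    for leaf_idx, label in enumerate(leaves):
        selected = 1  # stays 1 only if every bit on the path matches this leaf
        node = 0
        for level in range(depth):
            go_right = (leaf_idx >> (depth - 1 - level)) & 1
            b = bits[node]
            selected *= b if go_right else (1 - b)
            node = 2 * node + 1 + go_right
        out += selected * label
    return out


# Made-up depth-2 tree over two integer features.
nodes = [(0, 50), (1, 120), (1, 140)]
leaves = [0, 1, 0, 1]

print(tree_predict_branchless(nodes, leaves, [60, 150]))
```

Exactly one leaf has a selector of 1 for any input, so the sum returns that leaf's label. The cost grows with the number of leaves, which is one of the trade-offs the benchmarks could quantify.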

Milestone 4️⃣: Training/Proof generation using kNN / Final report

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to perform classification with kNN on the given dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and it will allow one to prove that a prediction was made using the correct circuit.

0b. Final report - We plan to write a final report on the observed models, in which we compare the following:

  1. Functionality: Train/Test/Inference pipeline using kNN.
  2. Functionality: Converting kNN to ZK circuits using Circom/EZKL/zkML.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
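For reference, the kNN inference step could look roughly like the sketch below. It uses squared Euclidean distance on integer features, which avoids square roots (those tend to be awkward in circuits). The function name and the toy data are assumptions for illustration. The sort step here is ordinary Python; in a circuit, sorting or top-k selection is likely to be the expensive part.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    # Squared distance preserves the neighbor ordering and stays in integers.
    dists = [
        (sum((a - b) ** 2 for a, b in zip(row, query)), idx)
        for idx, row in enumerate(train_x)
    ]
    dists.sort()
    votes = Counter(train_y[idx] for _, idx in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up integer training data with binary labels.
train_x = [[1, 1], [2, 1], [8, 9], [9, 8], [1, 2]]
train_y = [0, 0, 1, 1, 0]

print(knn_predict(train_x, train_y, [2, 2]))
```

With binary labels and an odd `k`, the vote cannot tie, which keeps the in-circuit decision rule simple.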

Additional Information :heavy_plus_sign:

Plans on converting models to ZK circuits

I plan to first construct each model in PyTorch and try EZKL. If some operations are not implemented there, I will look for other conversion methods or write the Circom circuits myself.

Relevant works

NOOMA-42 commented 11 months ago

@socathie Would you kindly review this proposal

socathie commented 11 months ago

@saeyoon17 Thank you for your proposal. Your previous work on torch2circom shows that you are a good fit for this project. However, I'm worried that the Iris dataset is too low-dimensional (only 4 features) for the comparison/benchmarking to be meaningful. Hence, may I suggest some possible modifications:

  1. Choose a slightly more complicated dataset, one with more features and a bigger sample size; OR
  2. Focus on less "advanced" ML algorithms that are more suitable for this problem, more comparable in terms of complexity and performance, and less explored in previous ZKML implementations, e.g. decision tree (already proposed), kNN, SVD, LR, etc.

On the other hand, the deliverables need to be better defined and more detailed. Here is an example from when I did the grant on circomlib-ml and ZKaggle:

Milestone 1: Full-feature circomlib-ml

Deliverables:

0a. Documentation - We will provide both inline documentation of the code and a basic tutorial that explains how a user can (for example) spin up the application.

0b. Testing Guide - The code will have proper unit-test coverage (e.g. 90%) to ensure functionality and robustness. In the guide we will describe how to run these tests.

  1. Functionality: Full strides compatibility in current layers - We will rewrite some current templates in circomlib-ml, e.g. adding strides compatibility to Conv2D, so that they will be fully compatible with current tensorflow standards
  2. Functionality: Flatten - We will write a circom template that will flatten a multidimensional input into a one-dimensional vector.
  3. Functionality: Dropout/Normalization - Dropout (and other regularization layers such as batch normalization) is one of the most common layers used in SOTA neural networks. Adding them will make the library more complete
  4. Functionality: Encrypt/decrypt - ECDH encryption and decryption templates will be added to circomlib-ml to enable encryption of model weights in further applications.
  5. BONUS Functionality: Proof aggregation - We will explore the possibility of aggregating multiple evaluation proofs into one using the recent zkPairing development.
  6. Application - All newly added templates will come together to form a more accurate model on the MNIST dataset than the current one hosted on https://zk-ml.netlify.app/

Of course, given the scope of your proposal, your deliverables will be very different. This is just to give an idea of the level of detail we want. Let me know if you have any questions!

saeyoon17 commented 11 months ago

@socathie Thanks! I will make sure to revise the proposal soon. :)

saeyoon17 commented 11 months ago

@socathie Hi Cathie! I edited the proposal. Could you kindly take a look at it? Tell me if anything else is insufficient. Thank you!

NOOMA-42 commented 11 months ago

Looks good content-wise. I'll follow up on FTE/cost internally and will keep you updated.

NOOMA-42 commented 10 months ago

@saeyoon17 I've removed the pricing rate from the proposal. The pricing rate will be processed internally and will not be revealed to the public.

adrianmcli commented 10 months ago

This looks good to me!