zama-ai / bounty-program

Zama Bounty Program: Contribute to the FHE space and Zama's open source libraries and get rewarded 💰
https://zama.ai

Credit Scoring #8

Closed oboulant closed 1 year ago

oboulant commented 1 year ago

Zama Bounty Program: Credit Scoring

Please give us as much information as possible on the bounty you would like to submit. You can find inspiration from our existing list of bounties here.

Description

Credit Scoring

A credit score is a numerical expression based on a level analysis of a person's credit files, to represent the creditworthiness of an individual. A credit score is primarily based on a credit report, information typically sourced from credit bureaus. — Wikipedia

Introductory Brief

Credit scoring has traditionally been reserved for banking institutions and the like, which use it to assess their customers' likelihood of repaying a loan, or to decide whether to grant credit to a potential customer.

Users lack a way to assess their own credit score, as doing so would require them to submit their private, sensitive credit data to a third-party service. This concern over user data privacy opens a great use case for FHE: it allows a machine-learning model to be built and to assess user credit scores without exposing the users' banking data or their actual credit score.

As such, this project aims to provide a concrete, usable application that assesses users' credit scores while respecting their data privacy.

Application goals

This application has 2 main goals: provide a hands-on approach to Zama FHE, and showcase how its encryption works in a more user-friendly way.

Provide a hands-on experience to users

FHE is a difficult concept to grasp. Non-technical users fail to understand how it works (or how it can work), and more technical ones doubt there can be an actual working implementation beyond just a technical proof. This credit-score app is a pretext to deliver an interactive experience of FHE. As such, the focus will be on showcasing FHE encryption rather than building a full-fledged user credit-score app (e.g. no business model, no “premium” features…). Nonetheless, the model, results and overall behaviour should be immersive enough for users to understand that FHE is no longer a theoretical concept: FHE is ready to reshape our concept of data privacy.

Encryption showcase

The main goal of FHE applied to machine learning is to enforce user data privacy. Inexperienced users will probably miss the difference between FHE and HTTPS (HTTPS only protects data in transit and the server still sees it in clear, whereas with FHE the server computes directly on encrypted data), or fail to grasp how data privacy can be preserved server-side. This application should help non-technical users understand how the privacy of their data is preserved.

Our take is that an inexperienced user needs 2 things in order to accept a change of paradigm (in our case, a new form of data privacy): a representation they can grasp, and the existence of a proof.

Visual representation of the encrypted data

A first step is to show users a visual representation of their submitted data in plain form, how it presents a risk to their privacy (i.e. banking/judicial data), and how readable it is. Then show them how the same data is unreadable once encrypted, so they can see for themselves that their data is protected.

Converting the user data to base64 or showing the HTTPS-encrypted messages would provide the same sense of security without enforcing data privacy by any means. This step therefore helps build trust through a visual medium but does not prove anything.
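To make the point concrete, here is a minimal Python sketch (not part of the planned app; the record fields and function names are hypothetical) showing that base64 is a reversible encoding rather than encryption: anyone holding the encoded string recovers the data without any key.

```python
import base64
import json


def encode_for_display(record: dict) -> str:
    """Serialize a record and base64-encode it; the result looks opaque."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def recover_without_key(encoded: str) -> dict:
    """Reverse the encoding. No secret is needed, so nothing is protected."""
    return json.loads(base64.b64decode(encoded))


if __name__ == "__main__":
    # Hypothetical user data, for illustration only.
    record = {"monthly_income": 4200, "late_payments": 1}
    opaque = encode_for_display(record)
    print(opaque)                       # unreadable at a glance
    print(recover_without_key(opaque))  # fully readable again
```

The encoded string looks as "unreadable" as a ciphertext would, which is exactly why the visual step alone cannot serve as a proof.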

Proof that encrypted data is unreadable by the server

Stating that the data cannot be decrypted by the server is insufficient to build trust. After helping users see that their data is “unreadable”, the next step is to provide a technical proof in a user-friendly format (e.g. Zama documentation), which can be one of the delivery items.

It is outside the scope of this application to deliver this technical proof, as the Zama core team is by far the best suited to deliver it (and probably has it already, in the form of a white paper or the like). We should focus only on referencing it for more technical users and providing links to any work that assesses it.

Missions

The tasks are grouped in 3 categories: Machine Learning, Web Application and Deployment.

Several deliverables will be produced:

The tasks will be converted into GitHub issues and the deliverables into GitHub milestones, to help track the project's development progress.

Machine Learning

Set up the ML project

Set up a machine-learning project that performs credit scoring on behalf of users. The goal here is not to develop a performant model from scratch but rather to draw inspiration from existing models and quickly bootstrap a working model that is compliant with the specific needs of FHE models.

As of 06/04/2023, the main source of inspiration (models + data) is the following Kaggle competition: https://www.kaggle.com/competitions/GiveMeSomeCredit/overview

An emphasis will be put on ensuring that the selected Kaggle model works and scales well with its Concrete-ML counterpart.

Task: Set up a credit scoring machine-learning model
Deliverable: A notebook summarizing the model, why it was selected and how it should be configured
Macro Sizing: 4 days

Build the Concrete-ML equivalent

Task: Convert the model from the previous step into its FHE equivalent with Concrete-ML
Deliverable: A notebook summarizing the development steps and showcasing a Python script that can later be used for the model deployment
Macro Sizing: 5 days

Performance Benchmarking

Analyse the performance gap between a base ML model (e.g. a scikit-learn implementation) and its Concrete-ML counterparts, in terms of training/compilation time as well as prediction time. The goal here is to show the difference in performance (which we foresee to be very large) but also to emphasise that this “drop” in performance is not much of a concern at the user level, as the execution time remains acceptable.

Task: Analyse the model performance
Deliverable: A written analysis (markdown) of the FHE model performance
Macro Sizing: 3 days
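A minimal sketch of how such a timing comparison could be set up, using only the Python standard library. The two predict functions below are hypothetical stand-ins (the "FHE" one simply sleeps to simulate a slower model); in the actual benchmark they would be replaced by the scikit-learn and Concrete-ML predictions.

```python
import statistics
import time
from typing import Callable, Sequence


def seconds_per_prediction(predict: Callable, samples: Sequence, repeats: int = 5) -> float:
    """Median wall-clock time per sample, in seconds, over several runs."""
    per_sample = []
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        per_sample.append((time.perf_counter() - start) / len(samples))
    return statistics.median(per_sample)


def slowdown_factor(baseline_seconds: float, fhe_seconds: float) -> float:
    """How many times slower the FHE prediction is than the baseline."""
    return fhe_seconds / baseline_seconds


# Hypothetical stand-ins for the two models under comparison.
def clear_predict(sample):
    return sum(sample) > 1.0


def simulated_fhe_predict(sample):
    time.sleep(0.002)  # placeholder for the cost of an encrypted inference
    return sum(sample) > 1.0


if __name__ == "__main__":
    samples = [[0.2, 0.9], [0.1, 0.3], [0.8, 0.7]]
    base = seconds_per_prediction(clear_predict, samples)
    fhe = seconds_per_prediction(simulated_fhe_predict, samples)
    print(f"baseline: {base:.6f} s/sample, simulated FHE: {fhe:.6f} s/sample")
    print(f"slowdown: x{slowdown_factor(base, fhe):.0f}")
```

Reporting both the slowdown factor and the absolute time per prediction supports the two messages above: the gap is large, yet the absolute latency may still be acceptable to a user.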

Web Application

Core application

Build a web app that allows users to submit their banking data through a simple form and displays a credit score result. The application should provide the following pages:

The emphasis will be on quickly having a working example rather than spending time on complex UX/UI. The app structure will allow other functionalities to be added later.

The data will be mocked (interfacing the web app with the model will be done at a later stage, once the model is deployed).

Task: Build a single-page web application for interacting with a remote credit-scoring model.
Deliverable: A functional web app which works with mock data.
Macro Sizing: 4 days

Visual representation of encryption

Add an intermediary step in the submission form, demonstrating to users that their data is encrypted before being sent to the server. This should be done by replacing the form's “Send” button with an “Encrypt” button, which redirects to a page demonstrating that the data is encrypted. As stated in the Application goals section, building trust at this stage is limited to showing the encrypted data. The interface will also provide links to Zama's most suitable “proof” content.

If Zama has any simple visual material (e.g. infographics, diagrams…), it can be included in the page.

The page will also provide a “Send to server” button to resume the flow.

Task: Add an intermediary encryption page
Deliverable: The updated web app with mock data.
Macro Sizing: 1 day

Encryption with TFHE-rs

Replace the mock implementation of the encryption on the encryption page with the Wasm TFHE-rs implementation. Depending on how displayable the encrypted data is (binary?), the encrypted data visualizer might have to be adjusted (scrolling, showing only the first bits, etc.).
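As an illustration of the "only first bits" option, here is a small Python sketch of a preview helper for a binary blob. It is written in Python for consistency with the ML notebooks; the function name and default length are hypothetical, and the real visualizer would live in the web client.

```python
def preview_ciphertext(blob: bytes, max_bytes: int = 16) -> str:
    """Render the first bytes of a binary blob as spaced hex, plus its total size."""
    head = blob[:max_bytes].hex(" ")
    if len(blob) <= max_bytes:
        return head
    return f"{head} ... ({len(blob)} bytes total)"


if __name__ == "__main__":
    # Placeholder bytes standing in for a serialized ciphertext.
    fake_ciphertext = bytes(range(256)) * 4
    print(preview_ciphertext(fake_ciphertext))
```

Showing the total size next to the truncated preview keeps the page readable while still conveying how large the encrypted payload is.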

Task: Implement the encryption with the WASM API
Deliverable: The updated web app with working encryption.
Macro Sizing: 3 days (depending on how easy the WASM API is to use)

Interfacing with the server

Once the server has been deployed, interface with its API so that the web client actually sends the user's encrypted data and obtains results.

Build a mirror page of the previous client encryption page: display the encrypted response data received from the server, and provide a “Decrypt” button in the client interface. Once the data is decrypted, redirect the user to the summary page, which displays their credit score.

Task: Interface the client with the server
Deliverable: The updated web app.
Macro Sizing: 2 days
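The encrypt / send / compute / decrypt round trip described above can be illustrated with a toy sketch. Note the assumptions: this is NOT TFHE-rs or Concrete-ML, but a textbook Paillier scheme, which is only additively homomorphic; the key size is far too small to be secure, and all function names, weights and features are hypothetical. It only shows the shape of the flow, in which the server computes a score from ciphertexts and the public key alone.

```python
import math
import secrets

# Toy primes (two Mersenne primes), chosen for brevity. NOT secure.
P = 2**61 - 1
Q = 2**89 - 1


def keygen(p: int = P, q: int = Q):
    """Client side: return the public modulus n and the private key (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Client side: encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    """Client side: recover the plaintext with the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def score_encrypted(n: int, enc_features, weights, bias: int) -> int:
    """Server side: encrypted weighted sum, computed without the private key.

    Weights and bias must be non-negative integers in this sketch.
    """
    n2 = n * n
    acc = pow(n + 1, bias, n2)  # deterministic encryption of the bias
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, w, n2)) % n2  # adds w * feature to the hidden sum
    return acc


if __name__ == "__main__":
    n, priv = keygen()
    features = [3, 5, 2]                                # stays on the client
    encrypted = [encrypt(n, x) for x in features]       # sent to the server
    response = score_encrypted(n, encrypted, [2, 1, 4], bias=10)
    print(decrypt(n, priv, response))                   # decrypted on the client
```

A linear score is all this toy scheme can evaluate; running a full model on encrypted inputs is what the FHE stack used in this project is meant to provide.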

Deployment

Set up the production deployment

Follow the production deployment procedure, as described in the documentation. Depending on practicality, we foresee the following:

These steps require more information and will be split into more specific tasks once the complete workflow is determined.

Task: Set up the production deployment
Deliverable: The source code and a production deployment.
Macro Sizing: 5 days

Co-written with @robinstraub

oboulant commented 1 year ago

A first step would be to validate that we can move forward with https://www.kaggle.com/competitions/GiveMeSomeCredit/overview?

Since, from the ML perspective, the goal is not to start from scratch but rather to build upon a reasonably good model for this particular problem, I have already looked at what exists for that dataset and problem. Before going any further, it would be nice to validate that we can move forward with those data and model ⬆️.

Usable resources if we validate that we work on this dataset and problem:

aquint-zama commented 1 year ago