Closed oboulant closed 1 year ago
A first step would be to validate that we can move forward with https://www.kaggle.com/competitions/GiveMeSomeCredit/overview ?
Since, from the ML perspective, the goal is not to start from scratch but rather to build upon a reasonably good model for that particular problem, I have already looked at what exists for that dataset and problem. Before going any further, it would be nice to validate that we can move forward with that data and model ⬆️?
Usable resources if we validate that we work on this dataset and problem:
Zama Bounty Program: Credit Scoring
Credit Scoring
Bounty type:
major_bounty
Category:
Application
Overview: We propose to showcase, through a real-world credit scoring application, how Zama's technology can help address the privacy issues related to exposing sensitive personal information. We propose:
Library targeted:
Concrete-ML
Reward: 13500$ if planned as described by macro sizing
Description
Credit Scoring
A credit score is a numerical expression based on a level analysis of a person's credit files, to represent the creditworthiness of an individual. A credit score is primarily based on a credit report, information typically sourced from credit bureaus. — Wikipedia
Introductory Brief
Credit scoring has traditionally been reserved for banking institutions and the like, to assess their customers' likelihood of repaying their credit, or to decide whether to grant credit to a potential customer.
Users lack a way to assess their own credit score, as doing so would require them to submit their private, sensitive credit data to a third-party service. This concern over user data privacy opens a great use case for FHE: it allows a machine-learning model to be built and to assess user credit scores without exposing the user's banking data or their actual credit score.
As such, this project aims to provide a concrete, usable application that assesses users' credit scores while respecting their data privacy.
Application goals
This application has 2 main goals: provide a hands-on approach to Zama's FHE and showcase how its encryption works in a more user-friendly way.
Provide a hands-on experience to users
FHE is a difficult concept to grasp. Non-technical users fail to understand how it works, or how it can work, and more technical ones doubt there can be an actual working implementation beyond just a technical proof. This credit-score app is a pretext to deliver an interactive experience of FHE. As such, the focus will be on showcasing FHE encryption rather than building a full-fledged user credit-score app (e.g. no business model, no “premium” features…). Nonetheless, the model, results and overall behaviour should be immersive enough for users to understand that FHE is no longer a theoretical concept: FHE is ready to reshape our concept of data privacy.
Encryption showcase
The main goal of FHE applied to machine learning is to preserve user data privacy. Non-technical users will probably miss the difference between FHE and HTTPS, or fail to grasp how data privacy can be preserved server-side. This application should help them understand how the privacy of their data is preserved.
Our take is that a non-technical user needs 2 things in order to accept a change of paradigm (in our case a new form of data privacy behaviour): a representation they can grasp, and the existence of a proof.
Visual representation of the encrypted data
A first step is to show users a visual representation of the plain data they submit, how readable it is, and how it presents a risk to their privacy (i.e. banking/judicial data). Then show them how the same data is unreadable once encrypted, so they can see for themselves that their data is protected.
Proof that encrypted data is unreadable by the server
Stating that the data cannot be decrypted by the server is insufficient to build trust. After helping users visualise that their data is “unreadable”, the next step is to provide a technical proof in a user-friendly format (e.g. Zama documentation), which can be one of the deliverables.
Missions
The tasks are grouped in 3 categories: Machine Learning, Web Application and Deployment.
Several deliverables will be produced:
The tasks will be converted into GitHub issues and the deliverables will be converted into GitHub milestones to help track the project's development progress.
Machine Learning
Set up the ML project
Set up a machine-learning project that performs credit scoring on behalf of users. The goal here is not to develop a high-performing model from scratch but rather to draw inspiration from existing models and quickly bootstrap a working model that complies with FHE-specific model requirements.
An emphasis will be put on ensuring the selected Kaggle model works and scales well with its Concrete-ML counterpart
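One simple way to check that the selected model "works" with its Concrete-ML counterpart is to measure how often both models predict the same class on the same held-out rows. The sketch below is only an illustration under stated assumptions: `agreement_rate` is a hypothetical helper, and the two prediction lists are made-up placeholders rather than outputs of any real model.

```python
from typing import Sequence


def agreement_rate(baseline: Sequence[int], counterpart: Sequence[int]) -> float:
    """Fraction of samples on which the two models predict the same class."""
    if len(baseline) != len(counterpart):
        raise ValueError("prediction lists must have the same length")
    if not baseline:
        raise ValueError("need at least one prediction")
    matches = sum(1 for a, b in zip(baseline, counterpart) if a == b)
    return matches / len(baseline)


# Placeholder predictions on the same held-out rows (assumption: 1 = high risk).
baseline_preds = [0, 1, 0, 0, 1, 0, 1, 0]
counterpart_preds = [0, 1, 0, 1, 1, 0, 1, 0]

rate = agreement_rate(baseline_preds, counterpart_preds)
print(f"agreement: {rate:.3f}")  # agreement: 0.875
```

A rate well below 1.0 would suggest that the FHE-compatible version of the model loses too much precision and that the model choice or its settings should be revisited.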
Build the Concrete-ML equivalent
Performance Benchmarking
Compare the performance of a base ML model (e.g. a scikit-learn implementation) with that of the Concrete-ML counterparts built from it, in terms of training/compilation time as well as prediction time. The goal here is to show the difference in performance (which we foresee to be very large) but also to emphasise that this “drop” in performance is not much of a concern at the user level, as the execution time remains acceptable.
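The prediction-time comparison could be driven by a small timing harness along these lines. This is a minimal sketch, not the project's benchmark: `clear_predict` and `fhe_predict` are hypothetical stand-ins for the real models' prediction calls, and the `time.sleep` only simulates overhead; it is not a measured FHE cost.

```python
import statistics
import time
from typing import Callable, Dict, List

Predictor = Callable[[List[float]], int]


def median_latency(predict: Predictor, sample: List[float], repeats: int = 5) -> float:
    """Median wall-clock seconds for a single prediction over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_latency(clear_predict: Predictor, fhe_predict: Predictor,
                    sample: List[float], repeats: int = 5) -> Dict[str, float]:
    """Time both predictors on the same sample and report the slowdown factor."""
    clear_s = median_latency(clear_predict, sample, repeats)
    fhe_s = median_latency(fhe_predict, sample, repeats)
    slowdown = fhe_s / clear_s if clear_s > 0 else float("inf")
    return {"clear_s": clear_s, "fhe_s": fhe_s, "slowdown": slowdown}


# Hypothetical stand-ins: replace with the real models' predict calls.
def clear_predict(features: List[float]) -> int:
    return int(sum(features) > 1.0)


def fhe_predict(features: List[float]) -> int:
    time.sleep(0.01)  # simulated overhead only, not a measured FHE cost
    return int(sum(features) > 1.0)


report = compare_latency(clear_predict, fhe_predict, [0.2, 0.5, 0.7])
print(report)
```

Reporting the median rather than the mean keeps a single slow run (for instance a cold start) from dominating the result.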
Web Application
Core application
Build a web app that allows users to submit their banking data through a simple form and displays a credit score result. The application should provide the following pages:
The emphasis will be on quickly having a working example, rather than spending time on complex UX/UI. The app structure will allow other features to be added later.
The data will be mocked (interfacing the web app with the model will be done at a later stage, once the model is deployed).
Visual representation of encryption
Add an intermediary step in the submission form, demonstrating to users that their data is encrypted before being sent to the server. This should be done by replacing the form's “Send” button with an “Encrypt” button, which redirects to a page demonstrating that the data is encrypted. As stated in the Application goals section, building trust at this stage is limited to showing the encrypted data. The interface will also provide links to Zama's most suitable “proof” content.
The page will also provide a “Send to server” button to resume the flow.
Encryption with TFHE-rs
Replace the mock implementation of the encryption on the encryption page with the Wasm TFHE-rs implementation. Depending on how displayable the encrypted data is (binary?), the encrypted data visualizer might have to be adjusted (scrolling, showing only the first bits, etc.).
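If the encrypted data turns out to be a large binary blob, the "only first bits" option could look like the following sketch, which renders a short hex preview and a count of the hidden bytes. `preview_ciphertext` is a hypothetical helper and the input is placeholder bytes, not a real TFHE-rs ciphertext; the production visualizer would live in the web client, so this is shown in Python purely for illustration.

```python
def preview_ciphertext(ciphertext: bytes, max_bytes: int = 16) -> str:
    """Return a hex preview of the first `max_bytes` bytes, with a size note."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    head = ciphertext[:max_bytes]
    hex_groups = " ".join(f"{b:02x}" for b in head)
    hidden = len(ciphertext) - len(head)
    if hidden > 0:
        return f"{hex_groups} ... (+{hidden} more bytes)"
    return hex_groups


# Placeholder bytes standing in for a serialized ciphertext.
fake_ciphertext = bytes(range(40))
preview = preview_ciphertext(fake_ciphertext, max_bytes=8)
print(preview)  # 00 01 02 03 04 05 06 07 ... (+32 more bytes)
```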
Interfacing with the server
Once the server has been deployed, integrate its API so that the web client actually sends the user's encrypted data and obtains results.
Build a mirror of the client encryption page that displays the encrypted response data received from the server and provides a “Decrypt” button in the client interface. Once the data is decrypted, redirect users to the summary page, which displays their credit score.
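The exchange with the server could be shaped roughly as below: binary ciphertexts are base64-encoded so they can travel in a JSON body, and the response is decoded back to bytes before decryption. This is a sketch under assumptions: the field names `encrypted_input` and `encrypted_output` are hypothetical, the real API may require additional material, and the "server" here is a local echo used only to show the round trip.

```python
import base64
import json


def build_request(encrypted_input: bytes) -> str:
    """Serialize the ciphertext as a JSON request body (hypothetical schema)."""
    return json.dumps(
        {"encrypted_input": base64.b64encode(encrypted_input).decode("ascii")}
    )


def parse_response(body: str) -> bytes:
    """Extract the still-encrypted result from a JSON response body."""
    return base64.b64decode(json.loads(body)["encrypted_output"])


def fake_server(request_body: str) -> str:
    """Local stand-in for the server: echoes the ciphertext back unchanged."""
    payload = json.loads(request_body)
    return json.dumps({"encrypted_output": payload["encrypted_input"]})


ciphertext = b"\x00\x01\xfe\xff"  # placeholder bytes, not a real ciphertext
response_body = fake_server(build_request(ciphertext))
encrypted_result = parse_response(response_body)
print(encrypted_result == ciphertext)  # True
```

The bytes returned by `parse_response` are what the mirror page would display before the user presses “Decrypt”.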
Deployment
Set up the production deployment
Follow the production deployment process described in the documentation. Depending on feasibility, we foresee the following:
(client.zip, server.zip and serialized_processing.json)
These steps require some more info and will be split into more specific tasks once the complete workflow is determined.
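Until the workflow is pinned down, a small pre-deployment check on the artifact names listed above can catch an incomplete export early. This is a sketch only: `missing_artifacts` is a hypothetical helper, and the temporary directory below merely stands in for a real export folder.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the deployment notes above.
EXPECTED_ARTIFACTS = ("client.zip", "server.zip", "serialized_processing.json")


def missing_artifacts(deploy_dir: str) -> List[str]:
    """Return the expected artifact names that are absent from `deploy_dir`."""
    root = Path(deploy_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


# Demo on a throwaway directory containing only two of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "client.zip").write_bytes(b"")
    (Path(tmp) / "server.zip").write_bytes(b"")
    missing = missing_artifacts(tmp)

print(missing)  # ['serialized_processing.json']
```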
Co-written with @robinstraub