zama-ai / bounty-program

Zama Bounty Program: Contribute to the FHE space and Zama's open source libraries and get rewarded 💰
https://zama.ai

Create an encrypted DNA ancestry using Concrete ML #95

Closed zaccherinij closed 4 months ago

zaccherinij commented 7 months ago

Concrete ML simplifies the use of FHE for data scientists by helping them automatically turn machine learning models into their homomorphic equivalents. FHE can be particularly useful for protecting users' health care data, and is a perfect candidate for solving the privacy risks of using genealogy analysis websites.

Over 30 million people have taken DNA tests to determine their ancestry through computer-based genetic genealogy. By processing the digitized sequences of DNA bases, sophisticated computer algorithms can identify whether one's ancestors came from a number of ethnic groups. DNA is sensitive personal information: it can uniquely identify an individual, and leaks of DNA data have already happened.

DNA ancestry identification is a complex process that involves multiple steps. First, DNA phasing assigns alleles (the As, Cs, Ts and Gs in DNA strands) to the paternal and maternal chromosomes. Second, ancestry is determined by comparing specific segments of the DNA against large databases of DNA of known ancestry. Alternatively, machine learning can be used to classify each such segment. Finally, the ancestries of the individual segments are aggregated into a final classification.

Using Fully Homomorphic Encryption, we think ancestry can be determined on encrypted DNA sequences, preserving the security of users' DNA. Most published machine-learning-based methods for ancestry identification perform local ancestry inference. Global ancestry inference estimates the genome-wide average of the population contributions, while local ancestry inference (LAI) identifies the regional ancestry of a genomic segment, which is more amenable to machine learning. To build the global ancestry from local decisions, LAI algorithms also use machine learning in a second step, taking the ancestry classifications of different segments and fusing them into a single classification for a person.
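To make the relation between local and global ancestry concrete, here is a minimal plain-Python sketch (no FHE involved) that derives genome-wide proportions from per-segment labels. The function name `global_ancestry`, the optional `window_lengths` weighting and the population labels `EUR` and `AFR` are illustrative assumptions, not part of the bounty or of any cited method.

```python
from collections import Counter


def global_ancestry(local_labels, window_lengths=None):
    """Estimate genome-wide ancestry proportions from per-segment labels.

    local_labels: one predicted population label per genomic segment.
    window_lengths: optional segment lengths used as weights
                    (hypothetical parameter; defaults to equal weights).
    """
    if window_lengths is None:
        window_lengths = [1] * len(local_labels)
    if len(local_labels) != len(window_lengths):
        raise ValueError("need exactly one length per segment")

    # Accumulate the total length assigned to each population.
    totals = Counter()
    for label, length in zip(local_labels, window_lengths):
        totals[label] += length

    genome_length = sum(window_lengths)
    return {label: total / genome_length for label, total in totals.items()}


# Four equally sized segments, three of them assigned to "EUR".
proportions = global_ancestry(["EUR", "EUR", "AFR", "EUR"])
print(proportions)
```

A simple average like this is only the last aggregation step; the LAI methods referenced in this issue use a learned model to fuse the segment predictions.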

Many types of machine learning models have been proposed for local ancestry inference: neural networks [1], hidden Markov models [2], and decision trees or logistic regression [3] (the G-nomix project). A great hands-on resource on machine learning for ancestry is the AI Sandbox GitHub.

Submission

1️⃣ Want to solve this bounty? Register here.
2️⃣ Ready to submit your solution? Submit here.
🗓️ Submission deadline: May 12th, 2024.

Overview

The goal of this bounty is to train ancestry classifiers using Concrete ML so they can execute on encrypted data. You can assume the input DNA is phased and in the proper format. As mentioned above, most approaches are two-stage. First, classifiers are trained for individual genomic windows. Second, a smoother is trained which combines the predictions of the individual classifiers.
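The two-stage structure can be sketched in plain Python as follows. This is only an illustration of the data flow under stated assumptions, not a Concrete ML implementation: haplotypes are assumed to be strings of `0`/`1` alleles, stage 1 is swapped for a 1-nearest-neighbour lookup by Hamming distance, and stage 2 is a fixed majority filter rather than a trained smoother. All function names and the labels `POP_A` and `POP_B` are hypothetical.

```python
from collections import Counter


def split_windows(haplotype, window_size):
    """Cut a phased haplotype (string of '0'/'1' alleles) into windows."""
    return [haplotype[i:i + window_size]
            for i in range(0, len(haplotype), window_size)]


def hamming(a, b):
    """Number of positions at which two windows differ."""
    return sum(x != y for x, y in zip(a, b))


def train_window_classifiers(haplotypes, labels, window_size):
    """Stage 1 'training': store labelled reference windows per position."""
    n_windows = len(split_windows(haplotypes[0], window_size))
    refs = [[] for _ in range(n_windows)]
    for hap, label in zip(haplotypes, labels):
        for w, window in enumerate(split_windows(hap, window_size)):
            refs[w].append((window, label))
    return refs


def predict_windows(refs, haplotype, window_size):
    """Stage 1 inference: label each window by its nearest reference."""
    preds = []
    for w, window in enumerate(split_windows(haplotype, window_size)):
        nearest = min(refs[w], key=lambda ref: hamming(ref[0], window))
        preds.append(nearest[1])
    return preds


def smooth(preds, radius=1):
    """Stage 2 stand-in: majority vote over neighbouring windows.

    Ties keep the window's own stage-1 prediction.
    """
    smoothed = []
    for i, own in enumerate(preds):
        counts = Counter(preds[max(0, i - radius): i + radius + 1])
        top = max(counts.values())
        smoothed.append(own if counts[own] == top
                        else max(counts, key=counts.get))
    return smoothed


# Toy reference panel: one haplotype per (hypothetical) population.
refs = train_window_classifiers(
    ["000000000000", "111111111111"], ["POP_A", "POP_B"], window_size=3)

raw = predict_windows(refs, "000000110000", window_size=3)
final = smooth(raw)
print(raw)
print(final)
```

In this toy run the third window is closer to the `POP_B` reference, and the majority filter overrides that isolated call. In an actual submission, each stage would be replaced by a model trained and compiled with Concrete ML so that it can run on encrypted inputs.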

You can use any datasets that you want as long as you abide by their license agreements. Some examples are the 1000 Genomes Project, the Simons Genome Diversity Project and the Human Genome Diversity Project.

What we expect

[!IMPORTANT] To qualify for the maximum prize, the FHE application should perform both stages of the classification in FHE. Partial prizes will be awarded if only one stage of the pipeline is in FHE, but you can assume preprocessing such as phasing is done in the clear in a separate step (you can use phased DNA directly).

Implementation guide

Reward

🥇Best submission: up to €5,000.

To be considered the best submission, a solution must be efficient, effective and demonstrate a deep understanding of the core problem. Alongside technical correctness, it should also be submitted with clean code, clear explanations and complete documentation.

🥈Second-best submission: up to €3,000.

For a solution to be considered the second-best submission, it should be both efficient and effective. The code should be neat and readable. While its documentation might not be as exhaustive as that of the best submission, it should cover the key aspects of the solution.

🥉Third-best submission: up to €2,000.

The third-best submission is one that effectively tackles the challenge at hand, even if it has certain areas for improvement in terms of efficiency or depth of understanding. Documentation should be present, covering the essential components of the solution.

Reward amounts are decided based on code quality, model accuracy scores and speed performance on an m6i.metal AWS server. When multiple solutions of comparable scope are submitted, they are compared based on their accuracy metrics and computation times.

Related links and references

[1] Benet Oriol Sabat, Daniel Mas Montserrat, Xavier Giro-i-Nieto, Alexander G Ioannidis, SALAI-Net: species-agnostic local ancestry inference network, Bioinformatics, Volume 38, Issue Supplement_2, September 2022, Pages ii27–ii33.

[2] Wei Y, Zhi D, Zhang S. Fast and accurate local ancestry inference with Recomb-Mix. bioRxiv [Preprint]. 2023 Nov 19:2023.11.17.567650. doi: 10.1101/2023.11.17.567650. PMID: 38014185; PMCID: PMC10680832.

[3] Helgi Hilmarsson, Arvind S. Kumar, Richa Rastogi, Carlos D. Bustamante, Daniel Mas Montserrat, Alexander G. Ioannidis, High Resolution Ancestry Deconvolution for Next Generation Genomic Data, bioRxiv 2021.09.19.460980

Questions?

Do you have a specific question about this bounty? Join the live conversation on the FHE.org Discord server here. You can also send us an email at bounty@zama.ai.

zaccherinij commented 4 months ago

A friendly reminder that the submission deadline is May 12th, 2024 at 23:59 AoE (Anywhere on Earth). Good luck!

alephzerox commented 4 months ago

I'm not sure how to submit my solution (the link above leads to a general page), but here it is:

https://github.com/alephzerox/ancestry-fhe

zaccherinij commented 4 months ago

Hi @alephzerox,

Please head to the page https://www.zama.ai/bounty-and-grant-program and use the form under "submit to the bounty program". Cheers

zaccherinij commented 4 months ago

Thank you to everyone who submitted to the Zama Bounty Program Season 5. Our team will review all submissions and give some initial feedback in the coming days! Cheers.