nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Create a Pipeline to Utilize CellMinerCDB Pharmacogenomics Data with the Graph-Based Neural Network for Drug Response Prediction #195

Closed cannin closed 2 years ago

cannin commented 2 years ago

Background

Drug response prediction is a challenging machine learning problem, and the use of neural networks for it is still at an early stage. A major challenge with any drug response prediction is that it must be "interpretable" enough to guide follow-up experiments. "Interpretation" here means a mechanistic explanation: an understanding of, for example, how Protein A was modified and bound to Protein B, and so on, to produce the given drug response.

Data

Projects such as CellMinerCDB (https://academic.oup.com/nar/article/49/D1/D1083/5983630) seek to collect pharmacogenomics data in cancer. Pharmacogenomics data is composed of outputs (how cancer cells respond to treatments) and potential predictors (characteristics of cells, such as protein levels). One public portion of the data is on BioConductor (https://bioconductor.org/packages/stats/bioc/rcellminer/).
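To make the split between outputs and potential predictors concrete, here is a minimal sketch in plain Python. The drug names, cell line names, and values are made up, and `to_training_rows` is a hypothetical helper, not part of rcellminer or CellMinerCDB. It only illustrates flattening a drug-by-cell-line response matrix into one row per measured pair.

```python
# Toy data only: the cell line names, drug names, and values below are made up.
# Outputs: drug activity per (drug, cell line); None marks a missing measurement.
drug_activity = {
    "drug_A": {"cell_1": 0.8, "cell_2": None, "cell_3": -0.3},
    "drug_B": {"cell_1": 1.2, "cell_2": 0.1, "cell_3": None},
}

# Potential predictors: one molecular feature vector per cell line
# (for example, protein or expression levels in a fixed gene order).
cell_features = {
    "cell_1": [0.5, 1.1, 0.0],
    "cell_2": [0.2, 0.9, 1.4],
    "cell_3": [1.0, 0.3, 0.7],
}


def to_training_rows(activity, features):
    """Flatten a drug-by-cell-line matrix into (cell line, drug, response) rows.

    Pairs with a missing response or without predictor data are skipped.
    """
    rows = []
    for drug in sorted(activity):
        for cell, response in sorted(activity[drug].items()):
            if response is None or cell not in features:
                continue
            rows.append((cell, drug, response))
    return rows


rows = to_training_rows(drug_activity, cell_features)
for cell, drug, response in rows:
    print(f"{cell}\t{drug}\t{response}")
```

The tab-separated layout printed here is an assumption chosen for illustration. The exact input format expected by a given model should be checked against that model's repository.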

Algorithm

An example of a graph-based neural network is the DrugCell model from the UCSD Ideker Lab (http://drugcell.ucsd.edu/landing/), which uses a neural network architecture constrained to the Gene Ontology (http://geneontology.org/).
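One simplified way to picture an ontology-constrained layer is a connectivity mask: a gene input may only feed an ontology term it is annotated to. The sketch below uses plain Python and invented gene and term names. It is not DrugCell's actual architecture or code, only an illustration of the constraint idea.

```python
# Hypothetical toy annotations: which genes belong to which ontology term.
# Real annotations would be loaded from the Gene Ontology resources.
term_to_genes = {
    "term_dna_repair": ["GENE1", "GENE2"],
    "term_cell_cycle": ["GENE2", "GENE3"],
}
genes = ["GENE1", "GENE2", "GENE3"]
terms = sorted(term_to_genes)  # ["term_cell_cycle", "term_dna_repair"]


def build_mask(terms, genes, term_to_genes):
    """Return a terms x genes 0/1 matrix; 1 means the connection is allowed."""
    return [[1 if g in term_to_genes[t] else 0 for g in genes] for t in terms]


def masked_layer(x, weights, mask):
    """Weighted sum per term using only the gene inputs the mask allows."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


mask = build_mask(terms, genes, term_to_genes)
weights = [[0.5, 0.5, 0.5], [2.0, 2.0, 2.0]]  # arbitrary example weights
x = [1.0, 0.0, 2.0]  # one cell line's gene-level input (made-up values)
term_activity = masked_layer(x, weights, mask)
print(dict(zip(terms, term_activity)))
```

Because each output unit corresponds to a named term, its value can be inspected directly, which is the kind of interpretability the Background section asks for.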

Goal

The goal will be a pipeline that transforms data from rcellminer/CellMinerCDB into training input for a graph-based neural network, such as DrugCell. This may require modifications or additional code in both code bases. One clear example of a needed task is the generation of "Morgan Fingerprints" for drug compounds within CellMinerCDB (https://chem.libretexts.org/Courses/Intercollegiate_Courses/Cheminformatics_OLCC_(2019)/6%3A_Molecular_Similarity/6.4%3A_R_Assignment).
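For readers new to Morgan fingerprints, the core idea can be sketched in plain Python: each atom starts with an identifier, the identifier is repeatedly combined with those of its neighbors, and all identifiers are folded into a fixed-length bit vector. The function `morgan_like_fingerprint` below is a hypothetical, heavily simplified illustration on a hand-written molecular graph. It will not reproduce the bits generated by a cheminformatics toolkit such as RDKit, which is what a real pipeline would more likely use.

```python
import hashlib


def stable_hash(value):
    """Deterministic integer hash of a value (unlike the built-in hash() on strings)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def morgan_like_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Simplified circular fingerprint on a hand-written molecular graph.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    Bond orders, charges, and duplicate-environment removal are ignored,
    so this does NOT match the output of a real toolkit.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each atom's identifier depends on its element and degree.
    ids = [stable_hash((atoms[i], len(neighbors[i]))) for i in range(len(atoms))]
    seen = set(ids)

    # Each round folds the neighbors' identifiers into the atom's own,
    # so identifiers describe progressively larger neighborhoods.
    for _ in range(radius):
        ids = [
            stable_hash((ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
            for i in range(len(atoms))
        ]
        seen.update(ids)

    # Fold all collected identifiers into a fixed-length bit vector.
    bits = [0] * n_bits
    for identifier in seen:
        bits[identifier % n_bits] = 1
    return bits


# Ethanol, heavy atoms only: C-C-O.
ethanol = morgan_like_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
# The same graph with its atoms listed in a different order.
ethanol_reordered = morgan_like_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print("".join(map(str, ethanol)))
print(ethanol == ethanol_reordered)
```

Sorting the neighbor identifiers before hashing makes the result independent of atom ordering, which is why the two ethanol inputs give the same bit vector.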

Getting Started

Eventually, you'll need to write a proposal (see details: https://nrnb.org/gsoc). Elements of this proposal should include:

Difficulty Level: Medium

Size and Length of Project

Size: 175 hours
Length: 12 weeks

Skills


Essential skills: Python, R (basics)
Nice to have skills: PyTorch

Public Repository

Potential Mentors

Augustin Luna

anamika-yadav99 commented 2 years ago

Hi! I'm Anamika. I'm a 3rd-year undergraduate student from New Delhi, India. I have recently started exploring ML and bioinformatics. I have a strong foundation in Python and a strong inclination towards ML. I have experience working on a bioinformatics project as well. I would like to work on this project. I have gone through the resources listed above. Could you please guide me on the contributions that I can start making to the project right away?

ahmedtarek26 commented 2 years ago

Hi @cannin, @khanspers, I am Ahmed Tarek, a 3rd-year undergraduate student in medical informatics. I have two years of experience using Python. I took a Genetics course at college and did a project using some ML libraries, Biopython, Py3Dmol, and nglview, which you can find here. I am interested in machine learning and deep learning, so I studied them over the last two years through DataCamp and joined Neuromatch Academy as an interactive student, where we used PyTorch to predict fMRI from short videos and compared different neural networks (AlexNet, ResNet, VGG). I am working as a research assistant on an NLP research paper, and we are about to publish our work soon.

I think I am good with Python and comfortable working through different Python libraries to do specific tasks, and I am passionate about working on new kinds of datasets. I would like to start working on this project for GSoC 22. Thanks for your time.

khanspers commented 2 years ago

NRNB has officially been accepted as a mentoring organization for GSoC 2022! Here are some useful links:

cannin commented 2 years ago

@anamika-yadav99 @ahmedtarek26 I have updated the description of the project a little. I have added a "Getting Started" section to help you.

inoue0426 commented 2 years ago

Hi @cannin @khanspers, I'm interested in this project and am writing up a first draft. Is this still open? If so, could you review my article?

cannin commented 2 years ago

@inoue0426 Yes, it is still open, and I'm willing to review your proposal.

anamika-yadav99 commented 2 years ago

Hi @cannin I'm almost done with the 1st draft. Can I mail you my proposal as well?

cannin commented 2 years ago

@inoue0426 @anamika-yadav99 (and anyone else for this project) I'm willing to review proposals. Send a link to a Google Doc via email (see mentor info above).

khanspers commented 2 years ago

A reminder that the application period opens on Monday April 4. Proposals to NRNB must be submitted on the official GSoC Site (https://summerofcode.withgoogle.com/) before April 19, 18:00 UTC to be considered, and contributors are encouraged to submit proposals in draft format early, so that mentors can give feedback directly at the GSoC site.

AlexanderPico commented 2 years ago

IMPORTANT REMINDER: GSoC 2022 is for new “beginners” to open source.

Applicants are expected to review eligibility requirements prior to applying. We cannot accept applications from contributors with prior open source development experience. From the GSoC FAQ (https://developers.google.com/open-source/gsoc/faq):

Can someone already participating in open source be a GSoC Contributor?

The goal of GSoC is to bring new contributors into open source organizations. GSoC can also help beginner contributors learn the ins and outs of open source while being mentored by experienced community members. GSoC is for new and beginner contributors to open source, it is not for experienced contributors to open source.

khanspers commented 2 years ago

Closing because this is an active project for GSoC 2022.