nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Convert CellBox TensorFlow Code to PyTorch #180

Closed cannin closed 1 year ago

cannin commented 3 years ago

Background

CellBox (https://www.cell.com/cell-systems/pdfExtended/S2405-4712(20)30464-6) is a machine learning algorithm (built using TensorFlow) for generating network models that predict cellular responses to large-scale perturbation experiments. The output is a set of executable models capable of predicting the effects of cancer therapies. CellBox includes explicit models of cell dynamics in a machine-learning framework, and can therefore be both accurate and interpretable.

Goal

The current CellBox models are limited by the static graph feature of TensorFlow v1. To further increase the model scope, the Sander lab is working on transitioning the CellBox algorithm from TensorFlow to PyTorch, a different machine learning framework that builds its computational graph dynamically at run time. A dynamic computational graph will allow flexible manipulation of the network structure and the introduction of stochasticity into differential equations during model training. The goal of this summer project is to help in that effort. A student would need to construct CellBox in PyTorch, reproduce the training results of the original TensorFlow model, and also develop new features of the optimization tool.
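To make the dynamic-graph point concrete, here is a minimal PyTorch sketch. It is not CellBox code: the ODE form dx/dt = tanh(Wx + u) - x, the `simulate` function, and all parameter names are assumptions chosen for illustration. It shows an integration loop whose length is picked at run time and which injects noise at each step, while gradients still flow back to the interaction matrix.

```python
import torch


def simulate(x0, W, u, n_steps, dt=0.1, noise_std=0.0):
    """Euler-Maruyama integration of an illustrative ODE, dx/dt = tanh(W x + u) - x.

    This ODE is an assumption for the sketch, not CellBox's exact equations.
    """
    x = x0
    for _ in range(n_steps):  # ordinary Python loop; its length can change per call
        drift = torch.tanh(x @ W.T + u) - x
        x = x + dt * drift
        if noise_std > 0:
            # Stochastic term added during the forward pass
            x = x + noise_std * (dt ** 0.5) * torch.randn(x.shape)
    return x


torch.manual_seed(0)
n_nodes = 4
W = torch.nn.Parameter(0.1 * torch.randn(n_nodes, n_nodes))  # trainable interactions
u = torch.zeros(n_nodes)
u[0] = 1.0                      # hypothetical perturbation on node 0
x0 = torch.zeros(1, n_nodes)

n_steps = torch.randint(5, 20, (1,)).item()  # step count decided at run time
x_final = simulate(x0, W, u, n_steps, noise_std=0.05)

loss = (x_final ** 2).sum()
loss.backward()                 # autograd traces whatever path was actually executed
```

After this runs, `W.grad` holds the gradient of the loss with respect to the interaction matrix for the particular number of steps and noise draws that occurred in this call.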

Getting Started

  1. Download, install, run the existing code; ask questions if you have them
  2. Identify problem areas that might be difficult to convert

Difficulty Level 2

The student will need to learn how to interpret the CellBox algorithm and translate the TensorFlow “language” to the PyTorch “language”.
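As a rough illustration of what that translation involves, the sketch below trains a toy linear model in PyTorch, with comments naming the TensorFlow v1 idiom each line would typically replace. The model, data, and hyperparameters are made up for the example and have nothing to do with CellBox itself.

```python
import torch

torch.manual_seed(0)

# TF1: x = tf.placeholder(...), fed later through feed_dict.
# PyTorch: inputs are ordinary tensors passed straight to the model.
x = torch.randn(32, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])  # synthetic targets

# TF1: tf.Variable / tf.get_variable inside a graph.
# PyTorch: parameters are attributes of an nn.Module.
model = torch.nn.Linear(3, 1)

# TF1: train_op = tf.train.AdamOptimizer(lr).minimize(loss), built once.
# PyTorch: the optimizer is stepped explicitly in the loop.
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

losses = []
for step in range(200):
    optimizer.zero_grad()
    pred = model(x)  # TF1: sess.run(pred, feed_dict={...})
    loss = torch.nn.functional.mse_loss(pred, y)
    loss.backward()  # TF1: gradients are part of the prebuilt graph
    optimizer.step()
    losses.append(loss.item())
```

The main structural change is that there is no session or separate graph-building phase: the forward pass, the loss, and the update all run as plain Python statements.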

Skills

Public Repository

Potential Mentors

Augustin Luna, Bo Yuan

palashgarg0109 commented 3 years ago

Hi @DesmondYuan, I found this issue interesting and would love to work on it. I have done some deep learning projects using the TensorFlow framework and am keen to learn PyTorch. Could you please help me get started with the project? Thank you!

Ishan2601 commented 3 years ago

Hi! @cannin and @DesmondYuan

I am interested in working on this issue. I have used the TensorFlow framework for my projects and have also explored the PyTorch framework. I would love to make valuable contributions to this project while enhancing my knowledge of the PyTorch framework. After going through the required resources, how should I proceed with the task?

Also, what will be the preferred medium of communication?

Thank You.

cannin commented 3 years ago

@Ishan2601 for the time being here on GitHub is good. I added a "Getting Started" section.

Debanitrkl commented 3 years ago

@cannin and @DesmondYuan I have gone through the documentation. I have experience working with PyTorch and TensorFlow, and have previously refactored neural network code from TensorFlow to PyTorch. I think I would be able to contribute to this project. Could you help me set it up and get started with the issue? Thank you.

DesmondYuan commented 3 years ago

@palashgarg0109 @Ishan2601 @Debanitrkl First of all, thank you for your interest in this project! As @cannin mentioned, please refer to the "Getting started" section to kick off. Once you have a preliminary test run with the existing TF code, you might be able to have a better understanding of what the summer project would look like!

Debanitrkl commented 3 years ago

[Image: Capture]

I'm getting this error while setting up CellBox. Could anyone help? @DesmondYuan @cannin @judyueshen

cannin commented 3 years ago

@Debanitrkl thanks for digging into CellBox. If you find issues with CellBox, please post them directly on the CellBox repo: https://github.com/sanderlab/CellBox/issues so we can stay organized. Please repost your question there.

Feel free to post questions more focused on how to proceed with this project here.

palashgarg0109 commented 3 years ago

@cannin @DesmondYuan I have successfully installed and run the existing code on my PC. I am having some difficulty understanding the data files expr_index.txt and expr.csv: what does each row and column represent in each file? Also, I think there is an error in the README at One Click Model Construction, Step 2 (https://github.com/sanderlab/CellBox#step-2-use-mainpy-to-construct-models-using-random-partition-of-dataset): there is no file named Example.random_partition.cfg.json in the configs folder; it should be Example.random_partition.json instead.
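A quick way to confirm which config file names actually exist is to list the JSON files in the folder. The helpers below are hypothetical (they are not part of CellBox) and assume only that the configs live in a directory named `configs` relative to the working directory.

```python
from pathlib import Path


def list_json_configs(configs_dir):
    """Return the sorted names of all .json files directly inside configs_dir."""
    return sorted(p.name for p in Path(configs_dir).glob("*.json"))


def config_exists(configs_dir, name):
    """Check whether a config file with exactly this name is present."""
    return (Path(configs_dir) / name).is_file()


# Only runs when a configs folder is present in the current directory.
if Path("configs").is_dir():
    print(list_json_configs("configs"))
    print(config_exists("configs", "Example.random_partition.json"))
```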

DesmondYuan commented 3 years ago

@palashgarg0109 Thanks for looking into the code and for the typo report. The typo is fixed now.

Regarding the data preparation, please refer to https://github.com/sanderlab/CellBox#data-files-in-data-folder for more information. Feel free to let us know if you need any additional clarification!

elamdf commented 3 years ago

Hi @cannin and @DesmondYuan

I am interested in this project, and have experience with PyTorch. Is there anything I need to sign up for or do I just make a PR once done?

Elam

DesmondYuan commented 3 years ago

@T3chy Thanks. And yes, a PR would work fine. You might want to load the TensorFlow code first and do some quick test runs. Note that some of the models in model.py, e.g. the CoExp class, are currently deprecated, so try to focus only on the CellBox class and the abstract class PertBio to avoid distraction.
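For orientation, one way an abstract base class plus a concrete model could be laid out in PyTorch is sketched below. The names `PertBioSketch` and `CellBoxSketch`, the constructor arguments, the dynamics, and the L1-regularized loss are all hypothetical stand-ins; the real structure should be read from model.py.

```python
from abc import ABC, abstractmethod

import torch


class PertBioSketch(torch.nn.Module, ABC):
    """Hypothetical base class: owns shared parameters and the loss."""

    def __init__(self, n_nodes):
        super().__init__()
        self.W = torch.nn.Parameter(0.01 * torch.randn(n_nodes, n_nodes))

    @abstractmethod
    def forward(self, x0, pert):
        """Map an initial state and a perturbation to a predicted response."""

    def loss(self, x0, pert, target, l1_lambda=1e-4):
        # Prediction error plus an L1 penalty on W (an assumed regularizer)
        mse = torch.nn.functional.mse_loss(self(x0, pert), target)
        return mse + l1_lambda * self.W.abs().sum()


class CellBoxSketch(PertBioSketch):
    """Hypothetical concrete model: fixed-step Euler integration."""

    def __init__(self, n_nodes, n_steps=20, dt=0.1):
        super().__init__(n_nodes)
        self.n_steps = n_steps
        self.dt = dt

    def forward(self, x0, pert):
        x = x0
        for _ in range(self.n_steps):
            x = x + self.dt * (torch.tanh(x @ self.W.T + pert) - x)
        return x


model = CellBoxSketch(n_nodes=3)
x0 = torch.zeros(2, 3)
pert = torch.ones(2, 3)
target = torch.zeros(2, 3)
model.loss(x0, pert, target).backward()
```

Keeping the loss in the base class means alternative dynamics only need to override `forward`, and instantiating the base class directly raises a `TypeError` because `forward` is abstract.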

palashgarg0109 commented 3 years ago

@DesmondYuan @cannin I have understood the TensorFlow code to quite an extent and am now very excited to work further on the project. Could you please suggest what I should do next? I am also keen to know what criteria the mentors take into account when selecting a student for a project, given that there is already competition for this one.

cannin commented 3 years ago

@palashgarg0109 see https://nrnb.org/gsoc.html for how to apply; your proposal needs to reflect the goals (above) of the project

vz415 commented 3 years ago

Hi all, quick question out of curiosity. Why switch to PyTorch and not JAX? If I understood the CellBox paper correctly, automatic differentiation is the key ingredient, and JAX seems to have more functionality built around autodiff for a variety of settings. This may be a dumb question but I was curious enough to ask. Thanks!

DesmondYuan commented 3 years ago

@vz415 Great question. Ideally JAX and Zygote in Julia should indeed work better with such tasks! We thought torch and TF are more widely used so they might be easier for a summer project. But if you are interested in JAX, please definitely feel free to mention it in the proposal. We would love to see it.

khanspers commented 2 years ago

Cleanup in preparation for GSoC 2022.

ahmedtarek26 commented 1 year ago

Hi @cannin, I am interested in this project. I have experience with PyTorch and TensorFlow, and I have done many machine learning projects.

Could you help me get started with the project? I would also like to know if there are any updates from last year.

Thanks in advance!

arijitde92 commented 1 year ago

Hi @cannin @DesmondYuan, I have experience in both TensorFlow and PyTorch and have published some research works using both of them (this work is done in TensorFlow v2, while this work is done in PyTorch).

I believe I can contribute to this project and am interested in learning more about PyTorch and TensorFlow.

I have downloaded the existing code. Will run and see if there are any issues, will let you know here.

Hoping to have a long term collaboration starting with GSOC'23.

Thanks

khanspers commented 1 year ago

NRNB has been accepted as a mentoring organization for GSoC 2023! Contributor applications open on March 20. Here are some useful links:

GSoC contributor guide; NRNB project proposal template; Eligibility requirements; Full program timeline

The-0mnipotent commented 1 year ago

Hi @cannin @DesmondYuan , I would like to contribute to the project. I am working on the present code. Hope to collaborate in GSoC'23.

Aditya-vardhan13 commented 1 year ago

Hello @cannin @DesmondYuan, I am Aditya Vardhan from IIT Bhilai, India. I am proficient with Python, TensorFlow, and PyTorch and have worked on many projects using them. I find this project interesting. Can you please help me get started? I am going through the code right now and look forward to contributing.

Mustardburger commented 1 year ago

Hello @cannin and @DesmondYuan ,

Thank you for setting up this project for us! I am Phuc Nguyen, a 3rd year undergraduate student at the University of Cincinnati majoring in Biomedical Engineering and Computer Science, and I have extensive experience with TensorFlow, PyTorch, and relevant research in deep learning for life sciences. I will write an additional email to you to send my resume. I am best reached via email (nguye6tp@mail.uc.edu), and via comments in this issue. I am very excited about this project!

To get started, I would like to ask some questions. Others may have similar questions, so please add to the list if you have any:

I am working on my proposal right now. May I send you its first draft by early next week so I can get some feedback? And is it better for me to post the questions here or email you?

Thank you! Phuc

cannin commented 1 year ago

@Mustardburger answers to your questions:

Anirbanbhk88 commented 1 year ago

Hi @cannin @DesmondYuan @judyueshen, I am a Masters student studying AI at the University of Hamburg, Germany. I have knowledge of topics like statistical ML, NLP, and computer vision. I have worked on multiple projects in Python, PyTorch, and Keras. Apart from the data science stack, I also have experience working in Java, PHP, and Swift. I came across this topic and got interested in working on it. I am a bit late to apply, but I am interested in contributing to and gaining experience from this project. Applying ML to health and biological problems has long been an area of curiosity for me. Could you please help me get started with the project, and could a call be set up for a discussion?

cannin commented 1 year ago

@Anirbanbhk88 there is still time to apply; start by focusing on the "Getting Started" section; email first if questions arise as you draft an application.

khanspers commented 1 year ago

This project is an active GSoC 2023 project. Closing this issue because it is no longer available for other contributors/students.