nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Develop a Cluster Computing Framework for Dynamical Modeling #77

Closed 0u812 closed 7 years ago

0u812 commented 7 years ago

Introduction

Data growth will be a major factor in the near future. However, most academic software in systems biology is not written with explosive growth in mind. This is unfortunate, as related fields have made great gains in scalability simply by leveraging the tools of big data, as evidenced by the great success of startups like H2O.ai.

Our group at the University of Washington is developing a Python-based framework for biological modeling. The core of this framework is a high-speed ODE/stochastic biochemical network simulator, Roadrunner, which pushes the limits of single-threaded computing. This summer, we would like to mentor a student in developing a cluster computing framework for running simulations more scalably.

Goal

The overall goal is to scale up common types of tasks in dynamical modeling. These tasks usually involve 1) loading a model (usually SBML), 2) making some perturbation to the model (changing parameter values), 3) simulating the modified model, and 4) collecting some metrics from the results. To make this project tractable for a single summer, I suggest breaking it down into smaller tasks that can be used as milestones. For the initial phase of the project, we should ideally focus on feasibility and on figuring out how to implement cluster computing in a uniform way. For example,

From here, the next step would be to construct a more general API that can handle the common analysis types in dynamical modeling. The common types of analyses that can be parallelized include parameter scans, parameter fitting, sensitivity analysis and parameter identifiability. If we can implement at least some of these during the summer, that would be great.
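To make the four steps concrete, here is a minimal sketch of a parameter scan. It is not Tellurium or libRoadRunner code: the model is a hypothetical one-species decay, dS/dt = -k*S, stored in a plain dictionary and integrated with forward Euler, and the names `load_model`, `perturb`, `simulate`, `metric` and `parameter_scan` are made up for illustration. In the real project a simulator would replace `simulate`.

```python
def load_model():
    """Step 1: 'load' a model. Here it is a hypothetical parameter dictionary, not SBML."""
    return {"k": 1.0, "S0": 10.0}


def perturb(model, **changes):
    """Step 2: return a modified copy so the base model is never mutated."""
    modified = dict(model)
    modified.update(changes)
    return modified


def simulate(model, t_end=5.0, steps=500):
    """Step 3: integrate dS/dt = -k*S with forward Euler; returns (time, S) pairs."""
    dt = t_end / steps
    s = model["S0"]
    trajectory = [(0.0, s)]
    for i in range(1, steps + 1):
        s += dt * (-model["k"] * s)
        trajectory.append((i * dt, s))
    return trajectory


def metric(trajectory):
    """Step 4: reduce a trajectory to one number (the final concentration)."""
    return trajectory[-1][1]


def parameter_scan(k_values):
    """Run the perturb/simulate/metric steps once per parameter value."""
    base = load_model()
    return {k: metric(simulate(perturb(base, k=k))) for k in k_values}


scan = parameter_scan([0.1, 0.5, 1.0])
for k, final_s in scan.items():
    print(f"k={k}: final S = {final_s:.4f}")
```

Each scan point only needs the base model and its own parameter value, which is what makes this kind of analysis a natural fit for a cluster.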

Skills Required

Familiarity with cluster computing, such as Spark or Hadoop (though Spark is preferred due to its lower overhead), would be ideal. Experience with Python and Linux would also be helpful. Above all, we want students who are self-driven, eager to learn, and excited about research. This is a highly unexplored application of cluster computing, and would likely lead to a peer-reviewed paper if successful.

Possible Mentors

Main Contact

References

Somogyi, E. T., Bouteiller, J. M., Glazier, J. A., König, M., Medley, J. K., Swat, M. H., & Sauro, H. M. (2015). libRoadRunner: a high performance SBML simulation and analysis library. Bioinformatics, btv363.

Sauro, H. M., Choi, K., Medley, J. K., Cannistra, C., König, M., Smith, L., & Stocking, K. (2016). Tellurium: A Python Based Modeling and Reproducibility Platform for Systems Biology. bioRxiv, 054601.

0u812 commented 7 years ago

Added the Java label because most modern cluster frameworks are Java- or Scala-based, so knowing one of these languages beforehand would be helpful.

matthiaskoenig commented 7 years ago

+1 I second this proposal.

Just for clarification: in a first implemented version, will there be no synchronization between the different distributed models/simulations? That is, no simulation depends on another, and every single simulation is a completely independent task?
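A minimal sketch of what fully independent tasks would look like, under my own assumptions: `run_task` is a hypothetical stand-in for one simulation (forward Euler on dS/dt = -k*S), and a thread pool from the Python standard library stands in for the cluster only to keep the example self-contained. On a real cluster the same map step would be handed to the cluster framework's workers instead.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(k):
    """One self-contained task: simulate dS/dt = -k*S and return (k, final S).

    The task reads no shared state and writes none, so tasks can run in any
    order, on any worker, without synchronization.
    """
    s, dt = 10.0, 0.01
    for _ in range(500):
        s += dt * (-k * s)
    return k, s


def run_all(k_values, max_workers=4):
    """Map the task over all parameter values; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_task, k_values))


k_values = [0.1, 0.5, 1.0]
parallel = run_all(k_values)
serial = [run_task(k) for k in k_values]
print(parallel == serial)
```

Because no task depends on another, the parallel and serial runs give identical results, and the only coordination needed is collecting the outputs at the end.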

108krohan commented 7 years ago

Hello Everyone,

My Masters degree (pursuing) in Biological Sciences should be of interest to a rapidly growing organisation like yours. My sound knowledge of Python, Java, C, C++, SQL matches the project description. Primary OS: Linux Ubuntu 16.04 LTS.

I'll be honest I'm new to high-performance computing. And you can expect nothing but eagerness for the research paper. You can expect S.O.L.I.D. programming principles followed rigorously because that would help the organisation in the long run.

I do have 3 questions in mind:

  1. Do I have to mail alex.pico [at] gladstone.ucsf.edu or the mentors, in order to get in touch?
  2. What steps should I take in order to be a strong candidate?
  3. Do I start from the NRNB GSoC Google Doc template?

Thank you for reading. Hoping for a fast and positive response.

0u812 commented 7 years ago

Hi Rohan,

Thanks for your interest. I will try to answer each of your questions:

  1. At this stage, you're basically getting to know the mentors and bouncing ideas off of us, so posting here is fine.
  2. I think having a solid proposal is the most important thing. You can use the Google Doc that you linked and start filling it out (use File -> Make a copy). Once you have the content basically filled in you can share it with us for feedback. Feel free to reach out to us, especially for the parts you may not be familiar with such as parameter sweeps and parameter fitting. Google has some guidelines for selecting students. In addition to those, I would also pay specific attention to:
    • Does the student's plan have enough detail and does it lead to a useful feature of the software (such as the ability to perform parameter sweeps and parameter fitting on a cluster)?
    • Is the proposed work realistic for GSoC?
    • Does the student have the skills necessary to carry out the proposed work?

I think having all of these things would lead to a high chance of the project being successful, which is good for both us and the student.

  3. That is correct. You can make a copy for your own editing (use File -> Make a copy).

Regards, Kyle

108krohan commented 7 years ago

Thanks for such a prompt response! As instructed, I've mailed a preliminary Document, awaiting suggestions.

Meanwhile, I've set up Tellurium and the tutorials from the Tellurium page are quite helpful. Could you please confirm if that's the right way to proceed?

This page has lots of relevant links, I just wanted to know which are the most important so I can dig more deeply for the project.

Thank you for taking the time to read and promptly reply :)

0u812 commented 7 years ago

Hi Rohan,

The tutorials you linked to should be helpful. You can also find more helpful tutorials at http://tellurium.readthedocs.io/en/stable/index.html, especially the Models & Model Building section. I can't provide feedback on the document you sent because the project proposal isn't filled in yet, but I assume you are trying to learn how to use tellurium first. Can you tell me how far along you are in the process? For example, if I gave you a description of a reaction network, could you encode it and simulate it in tellurium?

108krohan commented 7 years ago

Thanks for the tutorial link (http://tellurium.readthedocs.io/en/stable/index.html)

Sorry, I've been busy with college tests (4 tomorrow). I'm trying to slip in an hour or two for Tellurium tutorials each day though. And I'll let you know when I'm through with encoding and simulation.

Regarding Reaction Network, does it entail Antimony usage?

You are busy, please don't trouble to reply if that's correct.

108krohan commented 7 years ago

Finished executing examples from documentation. Where should one ideally go from here?

Okay, while going through the documentation I noticed certain things:

  1. Bioservices needs to be installed separately.
  2. Tellurium build installer for Linux? Initial setup via conda mentioned here wasn't enough. Had trouble with SED-ML and Combine examples because they need pygraphviz, and sbml2matlab.
  3. te.plotArray() is used where r.plot() produces the same results. Is there a reason for this?

matthiaskoenig commented 7 years ago

Hi Rohan, if you have any feedback on the tutorials please let me know. I will update these within the next few days. If you found any errors, unclear information or missing information please let me know so I can update the respective pages.

All the best, Matthias

On Sun, Feb 26, 2017 at 12:59 PM, Rohan Kumar notifications@github.com wrote:

I've completed executing every code from the documentation. So I kind of learnt Tellurium a little now. That's how far along I am right now.

Where should one go from here?


-- Matthias König Junior Group Leader LiSym - Systems Medicine of the Liver Humboldt-University Berlin, Institute for Theoretical Biology https://www.livermetabolism.com konigmatt@googlemail.com Tel: +49 30 20938450 Tel: +49 176 81168480

108krohan commented 7 years ago

Hi Matthias,

The tutorial is pretty accurate. But I'll try to go over the documentation again today and list out whichever errors, unclear information or missing information I find here.

One more thing: although pygraphviz (plus sbml2matlab, required for SED-ML and Combine) started working after some head-scratching, I wanted to confirm whether more dependencies than just these conda installs are required, because I needed some (for example, pandas and bioservices):

    conda install -c sys-bio tellurium
    conda install jinja2 ipython
    conda install -c SBMLTeam python-libsbml

More specifically, is there no way to enable IDE plugins and SBOL functionality via the conda install method? Or are they optional?

Regards, Rohan

0u812 commented 7 years ago

Hi Rohan, how is the application coming? What questions do you have? Do you think you need more info on modeling/Tellurium/cluster computing?

108krohan commented 7 years ago

Hi Kyle, really sorry for the late response. I figured it would be better to learn Spark before posting here or updating the application (I'll share the updated doc for feedback by the morning of the day after tomorrow, EST, at the latest).

Our overall goal is to scale up model 1) loading 2) perturbation 3) simulation and 4) metric generation through HPC via Spark, yes? You've already done a fantastic job of breaking our project into tasks. What kind of subtasks are you expecting? Can you meanwhile suggest names of other materials you might want me familiarised with?

Regards, Rohan

0u812 commented 7 years ago

No worries 😄 I think you've got the right idea for scaling up. Now that you've finished the tellurium tutorials, I can give you more specific examples of the types of analysis we can parallelize. It might help to talk face-to-face. Are you free next week or during the weekend to Skype?

108krohan commented 7 years ago

Yes! Are you free between 7:30PM and 11:59PM Monday night EST? (Schedule EST/IST here)

matthiaskoenig commented 7 years ago

Let me know what time. If I am free I would like to join.

On Mar 4, 2017 10:38 AM, "Rohan Kumar" notifications@github.com wrote:

Yes! Are you free between 7:30PM and 11:59PM Monday night EST? (Schedule EST/IST here https://www.worldtimebuddy.com/?qm=1&lid=30,5,2643743&h=30&date=2017-3-7&sln=6-11 )


ShaikAsifullah commented 7 years ago

Hi, I am a little late. Can others join this meeting if it is not scheduled yet? Or, if it has already happened, may I get updates please? I am also planning to contribute to this project.

0u812 commented 7 years ago

Hi all, for the meeting it looks like the best time for all three time zones (PST/IST/CET) is 8 am PST / 9:30 pm IST / 5 pm CET. Would it work to Skype Wednesday at that time for about an hour? If that doesn't work, I can set up a survey.

hsauro commented 7 years ago

8 am PST is OK with me.

Herbert

On Mon, Mar 6, 2017 at 12:37 PM Kyle Medley notifications@github.com wrote:

Hi all, for the meeting it looks like the best time for all three time zones (PST/IST/CET) https://www.timeanddate.com/worldclock/meetingtime.html?iso=20170306&p1=234&p2=54&p3=37 is 8 am PST / 9:30 pm IST / 5 pm CET. Would it work to Skype Wednesday at that time for about an hour? If that doesn't work, I can set up a survey.

108krohan commented 7 years ago

Hi Kyle, Yes! Awesome :D :+1: You'll receive an updated doc within the next 3-4 hours. Your feedback would be incredibly valuable. I noticed you had mentioned parameter sweeps and fitting in an earlier comment, and I've been trying to learn as much as I can. I'd like to be prepared when we Skype. Please tell me anything you'd like me to be completely thorough with.

Regards, Rohan

108krohan commented 7 years ago

Hi Matthias @matthiaskoenig,

Hope it helps!

Regards, Rohan

0u812 commented 7 years ago

It looks like we can have our first Skype meeting tomorrow at 8 am PST / 9:30 pm IST / 5 pm CET. Anyone who can make it is welcome. This meeting should be pretty informal. I just mainly want to get a sense of where the students are at and try to fill in any gaps in your knowledge of Tellurium.

My Skype user id is jkylemedley. If @108krohan and @ShaikAsifullah could please send me a contact request on Skype that would be great.

hsauro commented 7 years ago

This is my contact name hsauro, if I attend I'll just observe.

Herbert


hsauro commented 7 years ago

PS What's the procedure for attending the skype call, do you just call us all?

Herbert


matthiaskoenig commented 7 years ago

My skype name is konigmatt


khanspers commented 7 years ago

GSoC 2017 selected project