nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Developing an Optimization Library for libRoadRunner #106

Closed hsauro closed 4 years ago

hsauro commented 6 years ago

Background

libRoadRunner is a high-performance, SBML-based simulator that uses LLVM to generate very efficient runtime code. This enables libRoadRunner to simulate models at speeds on par with compiled C/C++ code. By combining libRoadRunner with standard optimization algorithms, it is possible to fit models to data. At present this is done by writing code that links the standard Python optimizers available via scipy with libRoadRunner. Although this works, it is inefficient, and for large models it is not practical. In this project we would like to develop a C/C++-based optimization library that libRoadRunner can use directly without going through Python. This would enable us to provide high-performance optimization capabilities.

Goal

To develop a reusable optimization library that can be integrated with libRoadRunner. It is expected that off-the-shelf optimizers will be used, i.e. writing specific optimizers is not a requirement. However, the key innovations will be: 1) a plugin system that allows new optimizer algorithms to be added to the library; 2) a clean and user-friendly API that makes integration between libRoadRunner and the optimization library easy. Python bindings for the integrated system will also need to be created using SWIG (which we currently use).
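The plugin idea in the goal can be illustrated with a small registry. This is only a sketch in Python using the standard library; the actual project targets C/C++, and every name here (`register_optimizer`, `minimize`, `random_search`, `grid_search`) is hypothetical rather than part of libRoadRunner or any existing library. The two toy optimizers stand in for the off-the-shelf algorithms a real plugin would wrap.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

# An objective maps a parameter vector to a scalar cost (lower is better).
Objective = Callable[[Sequence[float]], float]
Bounds = Sequence[Tuple[float, float]]
Result = Tuple[List[float], float]
# A plugin is any callable taking (objective, bounds) and returning (best_params, best_cost).
Optimizer = Callable[[Objective, Bounds], Result]

_REGISTRY: Dict[str, Optimizer] = {}


def register_optimizer(name: str) -> Callable[[Optimizer], Optimizer]:
    """Decorator that adds an optimizer to the registry under `name`."""
    def decorator(func: Optimizer) -> Optimizer:
        if name in _REGISTRY:
            raise ValueError(f"optimizer {name!r} is already registered")
        _REGISTRY[name] = func
        return func
    return decorator


def available_optimizers() -> List[str]:
    """Names of all registered optimizers, sorted alphabetically."""
    return sorted(_REGISTRY)


def minimize(name: str, objective: Objective, bounds: Bounds) -> Result:
    """Single entry point: look up the named plugin and run it."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown optimizer {name!r}; available: {available_optimizers()}")
    return _REGISTRY[name](objective, bounds)


@register_optimizer("random_search")
def random_search(objective: Objective, bounds: Bounds,
                  n_samples: int = 2000, seed: int = 0) -> Result:
    """Toy plugin: sample uniformly inside the bounds and keep the best point."""
    rng = random.Random(seed)
    best_x: List[float] = []
    best_f = float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


@register_optimizer("grid_search")
def grid_search(objective: Objective, bounds: Bounds,
                points_per_axis: int = 21) -> Result:
    """Toy plugin: evaluate the objective on a regular grid over the bounds."""
    axes = [
        [lo + i * (hi - lo) / (points_per_axis - 1) for i in range(points_per_axis)]
        for lo, hi in bounds
    ]
    best = min(itertools.product(*axes), key=objective)
    return list(best), objective(best)
```

Calling `minimize("grid_search", cost, bounds)` or `minimize("random_search", cost, bounds)` uses the same entry point, so adding an algorithm only requires registering one more function. A C/C++ version could follow the same shape with a name-to-factory map.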

Difficulty Level 2

This project requires some understanding of systems biology and modeling. However the project offers the opportunity to learn new skills in the area of numerical optimization as related to parameter fitting in biological network models.

Skills

  • C/C++ (essential)
  • Python (some)
  • A basic understanding of linear regression

Public Repository

https://github.com/sys-bio/roadrunner

Potential Mentors

Herbert M Sauro

Contact

Herbert M Sauro

nitinprakash96 commented 6 years ago

Hi @hsauro I'm interested in this project and would like to discuss more about it.

A little about myself: I'm a pre-final year undergraduate student majoring in Computer Science and Engineering. My fields of interest are Software development and Machine Learning. Currently I'm working on Intrusion detection as my B.Tech thesis.

My Skill set:

  • Languages: C++, Python
  • Frameworks: Django, Flask
  • Platforms: Linux

hsauro commented 6 years ago

Currently, we run parameter optimization by combining libroadrunner and the scipy library in Python. However, this won't work if libroadrunner is run independently of Python, and it's also not the fastest way to run an optimizer. Some of our individual optimization runs take 2 hours to complete. If we wanted to do a parameter confidence estimation on top of that, it would take another 10,000 optimization runs, or roughly 2 to 3 years of compute (10,000 runs at about 2 hours each is 20,000 hours), which is, of course, impractical. We are therefore looking for someone who can write or reuse C/C++ optimization libraries that can be used directly with libroadrunner to reduce latency. This would require a driver application that connects the optimizer to libroadrunner and takes control of the combined software. The driving application itself would need to be a reusable library with its own API, e.g. so that it could be fired up from Python. The driving library would also need to provide the objective function. This could be a built-in catalog of standard objectives or, for advanced users, an objective function compiled on the fly (this would be a stretch goal, however).

Ultimately the code would be run on a cluster of nodes to distribute some of the workload during optimization. We currently have a set of optimization libraries but others would be written depending on the student's interests. We have optimization working via scipy, which could be used as a model for moving the same approach to C/C++.

Herbert Sauro
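The "built-in catalog of standard objectives" mentioned above could look roughly like the following. This is a minimal sketch in standard-library Python, not the project's design: the catalog entries, `make_objective`, and the `decay_model` stand-in are all hypothetical names. In a real driver, the `simulate` callable would wrap a libRoadRunner simulation instead of a closed-form formula.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# Each catalog entry compares simulated values with observations (and their errors).
Measure = Callable[[Sequence[float], Sequence[float], Sequence[float]], float]


def sum_of_squares(simulated, observed, sigma):
    """Unweighted sum of squared residuals (sigma is ignored)."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def chi_square(simulated, observed, sigma):
    """Residuals weighted by the measurement standard deviations."""
    return sum(((s - o) / e) ** 2 for s, o, e in zip(simulated, observed, sigma))


def gaussian_nll(simulated, observed, sigma):
    """Negative log-likelihood under independent Gaussian noise."""
    return sum(
        0.5 * ((s - o) / e) ** 2 + math.log(e) + 0.5 * math.log(2.0 * math.pi)
        for s, o, e in zip(simulated, observed, sigma)
    )


OBJECTIVE_CATALOG: Dict[str, Measure] = {
    "sum_of_squares": sum_of_squares,
    "chi_square": chi_square,
    "gaussian_nll": gaussian_nll,
}


def make_objective(name: str,
                   simulate: Callable[[Sequence[float]], List[float]],
                   observed: Sequence[float],
                   sigma: Optional[Sequence[float]] = None) -> Callable[[Sequence[float]], float]:
    """Bind a catalog entry to a simulator and a data set, giving f(params) -> cost."""
    measure = OBJECTIVE_CATALOG[name]
    errors = list(sigma) if sigma is not None else [1.0] * len(observed)
    if len(errors) != len(observed):
        raise ValueError("sigma and observed must have the same length")

    def objective(params: Sequence[float]) -> float:
        simulated = simulate(params)
        if len(simulated) != len(observed):
            raise ValueError("simulation and data have different lengths")
        return measure(simulated, observed, errors)

    return objective


# Stand-in for a simulator call: first-order decay x(t) = x0 * exp(-k * t).
TIMES = [0.0, 1.0, 2.0, 3.0]


def decay_model(params: Sequence[float]) -> List[float]:
    k, x0 = params
    return [x0 * math.exp(-k * t) for t in TIMES]
```

For example, `make_objective("chi_square", decay_model, data, sigma)` returns a plain function of the parameter vector that any optimizer can call, which keeps the optimizer and the simulator decoupled.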

swkang73 commented 6 years ago

Hi, my name is Sunwoo Kang, and I'm studying biomedical computation at Stanford University. Is the team still available to join for GSoC this year? I'm proficient in Python, Java, C++, and C to the level of writing complex programs, and would love to get more experience in simulation.

hsauro commented 6 years ago

Yes but you've only got two days to submit a proposal.

Herbert Sauro

debashish05 commented 5 years ago

Hello. My name is Debashish Roy and I am studying Information Technology at Jabalpur Engineering College. Is the project available for GSoC? I know I am late due to some unavoidable circumstances, but I am ready to give my best. I also tried to build libRoadRunner from source following http://libroadrunner.org/build-roarunner-from-source/linux-build-distribution/ , but it does not seem to work. Please guide me.

hsauro commented 5 years ago

What error messages are you getting when you try to build libroadrunner?

H

-- Herbert Sauro, Associate Professor University of Washington, Bioengineering 206-685-2119, www.sys-bio.org hsauro@uw.edu Books: http://books.analogmachine.org/

debashish05 commented 5 years ago

Sir, it gives this error at the build step (ccmake ../roadrunner/third_party/):

CMake Error: The source directory "/home/roadrunner/third_party" does not exist. Specify --help for usage, or press the help button on the CMake GUI.

I have installed all the dependencies, but there is no third_party folder.

hsauro commented 5 years ago

Have you been following the build instructions exactly?

I am ccing this to Kyle who will tell you where the detailed instructions are if you haven't found them.

Herbert Sauro

0u812 commented 5 years ago

The build instructions are here: https://github.com/sys-bio/roadrunner/wiki/Building-from-Source

debashish05 commented 5 years ago

Yes, I am following the instructions exactly. OK sir, I will follow these instructions and let you know.

debashish05 commented 5 years ago

Sir, I tried to install libroadrunner from the given source, but on my system, after about 86% of the LLVM build, it fails with:

fatal error: ld terminated with signal 9 [killed]
compilation terminated.

I tried to build LLVM 8, as the available link works for Windows. I ran the same build process twice and it failed both times. It also takes a lot of time, around 4 hours just for LLVM. Then I came across this issue: https://github.com/sys-bio/roadrunner/issues/334 . There I just ran the script, but it failed while building the third-party dependencies. The exact error is:

Looking for BZ2_bzCompressInit in LIBBZ_LIBRARY-NOTFOUND
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
LIBBZ_LIBRARY
    linked by target "cmTC_30dea" in directory /home/debashish/tmp/roadrunner_build_thirdparty/CMakeFiles/CMakeTmp

CMake Error at /usr/share/cmake-3.10/Modules/CheckLibraryExists.cmake:54 (try_compile):
  Failed to configure test project build system.
Call Stack (most recent call first):
  third_party/libSBML-5.17.2-Source/CMakeLists.txt:644 (check_library_exists)

-- Configuring incomplete, errors occurred!

0u812 commented 5 years ago

You need to build llvm 3.5.2 per the instructions above

hsauro commented 5 years ago

As mentioned by Kyle, the build process is fragile, so you must follow the instructions exactly as given. For example, it says:

"Right now, only an edited version of LLVM 3.5.2 is supported"

Hence you must use LLVM 3.5.2. We currently don't support LLVM 8. It also gives the download link to the 3.5.2 we use.

Build instructions:

https://github.com/sys-bio/roadrunner/wiki/Building-from-Source

Herbert Sauro

debashish05 commented 5 years ago

Sir, can you please tell me about this error in the "build roadrunner third party deps" step?

Looking for BZ2_bzCompressInit in LIBBZ_LIBRARY-NOTFOUND
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
LIBBZ_LIBRARY
    linked by target "cmTC_2b64e" in directory /home/debashish/tmp/roadrunner_build_thirdparty/CMakeFiles/CMakeTmp

CMake Error at /usr/share/cmake-3.10/Modules/CheckLibraryExists.cmake:54 (try_compile):
  Failed to configure test project build system.
Call Stack (most recent call first):
  third_party/libSBML-5.17.2-Source/CMakeLists.txt:644 (check_library_exists)

-- Configuring incomplete, errors occurred!
See also "/home/debashish/tmp/roadrunner_build_thirdparty/CMakeFiles/CMakeOutput.log".

Is it due to LLVM? Building LLVM takes a very long time; is there an alternative way?

hsauro commented 5 years ago

I don't think there is a shortcut unless Kyle has other ideas. I would start again and make sure you carry out each step exactly as described. If you then run into a problem, get back to us, because that would suggest the instructions are incomplete.

Herbert Sauro

debashish05 commented 5 years ago

Can I carry on working on my proposal and build it later? I did find this script very useful, though: https://github.com/sys-bio/roadrunner/issues/334#issuecomment-223550814 . And sorry sir, I am taking a lot of your time. I am new to open source, so I am struggling to figure things out. Thanks a lot for your support.

0u812 commented 5 years ago

Debashish I think you’re getting the bzip2 error because you’re using Matthias’ build script instead of the instructions above.

hsauro commented 5 years ago

We probably do need a script like the one you found; the entire process would be less fragile. Maybe this is something that could be created as part of the Google Summer of Code project?

Herbert Sauro

debashish05 commented 5 years ago

I will follow the same instructions one more time with the resources you gave, using LLVM 3.5.2 rather than LLVM 8, and get back to you. Thank you.

hsauro commented 5 years ago

ok good, we'll see how far you get. But long term a script would be very useful to have.

Herbert Sauro

matthiaskoenig commented 5 years ago

Just to make some general comments about the implementation: key to a fast implementation (for almost all algorithms) is a fast calculation of sensitivities.

You should have a look at https://github.com/ICB-DCM/AMICI , which will give you the necessary access to sensitivities in the fastest and most robust way possible:

"AMICI provides a multilanguage (Python, C++, Matlab) interface for the SUNDIALS solvers CVODES (for ordinary differential equations) and IDAS (for algebraic differential equations). AMICI allows the user to read differential equation models specified as SBML and automatically compiles such models as .mex simulation files, C++ executables or python modules. In contrast to the SUNDIALSTB interface, all necessary functions are transformed into native C++ code, which allows for a significantly faster simulation. Beyond forward integration, the compiled simulation file also allows for forward sensitivity analysis, steady state sensitivity analysis and adjoint sensitivity analysis for likelihood based output functions. The interface was designed to provide routines for efficient gradient computation in parameter estimation of biochemical reaction models but is also applicable to a wider range of differential equation constrained optimization problems."

Please have a look at https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005331 . Using adjoint sensitivity analysis is the right way to go with parameter fitting:

"Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models."
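To make the role of sensitivities concrete, here is a small sketch of forward sensitivity analysis (not the adjoint method, and not AMICI's implementation) for the one-parameter model dx/dt = -k x. The sensitivity s = dx/dk obeys ds/dt = -x - k s with s(0) = 0, and the gradient of a sum-of-squares cost follows from the chain rule. The model, the fixed-step RK4 integrator, and all function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, float]  # (x, s) with s = dx/dk


def rhs(state: State, k: float) -> State:
    """Augmented right-hand side: the model equation plus its sensitivity equation."""
    x, s = state
    return (-k * x, -x - k * s)


def rk4_step(state: State, k: float, h: float) -> State:
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base: State, slope: State, scale: float) -> State:
        return (base[0] + scale * slope[0], base[1] + scale * slope[1])

    k1 = rhs(state, k)
    k2 = rhs(shifted(state, k1, h / 2.0), k)
    k3 = rhs(shifted(state, k2, h / 2.0), k)
    k4 = rhs(shifted(state, k3, h), k)
    return (
        state[0] + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
        state[1] + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]),
    )


def simulate_with_sensitivity(k: float, x0: float, times: Sequence[float],
                              substeps: int = 50) -> Tuple[List[float], List[float]]:
    """Return x(t) and s(t) = dx/dk at the given ascending, non-negative times."""
    state: State = (x0, 0.0)  # s(0) = 0 because x0 does not depend on k
    t = 0.0
    xs: List[float] = []
    ss: List[float] = []
    for t_next in times:
        h = (t_next - t) / substeps
        for _ in range(substeps):
            state = rk4_step(state, k, h)
        t = t_next
        xs.append(state[0])
        ss.append(state[1])
    return xs, ss


def sse_and_gradient(k: float, x0: float, times: Sequence[float],
                     observed: Sequence[float]) -> Tuple[float, float]:
    """Sum-of-squares cost J(k) and dJ/dk = sum of 2 * (x_i - y_i) * s_i."""
    xs, ss = simulate_with_sensitivity(k, x0, times)
    cost = sum((x - y) ** 2 for x, y in zip(xs, observed))
    grad = sum(2.0 * (x - y) * s for x, y, s in zip(xs, observed, ss))
    return cost, grad


def analytic_sensitivity(k: float, x0: float, t: float) -> float:
    """Closed form for this model: dx/dk = -t * x0 * exp(-k * t)."""
    return -t * x0 * math.exp(-k * t)
```

One simulation of the augmented system yields both the cost and its exact gradient. With forward sensitivities the augmented system grows with the number of parameters, which is the cost that the adjoint approach described in the quoted paper is designed to avoid.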

hsauro commented 5 years ago

Matthias, note that the best optimization algorithms don't require sensitivities, especially for large models. For example, most evolutionary algorithms don't. On the other hand, I wasn't aware of this project. A recent project from Los Alamos National Labs (not yet published but submitted) saw major improvements in compute time when using libroadrunner for parameter optimization. I think they just coupled libroadrunner to optimizers via Python glue. They didn't use gradient-based optimizers. A direct coupling would provide additional speedups.

Herbert
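As an illustration of a gradient-free approach, the sketch below fits two parameters of a toy decay model with a basic differential evolution loop (the common rand/1/bin variant) written in standard-library Python. It is an assumption-laden example, not the optimizer used in any of the projects mentioned: the model, the synthetic noise-free data, the bounds, and the control settings `f` and `cr` are all made up for demonstration, and a real coupling would call libRoadRunner inside `objective`.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

TIMES = [0.5 * i for i in range(9)]  # 0.0, 0.5, ..., 4.0


def simulate(params: Sequence[float]) -> List[float]:
    """Stand-in for a simulator call: x(t) = x0 * exp(-k * t)."""
    k, x0 = params
    return [x0 * math.exp(-k * t) for t in TIMES]


def differential_evolution(objective: Callable[[Sequence[float]], float],
                           bounds: Sequence[Tuple[float, float]],
                           pop_size: int = 20, generations: int = 200,
                           f: float = 0.7, cr: float = 0.9,
                           seed: int = 1) -> Tuple[List[float], float]:
    """Minimize `objective` inside `bounds` using only function evaluations."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than i to build the mutant.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = objective(trial)
            if trial_cost <= costs[i]:  # greedy replacement
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def fit_decay() -> Tuple[List[float], float]:
    """Recover (k, x0) from synthetic data generated with k = 0.5, x0 = 2.0."""
    observed = simulate([0.5, 2.0])

    def objective(params: Sequence[float]) -> float:
        return sum((s - o) ** 2 for s, o in zip(simulate(params), observed))

    return differential_evolution(objective, [(0.01, 5.0), (0.1, 10.0)])
```

No derivative of the model is ever requested, so the only requirement on the simulator is that it can be evaluated quickly, which is where a direct coupling would help most.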

khanspers commented 5 years ago

Active GSoC 2019 project.

hsauro commented 4 years ago

This issue is now reopened at the request of debashish05

khanspers commented 4 years ago

Active project for GSoC 2020, closing here.