Open Midnighter opened 4 years ago
Hi @Midnighter,
In general, I think this is a good idea and it might be easier to convince the community to extend a common core library than to convince everyone to adopt cobrapy.
Here are a few comments/concerns:
There is already one Model class that is "a faithful Python representation of an SBML model", that is the SBML model class from the libsbml API. Anything else we develop would be a not-so-faithful representation of the SBML model and I can imagine it will be hard to make everyone agree on what should be the core functionality and what can be discarded from the libsbml Model object.
After reading your document proposal, I disagree with the Serialization part (sorry, that was fast, 😄 ). I don't see the purpose of serializing models to JSON or YAML. In fact, I don't like the fact that some cobra projects (cobrapy, memote, Bigg database) have implemented the JSON format. It is not a community standard like SBML, it is a representation that was adopted in an ad hoc fashion by a subset of the community.
> There is already one Model class that is "a faithful Python representation of an SBML model", that is the SBML model class from the libsbml API. Anything else we develop would be a not-so-faithful representation of the SBML model and I can imagine it will be hard to make everyone agree on what should be the core functionality and what can be discarded from the libsbml Model object.
While I agree that the libSBML Model class is, of course, the authoritative source, its purpose is to provide functionality for exchanging information. It is not the right place to implement your 'business logic' and in fact, none of the modeling frameworks use it that way.
The purpose I had in mind for this class is to be a parent to a constraint-based model class; as a hypothetical example, `Parent` → `Model` → `ConstraintBasedModel`. I think it would be sub-optimal at best, actually bad design, to inherit directly from the libSBML `Model`. However, it should be faithful in the sense that one can round trip a defined subset of the information from an SBML document to a `Model` and back without loss of information. This subset should fulfil constraint-based modeling needs and encompass current model encoding practices.
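To make the hierarchy concrete, here is a minimal sketch. All class and attribute names (`Reaction`, `Model`, `ConstraintBasedModel`, `bounds`) are hypothetical and chosen only for illustration; this is not an existing cobrapy or libSBML API. The base class only stores data, and the subclass adds the constraint-based view on top of it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Reaction:
    """Hypothetical solver-free reaction record."""
    id: str
    # metabolite id -> stoichiometric coefficient (negative = consumed)
    stoichiometry: Dict[str, float] = field(default_factory=dict)


@dataclass
class Model:
    """Hypothetical core class: plain data, no solver dependency."""
    id: str
    reactions: Dict[str, Reaction] = field(default_factory=dict)

    def add_reaction(self, reaction: Reaction) -> None:
        self.reactions[reaction.id] = reaction

    def metabolites(self) -> Set[str]:
        # Collect every metabolite id mentioned by any reaction.
        return {m for r in self.reactions.values() for m in r.stoichiometry}


@dataclass
class ConstraintBasedModel(Model):
    """Hypothetical subclass adding flux bounds on top of the core data."""
    bounds: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def add_reaction(self, reaction: Reaction, lower: float = 0.0,
                     upper: float = 1000.0) -> None:
        super().add_reaction(reaction)
        self.bounds[reaction.id] = (lower, upper)


cbm = ConstraintBasedModel(id="toy")
cbm.add_reaction(Reaction("R1", {"A": -1.0, "B": 1.0}), lower=-10.0, upper=10.0)
```

Anything that only needs the data (I/O, annotation handling, diffing) could work against `Model`, while solver-specific code would live in or below `ConstraintBasedModel`.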
It will certainly be a struggle to agree on class designs but that is the question I'm posing here, is the effort worth the gains for the community?
> After reading your document proposal, I disagree with the Serialization part (sorry, that was fast, 😄 ). I don't see the purpose of serializing models to JSON or YAML. In fact, I don't like the fact that some cobra projects (cobrapy, memote, Bigg database) have implemented the JSON format. It is not a community standard like SBML, it is a representation that was adopted in an ad hoc fashion by a subset of the community.
I firmly stand on the side of practicality here. JSON is the de facto standard for data exchange via HTTP and YAML has proven helpful for line-based diffs. Since it is less verbose than XML, JSON has also proven to be a more efficient format which is an important consideration for multi step workflows involving metabolic models. I agree that BiGG should probably only distribute models in SBML format as the source of truth but that is a different discussion.
Again, in my opinion the goal should be the possibility to round trip from SBML to JSON and back to SBML without loss of relevant information. This is certainly not the case with the current JSON format. By ironing out the current problems and getting more buy-in for the format from a larger community, we can make the format less ad hoc.
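As an illustration of what a lossless round trip means in practice, here is a minimal sketch using only the standard library. The `Model` and `Reaction` dataclasses and the `to_json`/`from_json` helpers are hypothetical stand-ins, not the cobrapy JSON format or API; the point is only that serialization followed by deserialization should reproduce an equal object.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class Reaction:
    id: str
    stoichiometry: Dict[str, float] = field(default_factory=dict)
    lower_bound: float = 0.0
    upper_bound: float = 1000.0


@dataclass
class Model:
    id: str
    reactions: Dict[str, Reaction] = field(default_factory=dict)


def to_json(model: Model) -> str:
    # sort_keys gives a stable, diff-friendly text representation.
    return json.dumps(asdict(model), sort_keys=True, indent=2)


def from_json(text: str) -> Model:
    data = json.loads(text)
    reactions = {rid: Reaction(**r) for rid, r in data["reactions"].items()}
    return Model(id=data["id"], reactions=reactions)


original = Model("toy", {"R1": Reaction("R1", {"A": -1.0, "B": 1.0}, -10.0, 10.0)})
restored = from_json(to_json(original))
# Dataclass equality compares all fields, so this checks for lost information.
round_trip_ok = restored == original
```

A real format would need the same kind of check for every attribute in the agreed subset (annotations, compartments, gene rules, and so on), ideally against SBML as the source of truth.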
There are many excellent ideas here, and I am happy to see such eagerness in implementing this:
Appendix (quote from the specification of SBML Level 3 Version 2 Release 2, page 3):
SBML is defined neutrally with respect to programming languages and software encoding; however, it is oriented primarily towards allowing models to be encoded using XML, the eXtensible Markup Language (Bray et al., 2004). This document contains many examples of SBML models written in XML. Formal schemas describing the syntax of SBML, as well as other materials and software, are available from the SBML project web site, http://sbml.org/.
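Hi all,
- Simulation is part of the checking of the core classes, i.e. to be able to test if the model representation is correct the SBML test suite for the FBC models should be run, which at least requires FBA as an implementation. So packages for running FBA and creating the LP problem like `optlang` must be part of the core (at least for testing that the correct LP problem with the correct solutions is generated).
- … `core` module instead of a separate core package.
In summary: I highly agree we need the core data objects to be as compatible as possible with SBML, but I think this can be done in the `core` module. I don't think creating a core data structure package is a good idea, because we need the simulation to test the correct implementation of the core data structure (e.g. SBML test suite).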
I think @Midnighter's suggestion for having a core metabolic modeling class as an independent package rather than an independent module inside cobrapy is an attempt to bring together all the python community developing metabolic modeling tools in python which for whatever reason prefer not to use cobrapy.
I compiled a list (already 2 years ago, so it might be incomplete) of constraint-based metabolic modeling packages in python:
If you are a cobrapy user it is easy to say "let's all just agree on using cobrapy", but after developing framed/reframed for the past 7 years, I am not easily willing to give up on it, and I suppose @bgoli also has no plans to give up on cbmpy, which he also has been developing for many years :)
Sharing a core modeling package that we all agree to write from scratch would be a nice way to bring the community together, but I am not overly optimistic that we will come to a consensus.
It might be a good chance to merge some of the ideas from #394 as well, specifically having a more hierarchical view of reactions, metabolites, and models that allow components to be re-used. But the thought that it would be one package to rule them all does feel a bit like the standards problem...
Yes, it makes sense if some of the other packages/tools would adopt such a common data structure (but I don't see this happening). SBML is the existing exchange standard for models. I don't think there will be a common standard for the internal data structures (and don't see any need for that). Many of the tools are one-man shows which have not been updated in years (e.g. https://github.com/linsalrob/PyFBA). Is there any other actively developed Python tool with more than one developer which would be willing to adopt such a core data structure?
@matthiaskoenig some of the points I would have answered were already made by @cdanielmachado :slightly_smiling_face: Additionally, I think you highlight an important design choice with the following point:
> Simulation is part of the checking of the core classes, i.e. to be able to test if the model representation is correct the SBML test suite for the FBC models should be run, which at least requires FBA as an implementation. So packages for running FBA and creating the LP problem like optlang must be part of the core (at least for testing that the correct LP problem with the correct solutions is generated).
To what extent is this really necessary, and how much validation is already done for us by SBML itself? I certainly see a few use-cases for a package without any solver implementation:
> But the thought that it would be one package to rule them all does feel a bit like the standards problem...
@pstjohn what I have in mind is rather a bottom-up and not a top-down approach. As I'm sure you would agree, the sustainability of research software is in serious doubt. And the incentives for implementing SBML well and thoroughly are basically non-existent. Most of us get rewarded for publishing new shiny methods, not for creating an implementation of the SBML specification down to the letter.
And I can understand that when we innovate, we want to do it in the tool that we're most familiar with or that enables us to best do what we need to do. But what I'm asking is if we can agree enough on these basics so that we can pool our limited time to create this core that we all benefit from. SBML is just such an effort itself. Can we build a sub community for SBML in Python? Or do we continue to work in isolation?
Maybe we could have a survey to see how many of the "one-man shows" would be willing to jump into such a common project.
Speaking for myself, I started (re)framed in 2012 because I was not happy with the messy structure of the cobrapy modeling classes (and multiple other design decisions), which in my opinion could only be solved by writing something from scratch.
I have discussed with the cobrapy folks once every few years about a possible reconciliation, which has never happened, one large reason being the insistence on keeping this legacy cobrapy model structure.
Therefore, the idea of having a brand new model structure sounds very appealing to me. Regardless of this going forward or not, I plan to keep developing reframed for the near future.
Also, despite the larger community, cobrapy has very limited functionality, and many of these dozen other tools implement methods not available in cobrapy (for instance, in reframed alone there are 15 different simulation methods). Hence the cobrapy community would also benefit from such cooperation.
As an attempt to establish a bridge, I have recently implemented round-trip conversion between the CobraPy and ReFramed model objects:
As you can see this only takes a few lines of code since the model structures are already quite similar.
Best regards, Daniel
I intended this as a clean slate discussion because I think that an implementation of a core package that closely follows SBML leaves plenty of creative freedom to build on top of. This is not conducive to this discussion but since you had to bring it up, I cannot let your framing here go uncommented.
> I have discussed with the cobrapy folks once every few years about a possible reconciliation, which has never happened, one large reason being the insistence on keeping this legacy cobrapy model structure.
In the time that I have been part of this project, the initiative to discuss potential ways to bring framed and cobrapy closer together has come from our side. In particular, during the first discussion that I was present at, we suggested a handful of approaches. One of them was a very similar idea to this one. Create a base package of compatible classes that both cobrapy and framed could inherit from. You declined all options that we had to offer, leaving us with the impression that the only choice you would consider was something that looked like framed. One reason that also seemed very important to you, was that you preferred to make design choices and changes as you saw fit without being encumbered by a larger community. We certainly have to be more conservative with API changes since there are several hundred users whose code will break if we do. However, our discussion was largely decided before even considering too specific technical details.
> As an attempt to establish a bridge, I have recently implemented round-trip conversion between the CobraPy and ReFramed model objects
So you are painting a picture of yourself here where you are the one trying to establish relationships when my experience so far has been that you have been closing the doors offered to you. Maybe you have since changed your mind and up until your last comment, I thought you made valuable and helpful contributions to the discussion. I don't mind differences of opinion and it is certainly your right to work as you please but kindly do not neglect your own role in past decisions; positioning us as the party unwilling to compromise.
> As you can see this only takes a few lines of code since the model structures are already quite similar.
Going forward, I would like to focus on those similarities that you so rightly point out and on the mutual benefits that we could enjoy.
For me @draeger summarized my own sentiments perfectly. I think having cobrapy interact with an intermediate data model would fix a lot of issues. And being able to chose different serialization back-ends would be a game changer for a lot of our projects. I would probably keep this layer as a purely data-oriented one though with the leanest modification abilities possible. Model modification can often not be isolated from the solver as @matthiaskoenig pointed out, so this would just lead to duplicated code.
I agree that this is a big endeavor that might fragment the contributor base and might lead to less contributors maintaining cobrapy itself. So it would be important to know how many people would be willing to actually maintain that split in the long term.
Hi all, great discussion. Perhaps we should have an online meeting/survey about this.
@Daniel Machado Your code looks like a duplication of the cobra core model, with the exception of the first-class `Compartment` object, which I have wanted to see in cobrapy forever (but it has not happened so far). So it is definitely one candidate use case (especially if you get all the model IO for free).
I did not want to sound too negative before, but there have to be some actual adopters of the core structure among other tools. Otherwise I don't see much advantage in separating the core out into a separate package (perhaps having some lean install options which only pull in the most essential packages could be an alternative solution). A survey of the reasons other cobra tools have for not adopting the cobra core model could be a good start. Just separating the modules will not solve the underlying issues; we also have to fix what is keeping others from using the cobra core data model.
Best Matthias
Hello all,
Thanks for such a nice discussion!
As a COBRApy user, I lean on the side of @matthiaskoenig:
Basically, this would mean that current "cobra" package would be intended by design to be a core package for other fancier simulations packages (which is actually what I always believed before reading this thread).
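From what was discussed so far, I noted two epic points that were mentioned in favour of a new package but that are, IMO, not incompatible with keeping everything within COBRApy:
2. "we want a core package with light dependencies": Yes!.. and the solvers are heavy, but critical for many uses. I believe we should consider declaring heavy dependencies as setuptools "extras", which would make things as easy as `pip install cobra` for lightweight uses vs. `pip install cobra[all]` for the full bundle (see [ref](https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies), idea mentioned by @matthiaskoenig). Is it reasonable?
Hi everyone,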
Nice to see such an active discussion! I'll add my view as well:
The bottom line is that, if this goes forward I'll happily adapt my code to use the core model, in the hope that users can smoothly use my tools while doing model manipulation and FBA with their favorite library 😄
Best, Mattia
> 2. "we want a core package with light dependencies": Yes!.. and the solvers are heavy, but critical for many uses. I believe we should consider to declare heavy dependencies as setuptools "extras", which would make things as easy as `pip install cobra` for lightweight uses vs. `pip install cobra[all]` for the full bundle (see [ref](https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies), idea mentioned by @matthiaskoenig). Is it reasonable?
I think that the question of whether to have the core as a separate package, or as part of the same package with "extras", is a minor design issue. After all, why can't two packages be supported by exactly the same people (and even be maintained in a single git repository).
On the other hand, wouldn't making it a separate package make it more likely to be used by other projects that only need the core functionality? I can definitely say that was the case for me when I needed something like a `Model` class for equilibrator and opted not to depend on cobrapy because it comes with too much baggage. Maybe @bdelepine 's suggestion covers this scenario, sorry if I don't fully understand how `pip` works.
> Maybe @bdelepine 's suggestion covers this scenario, sorry if I don't fully understand how pip works.
Baudoin's suggestion does indeed cover this because a normal installation of cobra would in that scenario only install the minimal dependencies. I must admit that I am not in favour of this, though, because many users have enough trouble with Python and package installation already. If we additionally tell them, "oh, FBA only works if you also install xyz" I'm afraid it will add to the confusion and frustrate users further who expect everything to work right away. Conversely, if cobrapy depends directly on a cobra-core (or whatever name) package, its use is not noticeable for the majority of users while developers have more options.
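For readers unfamiliar with the "extras" mechanism under discussion, here is a minimal sketch of how such a split could be declared. Which packages end up in the core and which in an extra is purely an assumption for illustration and not cobrapy's actual packaging; `install_requires` and `extras_require` are the setuptools keywords involved, and `resolved_dependencies` is a hypothetical helper that only mimics what pip would be asked to install.

```python
# Hypothetical split of dependencies; which packages belong where is an
# assumption made for illustration, not cobrapy's actual packaging.
CORE_REQUIRES = ["python-libsbml"]
EXTRAS = {
    "solvers": ["optlang"],
    "array": ["numpy", "pandas"],
}
# The "all" extra is the union of every other extra.
EXTRAS["all"] = sorted({pkg for pkgs in EXTRAS.values() for pkg in pkgs})

SETUP_KWARGS = {
    "name": "cobra",
    "install_requires": CORE_REQUIRES,
    "extras_require": EXTRAS,
}
# In a real setup.py these would be passed on: setuptools.setup(**SETUP_KWARGS)


def resolved_dependencies(extra=None):
    """Packages requested for `pip install cobra` or `pip install cobra[extra]`."""
    deps = list(CORE_REQUIRES)
    if extra is not None:
        deps += EXTRAS[extra]
    return deps
```

Under this sketch, `pip install cobra` would pull in only the core list, while `pip install cobra[all]` would add every optional dependency. The alternative described above, a full `cobra` that depends on a separate core package, would instead keep a single default install for users.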
> - (while it's a beautiful idea) I'm sceptical too in the feasibility of getting "one-man shows" onboard,
I actually think that there are a few advantages even just for cobrapy.
A potential downside is that bugs in the core package will likely be noticed by cobrapy users but since the two projects will be tightly linked, I think we can handle that.
> - and the priority should be to list the reasons why "one-man shows" did not inherit from COBRApy classes, so that COBRApy could be improved to cover those needs.
This is a very important point. And it is indeed part of the reason why I invited everyone to comment here. Cobrapy has a lot of ugly sides. It started as a port of the COBRA Toolbox, then slowly grew into something more Pythonic with true object-oriented design and expressive definition of the underlying optimization problems. It did grow very organically, though, without a common design and guidance so it is quite messy under the hood. I believe that the difficulty to modify cobrapy is impeding innovation and I hope that such a separation will help. But of course we should not repeat the same mistakes that prevented others from adopting cobrapy as a basis in the first place. If we can make the necessary design changes to get broader adoption and support for it, then that's worth the trouble of finding a consensus.
Hey all,
For those interested in continuing this discussion and planning concrete next steps, @bgoli and I will meet online at 11:00 CET on Wednesday March 24 in the context of HARMONY. I'll post a link here when I know which platform we'll use to connect.
I think this is something that should be voted on by the general community. Summarizing the comments here the general opinion is pretty mixed and pretty much negative in some cases (reframed).
So I would put that up to a vote and also make it clear that this could lead to slightly broken cobrapy for a while since that is a big change that won't just happen without bugs (similar to the optlang transition). Also it definitely needs input from @phantomas1234.
While I welcome any input on whether or not this is a wanted/needed/desirable approach, I don't see how it would leave cobrapy in a slightly broken state. I would envision development starting on a new package completely independently. When that is in a stable state, work would start on cobrapy to inherit from that new package. When that is complete, a new release would seal the deal. Of course, it is a bunch of work and bugs are likely (though hopefully many will be caught by our test suite), but I see zero impact on users.
Sorry, of course I don't know if that will happen. It's just based on the experience with the optlang transition that used the same strategy. Even though all tests passed in the first release there were still a lot of new bugs and performance issues that took a while to make the package stable again. So I feel that this should be tackled after a 1.0 release and fixing the already reported issues.
I agree that there is an element of risk. I do see the risk as being much lower compared to introducing a completely new solver backend, though. It means for cobrapy that the `io` module would be moved out and all of the core classes need to be slimmed down in as much as that functionality will be provided to cobrapy by the other package. In general the classes have remained fairly stable so I don't see any surprises there.
Can we wait with any larger refactoring until there is a dedicated maintainer for cobrapy hired again? It looks like I'll be able to do that soon. Right now I would favor not taking any risks here.
After some discussion and thought, I see this plan much more positively. A big advantage would be a much smaller core package with limited dependencies. The smaller code base would allow for much higher quality code, e.g. full type annotations and static type checking, among others.
Hi Niko (@phantomas1234), any follow up on this?
"Can we wait with any larger refactoring until there is a dedicated maintainer for cobrapy hired again? It looks like I'll be able to do that soon. Right now I would favor not taking any risks here."
I don't have time to maintain reframed anymore, and might start using cobrapy for new projects, but I don't want to do it before any major refactoring or a change of lead developer.
Not Niko, but I may have some insights. I can't really speak for everybody but based on the last dev meetings and discussions the plan seems to be not to change anything big for now. In terms of the API it looks like everybody feels strongly that it should remain stable so I wouldn't expect any large changes there. Current plan is rather to stabilize everything for a 1.0 release/ a new publication. In terms of the lead developer, cobrapy has transformed into a community project a while ago. So it's mainly a set of core developers that manage the repo and that set can change. It does not depend on a single lead as much as before I would say, but I let others chime in. The advantage is that things keep moving even if @phantomas1234 or @Midnighter have less time. Also major props to Moritz for continuing to contribute a lot in his free time 🙏
> Hi Niko (@phantomas1234), any follow up on this?
The idea behind this issue is still very much on my mind but I simply do not have the time to spare to even start. I did kick off some email conversations and a Google Doc that I think you've seen already.
Last time I talked with Niko, none of the resources promised to him for COBRApy actually materialized and his focus also shifted slightly as far as I can tell. So for the foreseeable future, there isn't anybody who will work on COBRApy full (or even part) time. I see many places where we could improve things but given the situation, consolidating what we have and pushing out a version 1.0 + publication seems like the best path forward.
What would you miss the most from reframed in terms of simulation/analysis methods (I'm sure you'll miss plenty regarding how things are implemented)? Since you already switched to optlang, maybe adding a few missing pieces wouldn't be too much work.
Thank you @Midnighter and @cdiener !
I would like to establish a long term roadmap (5+ years) for CarveMe and other tools I plan to develop in my group, and I would like to make that in a way that is sustainable and aligned with the roadmap for the rest of the cobra ecosystem (cobrapy, optlang, bigg, escher, cameo, MEWpy, etc).
Regarding cobrapy/reframed I see two use-cases:
Your feedback already helps me have a better perspective. Now I know that things will be stable for a while, so thanks again for the detailed replies.
Hi,
I'm trying to gauge the general interest in and start a discussion around extracting some of cobrapy's core functionality into a separate package. The goal of the new package would be a faithful Python representation of an SBML model, as well as covering seamless input and output of models; importantly without requiring mathematical solvers.
I have written down my motivation and tried to explain it further in a Google Doc that anyone can comment on.
As I explain in the document, I also hope that this may provide a common foundation for the various Python constraint-based modeling frameworks so that we can solve and maintain the boring parts in one place.
This should be of interest to @opencobra/cobrapy-core and especially @Hemant27031999 due to his GSoC project with @matthiaskoenig and @draeger. I am particularly interested in your opinions @bgoli and @cdanielmachado. Do you see the same value in this as I do? Can you see yourself helping to maintain such a package and adjusting your own frameworks to build on top of it?
With best intentions and hopes of a stronger COBRA community, Moritz