vivarium-collective / vivarium-core

Core Interface and Engine for Vivarium
https://vivarium-core.readthedocs.io/
Apache License 2.0
23 stars 2 forks source link

some library design decisions before publishing the Vivarium paper #50

Closed 1fish2 closed 2 years ago

1fish2 commented 3 years ago

Some questions to address before publishing the Vivarium paper and inviting library users we can't consult before making incompatible changes:

U8NWXD commented 3 years ago

My 2 cents on the deriver-process-factory-composite-compartment-generator inheritance question:

First some definitions:

Given these definitions, Composite isn't a class--it's just a dictionary with particular keys. The class is a Factory (what we used to call Generator). A Process knows how to create a simple Composite from itself, so it inherits from Factory.

As far as code changes, this means that we combine the Composite class into Factory. Since Composite is the only class that inherits from Factory, this shouldn't be a problem. Then, we have the following hierarchy:

Factory               dict
   ↑                   ↑
Process             Composite
   ↑
Deriver

Where Factory objects can produce Composites.

@eagmon @prismofeverything @1fish2 what do you think?

eagmon commented 3 years ago

I think @U8NWXD's proposed definitions make sense, and would be fine re-combining Composite and Factory. Separating Composite from Factory did seem a bit unnecessary, but I could also see other variations of Factory being made in the future.

eagmon commented 3 years ago

Regarding the Process derivers() method -- I'm in support of removing it entirely, and then removing the generate_derivers() function so that all derivers have to be declared explicitly in a Composite. The only thing holding me back is all the use of derivers in vivarium-cell. I propose that rather than going through the labor of converting all of those, we keep the older version of vivarium-core for vivarium-cell and deprecate vivarium-cell. I can pull the multibody process out to a separate library. Other processes can be moved to a separate "playground" repository as the become needed.

Edit: I started vivarium-pymunk -- https://github.com/vivarium-collective/vivarium-pymunk. Metabolism has also been moved to its own library -- https://github.com/vivarium-collective/vivarium-cobra. The rest of the processes in vivarium-cell can be handled as-needed, so I say we go ahead and remove automatic derivers.

Edit: automatic derivers are removed in commit 4905ec65657031274e0fc0747fb0bc15c2f41dd8

eagmon commented 3 years ago

An additional breaking change that I have considered is to make the helper functions in composition.py use the more typical function arguments rather a settings dictionary with argument keywords. Any thoughts on this? If we are making breaking changes, it might be worth getting them all in at the same time.

1fish2 commented 3 years ago

**kwargs for settings might be more convenient. That changes the way of declaring and documenting settings item types and defaults to ordinary:

def simulate_experiment(experiment: SomethingHere, *, total_time: int = 10, return_raw_data: bool = False, timeline: SomethingElseHere = None, **kwargs): ...

Otherwise we'd need to use TypedDict to declare the setting value types and prose to document their defaults, and we couldn't make some settings like total_time required.

1fish2 commented 3 years ago

Thanks, @U8NWXD for your 2¢! So a Deriver is for computing derived values from the current state, not for modeling, and it has to run after all Processes in order to get the Processes' updated outputs.

Q. Are Processes first-order difference equations (computing new state values as a function of the previous time step's state values) while Derivers compute other state values as a function of current values? And Derivers output state variables must not overlap with other Derivers and with any Processes?

s1[t] = process1(s1[t - 1], s2[t - 1])
s2[t] = deriver1(s1[t])

Q. Should Derivers be topo-sorted like spreadsheet formulas so they can get other derivers' output values?

A Composite is a collection of Processes wired together with a topology. This is just a dictionary with keys processes and topology. A Factory produces Composites with its generate_topology and generate_processes functions.

To clarify, a Factory builds a Composite (of Processes) from a Composite specification which is just a dictionary?

Can we meet to discuss process-factory-composite-compartment inheritance? I'm still fuzzy on it.

eagmon commented 3 years ago

@1fish2 -- Yes, we should meet to talk about process-factory-composite-compartment. Everything that you mentioned above sounds correct, except that Deriver values can overlap with other Processes' values. In fact, some systems depend on that -- for example, a process that increases counts of a molecule based on its concentrations depends on a deriver that updates concentrations based on counts. But we should clear up any ambiguity, and get more refined usage of our terms.

@U8NWXD -- thinking about Composite as a dict with particular keys ('processes' and 'topology') is actually very clarifying. Composition then is just a merge operation of Composites. I am still not fully satisfied with the merge operation I put in there, but the dictionary approach makes it much easier to think about. I have wondered about merging factories -- can we wire the Composite output of a Factory in another Factory? I think easy merging will be very helpful, so that Composites can be incrementally built up rather than re-defined every time.

Derivers are really important. They are like Processes with a timestep of 0, which only run after the dynamic Processes to derive values. For example, concentrations from counts or volume from mass. Currently, all of the derivers run in order after every Processes update. Ideally there should be a topo-sorted approach that triggers Derivers only when their input values are updated. I have removed the automatic derivers in #49, so Processes no longer declare their required derivers. We should make it easy to pre-bundle Processes and Derivers together in Composites/Factories -- this relates to the easy merge operations described above.

U8NWXD commented 2 years ago

Checking off Can we move the generated doc files out of git? because the API stubs are no longer auto-generated

U8NWXD commented 2 years ago

@eagmon why are we closing this? I don't think we've finished making all these decisions

eagmon commented 2 years ago

@U8NWXD Maybe we should meet to look at it very soon? The paper will be online probably this week and that will come with more traffic, including to issues.

U8NWXD commented 2 years ago

Here are my suggestions for how to tackle the remaining tasks here for 1.0:

  • [x] Finalize which methods are class methods.

We can stop ignoring no-self-use in .pylintrc to flag all the cases where methods accept self but don't use it.

  • [x] Which definitions should be private? Use __all__ or _xyz names to declare that.
  • [ ] Which modules should be private?

I prefer _xyz names. I'd propose going through core and searching for every definition to see if it can be made private. I don't think we have any modules to make private.

  • [x] Which Python versions to officially support? 3.6 - 3.9? Declare that in setup.py.

I think we should only support Python versions that we run tests on. Right now that's only 3.8, but I think we should see if our tests pass on the later Python versions too.

  • [x] Use abc abstract base classes & methods anywhere else?

We currently use this for Process (so Step is by extension an abstract class) and Composer. I think those are the only classes that we want to make abstract.

  • [x] The library has 130 isinstance() checks! isinstance() should be used sparingly since it makes for fragile code.

I think we can release 1.0 without fixing this

eagmon commented 2 years ago

141 and #143 complete these tasks. Remaining decisions added as issues #139 #140.