sys-bio / tellurium

Python Environment for Modeling and Simulating Biological Systems
http://tellurium.analogmachine.org/
Apache License 2.0
107 stars 36 forks source link

Update distrib support in RoadRunner #391

Open hsauro opened 6 years ago

hsauro commented 6 years ago

Andrew you migth want to look at this issues, talk to Kiri to get up to speed on the problem.

Consider this simple model, it only has a single statement that samples from a unifrom distribution:

import tellurium as te import roadrunner import pylab

r = te.loada(""" val := uniform (1, 1) """) m = r.simulate (0, 10.1, 400, ['time', 'val'])

The simulation returns NAN in 2.0.18 and the following error message in, 2.1.0:

RuntimeError: Unsupported distribution: , at ?loadSymbolValue@DistribFunctionResolver@rrllvm@@QEAAPEAVValue@llvm@@PEBVFunctionDefinition@libsbml@@PEBVDistribFunctionDefinitionPlugin@6@AEBV?$ArrayRef@PEAVValue@llvm@@@4@@Z

luciansmith commented 6 years ago

I should mention that I am very close to releasing a new, simplified version of the 'distrib' spec which would change the way distributions are encoded in SBML entirely. This would leave (basically) two different versions of SBML models that encode distributions out there: one where a function definition was overridden by an annotation, and a second one where each distribution had its own MathML 'csymbol'.

The last few versions of Antimony have supported distributions as both the 'function definition overridden by an annotation' version, as well as the proposed 'UncertML-like' 'function definition overridden by distrib'. But if we go the csymbol route, the second version of that will go away, so don't waste any time trying to support it.

Hmm, that was a bit confusing. A summary:

hsauro commented 6 years ago

We could probably keep the current annotation method to remina compatible with other simulators and then put in the new MathML csymbols once libsbml is ready.

kirichoi commented 6 years ago

Version 2.1.0 issues are due to roadrunner on PyPI not compiled with distrib package. The nan issue, however, is a valid concern.

hsauro commented 6 years ago

ok I see, nan issue may be a real bug.

luciansmith commented 6 years ago

The issue with NaN is not actually a bug! The function definition in question has two sets of math: the bit that gets overridden by distrib, and the bit contributed by distrib. And, as it happens, the bit that is supposed to get overridden says 'return NaN' to make sure that people don't accidentally use it.

So the solution to both problems is just 'compile with distrib'.

(However, as noted, if it's using actual distrib objects, those are going away soon, due to popular un-demand.)

hsauro commented 6 years ago

Don't quite understand, this used to work in roadrunner, James Galzier used the feature when he taught modeling last year and needs it again in a few weeks. So what has changed? Isn't it a bug?

kirichoi commented 6 years ago

Not sure I understand it either. I am testing on roadrunner compiled with distrib and I am still getting NaN. Is this non-issue?

luciansmith commented 6 years ago

Kiri said: "Version 2.1.0 issues are due to roadrunner on PyPI not compiled with distrib package." A version of rr with that change would, by design, return NaN.

However, Kiri says that rr compiled with distrib still returns NaN, so that means there's some other problem also going on.

Part of the problem is that I don't actually know how distrib support was added in roadrunner: I had kind of assumed it was through the annotation scheme, but it sounds like it was through 'distrib'?

If rr support worked through distrib, that means that rr will need to be changed, since distrib itself changed, and is about to change again. One way to change it that will work today is to ignore the 'distrib' things entirely, and instead look at the annotation. This will maintain backwards compatibility, since Antimony would output SBML files that used both the annotation and the (now obsolete) 'distrib' scheme.

My guess is that rr and Antimony are currently using different versions of 'distrib', both of which are being obsoleted! So your best bet is to drop all the old distrib code, and instead look at the annotation (which is consistent and will still work on these models in the future). Then when distrib changes yet again to the csymbol approach, implement support for that.

If need be, I can walk over and we can talk about this in person ;-)

kirichoi commented 6 years ago

My guess is that rr and Antimony are currently using different versions of 'distrib'

That is highly plausible.

luciansmith commented 6 years ago

If this turns out to be the case, a quick fix is to revert Antimony.

kirichoi commented 6 years ago

Update from @luciansmith

Hi, everyone! I have just now checked into Kyle's 'miriam' branch for Antimony a version of Antimony that uses the new version of 'distrib'. In short, in this version, all distributions are given their own ASTNode type (like AST_DISTRIB_FUNCTION_NORMAL), and that's it. There are no longer overridden FunctionDefinitions with new contents: it's all just ASTNodes.

There is also a converter in libsbml (used by libAntimony) that converts the 'wikipedia definition URL' version of FunctionDefinitions and converts them to the new ASTNodes, too, so libroadrunner need only support a single version of distrib, while still being able to read in versions of SBML files from Copasi or older versions of Antimony that our users may have lying around. Basically, you would just run the converter on import, and go from there.

This does mean that all distributions that were defined by Frank for the 'wikipedia URL' annotation scheme are currently implemented, which is seven more distributions than we had previously. If for some reason we don't want to implement support for one or more of them, and nobody else does either, we can take them out of the 'distrib' spec itself. Conversely, if there are any distributions anyone feels we should support that are not in the current list, now is the time to add them: I can put them in libsbml and Antimony, and then we can support them with roadrunner.

The full list of distributions is: AST_DISTRIB_FUNCTION_NORMAL AST_DISTRIB_FUNCTION_UNIFORM AST_DISTRIB_FUNCTION_BERNOULLI AST_DISTRIB_FUNCTION_BINOMIAL AST_DISTRIB_FUNCTION_CAUCHY AST_DISTRIB_FUNCTION_CHISQUARE AST_DISTRIB_FUNCTION_EXPONENTIAL AST_DISTRIB_FUNCTION_GAMMA AST_DISTRIB_FUNCTION_LAPLACE AST_DISTRIB_FUNCTION_LOGNORMAL AST_DISTRIB_FUNCTION_POISSON AST_DISTRIB_FUNCTION_RAYLEIGH

(The ones previously supported by Antimony and roadrunner were normal, uniform, exponential, gamma, and poisson.)

Note also that there are 'truncated' versions of many of the above distributions as well, which are distinguishable only by the number of arguments to the function: normal(0,1) is a standard gaussian with mean of 0 and a standard deviation of 1, and normal(0, 1, -3, 3) truncates that distribution to values between -3 and 3.

kirichoi commented 5 years ago

Commit https://github.com/sys-bio/roadrunner/commit/bd70cdb04c8ccb9ea677e43132d26484554edc12 restores roadrunners distrib support. Hooray! However, I noticed that roadrunner currently only supports normal and uniform distributions. Do we want some other ones?

matthiaskoenig commented 5 years ago

I definitely need lognormal and exponential, poisson. It should not be to difficult to support all of them.

kirichoi commented 5 years ago

Commit https://github.com/sys-bio/roadrunner/commit/f991e30f0d79b8bfd34ed76131d2201add96e9b9 adds support for exponential, poisson, and lognormal distributions. Keep in mind this is for legacy models only (pre-AST_DISTRIB). This also means that the antimony version that supports this change (2.9.4) does not support lognormal unfortunately. SBML files should be readable though.

While working to implement AST_DISTRIB nodes I realized that the roadrunner handles distrib through functions definitions and it is hard to separate it out. It might take some time to actually support new AST nodes.

matthiaskoenig commented 5 years ago

@kirichoi Thanks for all the work. Let me know when the AST_DISTRIB is supported. I implemented the reading and writing of distrib models in my tools with extended math, but would need a simulator to test and simulate the models.

kirichoi commented 5 years ago

I have started working on this issue again, only to be reminded of the problem I faced earlier. Basically, for distrib functions to work, a pointer to random field is required. However, defining this pointer is non-trivial in the scope of ASTNodeCodeGen because none of the higher level roadrunner objects necessary to construct a Random object are passed.

Right now, I am thinking of workarounds instead of refactoring ASTNodeCodeGen, so that distribution function does not require Random object, e.g., from

double distrib_uniform(Random *random, double _min, double _max)
{
    Log(Logger::LOG_DEBUG) << "distrib_uniform("
            << static_cast<void*>(random)
            << ", " << _min << ", " << _max << ")";

#ifdef CXX11_RANDOM
    cxx11_ns::uniform_real_distribution<double> dist(_min, _max);
#else
    cxx11_ns::uniform_real<double> dist(_min, _max);
#endif
    return dist(*random);
}

to

double distrib_uniform_ast(double _min, double _max)
{
    std::random_device rd;
    cxx11_ns::mt19937 engine(rd());
    Log(Logger::LOG_DEBUG) << "Test11";
    Log(Logger::LOG_DEBUG) << "distrib_uniform_ast("
        << static_cast<void*>(&engine)
        << ", " << _min << ", " << _max << ")";

#ifdef CXX11_RANDOM
    cxx11_ns::uniform_real_distribution<double> dist(_min, _max);
#else
    cxx11_ns::uniform_real<double> dist(_min, _max);
#endif
    return dist(engine);
}

This works, except you can't explicitly set a seed which is a big problem. Anyway, I wanted to share this to get ideas on other potential approaches. Any thoughts?

kirichoi commented 5 years ago

BTW, is it even possible to set seed for distrib in SBML or SEDML? If not, the issue with seed is a non-issue because there is no standardized way to exchanging the info.

luciansmith commented 5 years ago

You can set a seed in SED-ML, but not SBML, but only for the simulation experiment as a whole, not for individual calls. As long as you can get identical results from a simulation on subsequent runs if you want to, (and setting the seed from time() or something if not), that's all that's required for that. I personally don't think it'd be a huge issue if you couldn't let the user set a seed; the main thing you need it for is debugging: sometimes if an odd thing happens in a stochastic environment, you want to set the seed so it'll happen again so you can figure out what the problem is.

I'm not sure what the architecture of libroadrunner is here, so I'm not sure I can be any help for your other problem: in general, I'd see if ASTNodeCodeGen function has access to a parent, perhaps? Or perhaps there's some existing global object you could attach 'Random' to?

kirichoi commented 5 years ago

In that case, my current solution might be acceptable. The problem is ASTNodeCodeGen doesn't have access to the parent and will require refactoring to do so. Probably that will be the best solution for the long term but for now, I will push my changes so that people can at least use models with AST_DISTRIB.

matthiaskoenig commented 5 years ago

It should be possible to set a seed for the simulation. This currently is possible for stochastic simulations by setting the seed as an integrator setting (see https://sys-bio.github.io/roadrunner/python_docs/stochastic.html). One could imagine the same mechanism for distrib, i.e. setting the seed for the integrator which is used as seed for the random_number engine.

kirichoi commented 5 years ago

Distrib spec says nothing about how a seed should be set so technically, my solution is in accordance to the specification. However, I also thought being able to set seed is important so I refactored the code to take necessary classes to build Random object through commit https://github.com/sys-bio/roadrunner/commit/fa21f864c6250252e4029813a3d12c82d86af57e. Now you can set a seed for distributions using a global seed, e.g.:

roadrunner.Config.setValue(roadrunner.Config.RANDOM_SEED, 123)

which is shared with all other randomly generated things, e.g. stochastic simulations.

Hopefully, nothing breaks even though everything seems to be working. The branch is failing in Travis but the issue seems to be stemming from something else.

@0u812 Do you remember which version of libSBML we are using in libroadrunner-deps_2.0.8?

@luciansmith I also remember you talking about truncated versions of distributions. Could you remind me which distributions support them? Currently, we support normal, uniform, lognormal, exponential, and poission distributions.

luciansmith commented 5 years ago

All of them can be truncated, actually. Well, not uniform, since the bounds are the truncations there.

'min' is always inclusive, that is, a value of 'min' is always allowed. For the continuous distributions (all of yours except 'poisson'), 'max' is not inclusive: a value of 'max' may not be returned. For 'poisson' both min and max are inclusive; either may be returned.

(However, if this turns out to be difficult to implement, let me know, and we might change the spec!)

In the SBML, both 'min' and 'max' will always be included. However, one of them may be -INF, 0, or INF, which may be the natural bound of the function. If your underlying implementation has the ability to define only one bound, you might check the other to see if it's one of those, depending on the function.

kirichoi commented 5 years ago

Is partial truncation also allowed? For example, defining either minimum or maximum.

kirichoi commented 5 years ago

I guess this case might be one with setting -INF or INF.

kirichoi commented 5 years ago

All features are implemented. Only need to figure out why Travis is failing. Anyway, below is what we got currently. Distributions are truncated within -1 to 1:

image image image image image

hsauro commented 5 years ago

Do you have some examples of this working, I could add it to my blog.

kirichoi commented 5 years ago

This is the code I used to generate the figures:

import tellurium as te
import roadrunner
import matplotlib.pyplot as plt

ant = """
k1 := uniform(0, 1);
k2 := normal(0, 1);
k2_1 := normal(0, 1, -1, 1);
k3 := poisson(4);
k3_1 := poisson(4, 4, 6);
k4 := exponential(1);
k4_1 := exponential(1, -1, 1);
k5 := lognormal(0, 0.25);
k5_1 := lognormal(0, 0.25, -1, 1);
"""

r = te.loada(ant)

m = r.simulate (0, 10.1, 1000, ['Time', 'k1'])
plt.plot(m[:,0], m[:,1:])
plt.title('Uniform')
plt.legend(['uniform'])
plt.show()

plt.hist(m[:,1], bins=100)
plt.show()

r.reset()

m = r.simulate (0, 10.1, 1000, ['Time', 'k2', 'k2_1'])
plt.plot(m[:,0], m[:,1:])
plt.title('Normal')
plt.legend(['normal', 'normal_truncated'])
plt.show()
plt.plot()

plt.hist(m[:,1], bins=100)
plt.show()

r.reset()

m = r.simulate (0, 10.1, 1000, ['Time', 'k3', 'k3_1'])
plt.plot(m[:,0], m[:,1:])
plt.title('Poission')
plt.legend(['poission', 'poission_truncated'])
plt.show()

plt.hist(m[:,1], bins=100)
plt.show()

r.reset()

m = r.simulate (0, 10.1, 1000, ['Time', 'k4', 'k4_1'])
plt.plot(m[:,0], m[:,1:])
plt.title('Exponential')
plt.legend(['exponential', 'exponential_truncated'])
plt.show()

plt.hist(m[:,1], bins=100)
plt.show()

r.reset()

m = r.simulate (0, 10.1, 1000, ['Time', 'k5', 'k5_1'])
plt.plot(m[:,0], m[:,1:])
plt.title('Lognormal')
plt.legend(['lognormal', 'lognormal_truncated'])
plt.show()

plt.hist(m[:,1], bins=100)
plt.show()

r.reset()

BTW, upon inspecting the distribution, I am finding some odd behaviors that are observable even in the older versions of libroadrunner.

kirichoi commented 5 years ago

When using the initial implementation, e.g.:

double distrib_uniform_ast(double _min, double _max)
{
    std::random_device rd;
    cxx11_ns::mt19937 engine(rd());
 #ifdef CXX11_RANDOM
    cxx11_ns::uniform_real_distribution<double> dist(_min, _max);
#else
    cxx11_ns::uniform_real<double> dist(_min, _max);
#endif
    return dist(engine);
}

I get the following histogram: image which seems okay.

However, when I use Random object, e.g.:

double distrib_uniform(Random *random, double _min, double _max)
{
    Log(Logger::LOG_DEBUG) << "distrib_uniform("
            << static_cast<void*>(random)
            << ", " << _min << ", " << _max << ")";

#ifdef CXX11_RANDOM
    cxx11_ns::uniform_real_distribution<double> dist(_min, _max);
#else
    cxx11_ns::uniform_real<double> dist(_min, _max);
#endif
    return dist(*random);
}

I get this: image As I said, this behavior is observed even in older versions of libRoadRunner. There is some serious flaw with how distrib was implemented I think.

hsauro commented 5 years ago

I need to think of some real examples. Lucian, you can answer this, do we have any real examples where these can be used?

luciansmith commented 5 years ago

A couple examples:

Those are off the top of my head--obviously, James has situations where he uses distributions regularly, too. My guess is that he uses it when he's implementing a multicellular network, and wants to initialize each cell randomly.

luciansmith commented 5 years ago

@kirichoi : There are tests at https://github.com/luciansmith/stochastic-distrib-test-suite, which it would be great if you could run through roadrunner to see if you get the same things I predicted--all the results are based on what I believe should be the correct answers, but no simulator has yet confirmed this.

kirichoi commented 5 years ago

I have been going down the rabbit hole a bit, but the code is very confusing. I will open a new issue because the new issue is with Random object not distrib related functions. Once libroadrunner-deps is updated with the latest libSBML, we can merge ast-distrib branch. All the component in the branch should be functional.

matthiaskoenig commented 5 years ago

@kirichoi As far as I can tell you have to reuse your random engine throughout the simulation/sampling. Only within one engine can be guaranteed that the numbers will be pseudo-random. I assume the way you do it right now is to have a new engine on every sample. The individual engines are hereby initialized/seeded with some pseudo-randomness which depends on the engine. Some use more elaborate things like content of arbitrary memory locations, but most often this is simply time. If you now generate a lot of new engines iteratively (with very similar time differences) the time could introduce correlation between the randomness. Only speculating here, but my strong feeling is: