You have to write a theano wrapping operator around the program you want to call. Then you can simply use the random variables as is.
You can find one example of how to do that in my beat repository https://github.com/hvasbath/beat/tree/master/src in theanof.py l.79-166. This operator is then called in models.py l.343 and yields a function that takes the random variables from pymc3 as inputs (models.py l.414).
I am sorry for the complexity of the example, but unfortunately I think it is not possible to make it easier. You don't need to know what goes on inside the functions, just the principle. I also recommend reading the section "how to expand theano?" in the theano tutorial.
I hope this helps up to some point ...
I have exactly the same issue: proprietary software that I can only run through a command call. I have been trying to work around it, and here is a working version of your code with a theano op:
from pymc3 import *
import numpy as np
import matplotlib.pyplot as plt
import theano.tensor as t
import theano
import json
import os
# X1, X2, and Y were generated with the code from the first example on the PyMC3 tutorial website
X1 = [-1.0856306, 0.99734545, 0.2829785, -1.50629471, -0.57860025, 1.65143654, -2.42667924, -0.42891263, 1.26593626, -0.8667404, -0.67888615, -0.09470897, 1.49138963, -0.638902, -0.44398196, -0.43435128, 2.20593008, 2.18678609, 1.0040539, 0.3861864, 0.73736858, 1.49073203, -0.93583387, 1.17582904, -1.25388067, -0.6377515, 0.9071052, -1.4286807, -0.14006872, -0.8617549, -0.25561937, -2.79858911, -1.7715331, -0.69987723, 0.92746243, -0.17363568, 0.00284592, 0.68822271, -0.87953634, 0.28362732, -0.80536652, -1.72766949, -0.39089979, 0.57380586, 0.33858905, -0.01183049, 2.39236527, 0.41291216, 0.97873601, 2.23814334, -1.29408532, -1.03878821, 1.74371223, -0.79806274, 0.02968323, 1.06931597, 0.89070639, 1.75488618, 1.49564414, 1.06939267, -0.77270871, 0.79486267, 0.31427199, -1.32626546, 1.41729905, 0.80723653, 0.04549008, -0.23309206, -1.19830114, 0.19952407, 0.46843912, -0.83115498, 1.16220405, -1.09720305, -2.12310035, 1.03972709, -0.40336604, -0.12602959, -0.83751672, -1.60596276, 1.25523737, -0.68886898, 1.66095249, 0.80730819, -0.31475815, -1.0859024, -0.73246199, -1.21252313, 2.08711336, 0.16444123, 1.15020554, -1.26735205, 0.18103513, 1.17786194, -0.33501076, 1.03111446, -1.08456791, -1.36347154, 0.37940061, -0.37917643];
X2 = [0.12841094, -0.39557759, 0.14245293, 0.51966079, -0.0049252, 0.00682843, 0.0359099, -0.37239514, 0.08522933, -0.32108195, -0.08553592, 0.24857391, -0.14704339, 0.1002498, 0.20254781, 0.05574817, -0.27418969, -0.06649506, 0.39188227, -0.40500915, -0.0551572, -0.11042161, 0.02414947, 0.14964312, 0.32173819, -0.05404648, 0.16246827, 0.09994803, 0.09486946, -0.11278479, -0.19946429, -0.22000862, -0.15128744, 0.06433732, 0.15218988, 0.06469377, -0.10979102, 0.36119402, 0.30377312, -0.07080002, -0.16468628, 0.02604299, 0.25345973, 0.066553, 0.11130974, -0.04241602, 0.09125418, 0.30890889, -0.04793376, 0.02866155, 0.0507633, 0.05674507, -0.28237778, -0.37537373, -0.20393101, 0.03358846, 0.11077123, -0.10613491, 0.2754515, -0.02863519, 0.0040632, -0.03879277, 0.02680536, 0.14089481, 0.13313069, -0.17968459, 0.30473276, -0.21900529, 0.0158454, -0.05487931, -0.20979834, -0.01502412, -0.14816275, 0.01458145, 0.08061719, 0.29438587, 0.06147684, -0.12224507, -0.07832396, 0.02799562, 0.01869217, 0.29191785, 0.27907059, -0.07178719, -0.10972843, -0.51141092, -0.10978408, -0.19561154, -0.07096489, 0.07831685, 0.03543847, -0.0059936, 0.03991642, -0.02522355, 0.03940379, -0.646211, -0.0538587, -0.02217014, -0.06825234, -0.04358925];
Y = np.array([9.38706859e-01, 4.10296149e-01,3.83981292e+00, 1.48115418e+00, 4.02779506e-01, 2.46184530e+00, -1.42342679e+00, -1.27520755e+00, 2.38380704e+00, -3.90761758e-01, 6.86815665e-01, 2.10641559e+00, 1.84890360e+00, -8.04359754e-01, 3.93284941e-01, 2.31721220e+00, 3.41651416e+00, 3.39016804e+00, 2.22246532e+00, 3.77308673e-01, 3.43806883e-01, 1.66274112e+00, -1.20663529e-01, 2.18829692e+00, 1.50706675e+00, -1.19159361e+00, 1.44784359e+00, -1.55349860e+00, -1.40248284e-01, -1.96609652e-02, -1.35472064e+00, -1.59474188e+00, -1.39656749e+00, 5.29754386e-01, 2.63051387e+00, 5.53932221e-01, 1.76084808e+00, 2.39686504e+00, 1.47396672e+00, 9.07514885e-01, 7.37921664e-02, -3.82899347e-01, 1.49271947e+00, 7.65880501e-01, 2.05273917e+00, 5.63172455e-01, 4.25098874e+00, 3.26909416e-02, 3.93785393e-01, 3.67324277e+00, 1.69575050e+00, 9.38133214e-01, 1.35531685e+00, -2.42854948e+00, 1.26254192e+00, 2.07270390e+00, 2.75833869e+00, 2.60484762e+00, 3.21391580e+00, 4.95643013e+00, 2.31319324e-01, 1.53863552e+00, 1.25983672e+00, -5.57565140e-01, 3.74025866e+00, 1.00427073e+00, 2.44326467e+00, 5.03997740e-01, 1.06029822e+00, 1.48250538e+00, -2.69441500e-01, -1.19520306e+00, 3.20016631e+00, -6.69460228e-01, -2.24215995e+00, 2.10607318e+00, 2.01495136e+00, -8.51855254e-01, -8.99821832e-01, -1.20278122e+00, 1.05077792e+00, -1.43401688e-01, 1.84052098e+00, 1.16665281e+00, 5.60119574e-02, -2.04696786e+00, -1.66062003e+00, 5.51783963e-01, 1.58062230e+00, 1.63826706e+00, 1.16403511e+00, 3.85980814e-01, 2.23665854e+00, 1.23718947e+00, -1.16021703e+00, 1.11137427e+00, 1.65658589e+00, -3.20236525e-03, 1.36931418e+00, 1.33161104e+00]);
i = 0
@theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar, t.dscalar], otypes=[t.dvector])
def proc_test(alpha, beta1, beta2):
    #### write an input file
    global X1, X2, i
    params = {}
    params['X1'] = X1
    params['X2'] = X2
    params['alpha'] = float(alpha)
    params['beta1'] = float(beta1)
    params['beta2'] = float(beta2)
    ###
    with open("input.txt", 'w') as outfile:
        json.dump(params, outfile)
    #### make a system call
    os.system("python dummy.py input.txt")
    # report the number of evaluations every 100 calls (not necessary, just checking)
    i += 1
    if i % 100 == 0:
        print(" number of evaluations {}".format(i))
    #### read the output file and return the value
    mu = np.loadtxt("output.txt")
    return np.array(mu)
# the model
with Model() as model:  # model specifications in PyMC3 are wrapped in a with-statement
    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=10)
    beta1 = Normal('beta1', mu=0, sd=10)
    beta2 = Normal('beta2', mu=0, sd=10)
    sigma = HalfNormal('sigma', sd=1)
    # Expected value of outcome
    mu = proc_test(alpha, beta1, beta2)
    # Likelihood (sampling distribution) of observations
    Y_obs = Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
    # Inference!
    # start = find_MAP()  # Find starting value by optimization
    step = Metropolis()
    trace = sample(100, step)

# plot results
plt.figure(figsize=(12, 12))
summary(trace)
traceplot(trace)
This implementation has the drawback of not providing a gradient. Thus you cannot use MAP estimation or samplers that require gradient calculation (e.g. NUTS); I use Metropolis here.
Perhaps someone with deeper knowledge of pymc3 could help: if you have a function called as an external cmd command plus reading the outputs from a file, is there a way to implement a gradient calculation for the Theano op?
I wonder why this is not a recurring issue in pymc3 (I have seen practically no questions or answers on this). I imagine some people would want to do inference on external models they did not write themselves, which was not an issue in pymc2. But migrating to pymc3 is not trivial.
If the generation procedure is a complete black box, I guess the only way may be a stochastic gradient - but that would still be very inefficient.
I think any large model (with lots of parameters) is unlikely to work in PyMC2 either, as Metropolis will not be able to explore the full parameter space. One way worth trying is the newly implemented SMC (thanks to @hvasbath); otherwise you can try an Approximate Bayesian Computation approach like elfi.
I can just repeat what @junpenglao said. I had the same problem as you, @Argantonio65, which is why I started implementing that code. However, if your external program is fast and the forward model is not expensive, you can still implement the grad method of your Op and get the gradients: you would repeatedly call your external program, and for each parameter keep the others fixed and perturb the current one just a little - the typical numerical gradient calculation. But you would need to use the proper Op class instead of the as_op above, as as_op does not support the gradient. However, if your external program takes a long time, it is far more efficient to simply use SMC, especially if you have many model parameters.
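A rough, untested sketch of such a numerical-gradient Op (not taken from the beat repository; run_external(), the class names, and the eps step size are placeholder assumptions you would adapt to your own write-input/system-call/read-output wrapper):

import numpy as np
import theano
import theano.tensor as tt


def run_external(theta):
    """Placeholder: write theta to the input file, call the external program,
    read the model prediction back as a 1-d numpy array."""
    raise NotImplementedError


class ExternalModelGrad(tt.Op):
    """Numerical vector-Jacobian product: one extra external call per parameter."""
    itypes = [tt.dvector, tt.dvector]   # parameters, gradient w.r.t. the forward output
    otypes = [tt.dvector]               # gradient w.r.t. the parameters

    def __init__(self, eps=1e-4):
        self.eps = eps

    def perform(self, node, inputs, outputs):
        theta, g = inputs
        base = np.asarray(run_external(theta), dtype=np.float64)
        jac = np.empty((base.size, theta.size))
        for k in range(theta.size):
            theta_k = theta.copy()
            theta_k[k] += self.eps   # perturb one parameter, keep the others fixed
            jac[:, k] = (np.asarray(run_external(theta_k)) - base) / self.eps
        outputs[0][0] = jac.T.dot(g)  # J^T g


class ExternalModel(tt.Op):
    """Forward model: stacked parameters in, predicted data vector out."""
    itypes = [tt.dvector]               # e.g. [alpha, beta1, beta2]
    otypes = [tt.dvector]               # predicted mu, same length as the data

    def __init__(self, eps=1e-4):
        self.grad_op = ExternalModelGrad(eps)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.asarray(run_external(theta), dtype=np.float64)

    def grad(self, inputs, output_grads):
        (theta,) = inputs
        (g,) = output_grads
        return [self.grad_op(theta, g)]

In the model you would then use something like theta = tt.stack([alpha, beta1, beta2]) and mu = ExternalModel()(theta); with the gradient defined, find_MAP() or NUTS become possible in principle, at the cost of several external calls per gradient evaluation.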
Thanks @hvasbath @junpenglao, I will try your implementation of SMC (my likelihood evaluates quite slowly, so there is no point in trying the gradient version).
Hey @Argantonio65, any updates on this? We are curious how SMC performs on many different models, as it is still a little experimental. Although we now have some feedback that it seems to work well, we would like to know how it is working for your case.
Hello @hvasbath, my likelihood was quite slow to evaluate (a simple version of the model takes ~2-3 min), so I decided to emulate the dynamic model (purely data-based) and then infer on the surrogate model. My dynamics are quite smooth in the parameters, so for dimension < 10 the emulated structure gave very good results. And since the surrogate evaluates in 1-2 seconds, the sampling-efficiency problem was practically gone, so I stopped trying more efficient samplers.
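For reference, a rough sketch of this kind of emulator setup, with a scikit-learn Gaussian process standing in for the actual data-based surrogate (run_external, the design ranges, and the one-GP-per-output choice are illustrative assumptions, not the setup described above):

import numpy as np
import theano
import theano.tensor as tt
from sklearn.gaussian_process import GaussianProcessRegressor


def run_external(alpha, beta1, beta2):
    """Placeholder for the slow external model (write input file, system call, read output)."""
    raise NotImplementedError


# 1) design of experiments: evaluate the slow model once, offline, on a set of parameter points
design = np.random.uniform(-3, 3, size=(200, 3))                  # columns: alpha, beta1, beta2
responses = np.array([run_external(*theta) for theta in design])  # shape (n_design, n_data)

# 2) fit one emulator per output dimension (any fast regressor would do)
emulators = [GaussianProcessRegressor().fit(design, responses[:, j])
             for j in range(responses.shape[1])]

# 3) wrap the fast surrogate prediction as a Theano op and use it in place of the slow call
@theano.compile.ops.as_op(itypes=[tt.dscalar, tt.dscalar, tt.dscalar], otypes=[tt.dvector])
def surrogate_mu(alpha, beta1, beta2):
    theta = np.array([[float(alpha), float(beta1), float(beta2)]])
    return np.array([em.predict(theta)[0] for em in emulators])

The model above could then use surrogate_mu in place of proc_test, making each likelihood evaluation cheap enough for ordinary sampling.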
It is nice that SMC performs well. If I manage to get some free time with the model (I cannot access it for at least 2 weeks) I might try SMC.
I see that there is an example of usage in SMC2_gaussians. Do you have an example in which the likelihood is based on observations? I understood that the likelihood should be defined as a random variable (as in llk = pm.Potential('llk', two_gaussians(X))).
I have an example here fitting a mixed effect model with different samplers: https://github.com/junpenglao/GLMM-in-Python/blob/3f1068dce88be6a1eeeda2654b8e0c7c1638d611/pymc3_different_sampling.py#L87-L107
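For the observed-data case, a minimal sketch might look as follows. Note that the SMC entry point has moved between PyMC3 releases (older versions expose a pm.SMC() step method, newer ones a pm.sample_smc() function), so check the API of the version you have installed:

import numpy as np
import pymc3 as pm

# toy observed data for a simple linear model
x = np.linspace(0, 1, 50)
y_data = 1.0 + 2.5 * x + np.random.normal(0, 0.3, size=x.size)

with pm.Model() as smc_model:
    a = pm.Normal('a', mu=0, sd=10)
    b = pm.Normal('b', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)
    mu = a + b * x
    # ordinary observed likelihood, no Potential needed
    pm.Normal('y', mu=mu, sd=sigma, observed=y_data)

    try:
        trace = pm.sample_smc(1000)              # newer PyMC3 releases
    except AttributeError:
        trace = pm.sample(1000, step=pm.SMC())   # older releases with an SMC step method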
What's 'SMC'?
Sequential Monte Carlo, a sampler among the step methods that you may try to use for your model.
In my research, my generative model is encapsulated in a software code (I do not have access to the source code). This code reads an input file and writes an output file. I have two choices: I can either run a design of experiments with this code, fit a surrogate model, and then encode the surrogate model inside pymc3's model definition; alternatively, I would like to be able to make a system call to this software code from within pymc3.
In an effort to test whether this mechanism is possible, I have attempted to rework the simplest example from the PyMC3 tutorial so that the calculation of mu (a linear regression fit) is carried out in an external function call.
The following code block is a modified version of the first example from the pymc3 website. Instead of having mu = alpha + beta[0]*X1 + beta[1]*X2 inside the model block, mu is the return value of a sub-routine. This sub-routine takes in the currently sampled values of alpha and beta[i], builds an input file, makes a system call to dummy.py, and lastly reads the output file created by dummy.py and returns the value of mu.
The next code block is the file dummy.py. This code reads its input file, calculates mu = alpha + beta[0]*X1 + beta[1]*X2, and writes mu to an output file.
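A reconstruction of dummy.py, based on the description above and on how proc_test writes input.txt and reads output.txt (the exact JSON and np.savetxt formats are assumptions), would look roughly like this:

import json
import sys

import numpy as np

# read the input file written by proc_test
with open(sys.argv[1]) as infile:          # e.g. "input.txt"
    params = json.load(infile)

X1 = np.array(params['X1'])
X2 = np.array(params['X2'])

# the linear model
mu = params['alpha'] + params['beta1'] * X1 + params['beta2'] * X2

# write the result where proc_test expects it
np.savetxt("output.txt", mu)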
My issue is that I don't know enough about the core data structures that pymc3 uses, so I don't know how to extract the current iteration's sampled values of alpha and beta[i] so that these values can be written to an input file.
thanks much, -ian