PPC Predictions for Randomwalk/Defining RV shape at runtime

How is it intended that the PPC function works with randomwalk variables for prediction? I'm trying to get predictions out of a latent variable/state space model and I'm having trouble.

I have the below model which tracks a simple mean over time with some added noise in the observations (based on this link) but because it sets the shape of the random walk based on the data at compile time, I can't find a way to predict outcomes for new data.

There are two problems here;

You can't use a TensorVariable (and hence a shared variable) to set a shape
You can't concatenate TensorVariables where you don't know the length of the output


import theano as th, pandas as pd, numpy as np, pymc3 as pm, theano.tensor as T, ggplot as gg
#Create some simulated data - imagine a different mean for each day, where a variable number of 
#observations occurs for each day.
data = np.array([1,1,1,1,4,4,4,4,4,4,3,3,3,7,7,7,8,8,8,6,6,6,6,6])
data = pd.DataFrame({'nonoise': data})
data['vals'] = data.nonoise + np.random.normal(0, .3, size = len(data))

#array counting number of rows for each "day"
data_repeats = data.groupby(by = 'nonoise').count().vals.values

#try setting the shared variables with a numpy array instead of a pandas df
data_repeats = th.shared(data_repeats)
data = th.shared(data.values)
test_model = pm.Model()

expected_time_var = 10.0
expected_within_time_var = .3

with test_model:
    sigma_alpha = pm.Exponential('sigma_alpha', 1/expected_time_var, testval = .5)

    #First issue - leaving the data in a shared variable (which is advised in the PPC tutorial) 
    #means you can't use it to set the shape. simply using the argument shape = data.shape will fail with
    #TypeError: Expected int elements in shape because the shape argument can't handle symbolic variables.

    alpha = pm.GaussianRandomWalk('alpha', sd=sigma_alpha, mu=0, 
                                  shape = data_repeats.get_value().shape)    

    #Second issue - using the below code gives an error: length not known: ARange{dtype='int64'} [id A] ''
    #alpha_daily = [alpha[i].repeat(pattern[i]) for i in T.arange(0, data_repeats.shape[0])]    
    #Using this code does work, but appears to set the computational graph for a dataset of a specific length at compile time?
    alpha_daily = T.concatenate([alpha[i].repeat(data_repeats[i]) for i in range(0, data_repeats.shape[0].eval())])

    sigma_L = pm.Exponential('sigma_L', 1/expected_within_time_var)

    likelihood = pm.Normal('like', mu=alpha_daily, tau=sigma_L, 
                           observed = data[:,1]) #observed is the vals from the shared data set

#Compute MAP
from scipy import optimize
with test_model:
    MAP = pm.find_MAP(vars = [alpha, sigma_L], fmin = optimize.fmin_bfgs)
    ppc_MAP =  pm.sample_ppc(np.array([MAP]), size = 10000)

##################################################################################################################
##########################################All works OK up to here.################################################
##################################################################################################################
#construct some new data of length 8
steps = [np.random.normal() for i in range(0, 7)]
new_data = [reduce(lambda x, y: x+y, steps[i-2:i]) for i in range(2, len(steps)+2)]
new_data = pd.DataFrame({'nonoise': np.zeros(len(steps)), 'vals': new_data })

#load data into the shared variable
data.set_value(new_data.values)

#generate new predictions using the existing trace (which is actually just a MAP)
with test_model:
    newdata_ppc =  pm.sample_ppc(np.array([MAP]), size = 10000)
    newdata_pred = np.mean(ppc_MAP['like'], axis = 1)#.reshape(len(steps))

newdata_pred.shape # = (1L, 24L) - Not 8 as expected

I think maybe this raises a more general question for me about how the shape argument works. Even in a simple situation where we have a piecewise linear model and wanted to infer the number of breakpoints as a random variable from the data, would this be possible to do without being able to pass a Stochastic to the shape parameter of another RV?

I think pymc3 has amazing potential for creating really flexible models, but I keep encountering issues like this - I guess what Fonnesbeck calls show stoppers? Maybe my Theano knowledge just isn't good enough to see the solutions to this stuff, hopefully someone can give me some advice if there is a workaround?

pymc-devs / pymc2

PPC Predictions for Randomwalk/Defining RV shape at runtime #102