pm.sample_prior_predictive() fails with TypeError

djinnome commented 5 years ago

Hey Peter,

So I got the small model working with your help, thank you so much! Now I am working on glycolysis-TCA-itaconate model, and everything runs just fine: with ADVI, I get a nice ELBO,

yarrowia_glycolysis_itaconate_elbo

the parity between predicted and measured fluxes and enzymes is tight, sample_posterior_predictive generates reasonable-looking FCC's, but I can't get pm.sample_prior_predictive() to run.

The error message is:

TypeError: Bad input argument to theano function with name "/Users/zuck016/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/pymc3/distributions/distribution.py:432" at index 3 (0-based). Wrong number of dimensions: expected 1, got 0 with shape ().

This is my model:

with pm.Model() as pymc_model:

    # Initialize elasticities
    Ex_t = pm.Deterministic(
        'Ex', initialize_elasticity(N.values, 'ex', b=0.05, sd=1, alpha=None,
                                    m_compartments=m_compartments,
                                    r_compartments=r_compartments))

    Ey_t = pm.Deterministic('Ey', initialize_elasticity(-Ey.T.values, 'ey', b=0.05, sd=1, alpha=None))

    e_measured = theano.shared( np.ones( enzyme_measurements.shape ) )
    for i, strain in enumerate(enzyme_measurements.index):
        for j, enzyme in enumerate(enzyme_measurements.columns):
            if enzyme_measurements.loc[reference_state, enzyme] == 1 and enzyme_measurements.loc[strain,enzyme] != 1:
                e_measured = T.set_subtensor( e_measured[i,j], 
                                              pm.Uniform( 'e_{}_{}'.format( strain, enzyme ),
                                                          lower = 0, upper=1) )
            elif enzyme_measurements.loc[reference_state,enzyme] != 1 and enzyme_measurements.loc[strain, enzyme] == 1:
                e_measured = T.set_subtensor( e_measured[i,j], 
                                              pm.Uniform( 'e_{}_{}'.format( strain, enzyme ),
                                                          lower = 1, upper=10) )

    e_unmeasured =  T.ones((n_exp, len(e_unmeasured_inds)))
    e_uncatalyzed = T.ones((n_exp, len(e_uncatalyzed_inds)))

    en_t = T.concatenate(
        [e_measured, e_unmeasured,
         e_uncatalyzed], axis=1)[:, e_indexer]

    pm.Deterministic('en_t', en_t)

    yn_t = T.as_tensor_variable(yn.values)

    chi_ss, vn_ss = ll.steady_state_theano(Ex_t, Ey_t, en_t, yn_t)
    pm.Deterministic('chi_ss', chi_ss)
    pm.Deterministic('vn_ss', vn_ss)

    vn_subset = T.clip(vn_ss[:, v_inds], 0, 2)

    chi_clip = T.clip(chi_ss[:, x_inds], -3, 3)  

    #chi_obs = pm.Normal('chi_obs', mu=chi_clip, sd=0.2,
    #                    observed=xn.values)
    vn_obs = pm.Normal('vn_obs', mu=vn_subset, sd=0.025,
                           observed=vn.values)

And this is the full stacktrace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-3dcba3ffb078> in <module>
      1 with pymc_model:
      2     varnames = sorted(pymc_model.named_vars.keys())
----> 3     trace_prior = pm.sample_prior_predictive(vars = [v for v in varnames[len(varnames):]])

~/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/pymc3/sampling.py in sample_prior_predictive(samples, model, vars, random_seed)
   1324     # draw_values fails with auto-transformed variables. transform them later!
   1325     print(names)
-> 1326     values = draw_values([model[name] for name in names], size=samples)
   1327 
   1328     data = {k: v for k, v in zip(names, values)}

~/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/pymc3/distributions/distribution.py in draw_values(params, point, size)
    398                                             point=point,
    399                                             givens=givens.values(),
--> 400                                             size=size)
    401                         evaluated[param_idx] = drawn[param] = value
    402                         givens[param.name] = (param, value)

~/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/pymc3/distributions/distribution.py in _draw_value(param, point, givens, size)
    507                 not all(var.dshape == getattr(val, 'shape', tuple())
    508                         for var, val in zip(variables, values))):
--> 509                 output = np.array([func(*v) for v in zip(*values)])
    510             elif (size is not None and any((val.ndim > var.ndim)
    511                   for var, val in zip(variables, values))):

~/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/pymc3/distributions/distribution.py in <listcomp>(.0)
    507                 not all(var.dshape == getattr(val, 'shape', tuple())
    508                         for var, val in zip(variables, values))):
--> 509                 output = np.array([func(*v) for v in zip(*values)])
    510             elif (size is not None and any((val.ndim > var.ndim)
    511                   for var, val in zip(variables, values))):

~/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    811                         s.storage[0] = s.type.filter(
    812                             arg, strict=s.strict,
--> 813                             allow_downcast=s.allow_downcast)
    814 
    815                     except Exception as e:

~/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/theano/tensor/type.py in filter(self, data, strict, allow_downcast)
    176             raise TypeError("Wrong number of dimensions: expected %s,"
    177                             " got %s with shape %s." % (self.ndim, data.ndim,
--> 178                                                         data.shape))
    179         if not data.flags.aligned:
    180             try:

TypeError: Bad input argument to theano function with name "/Users/zuck016/.pyenv/versions/miniconda3-latest/envs/idp_new/lib/python3.6/site-packages/pymc3/distributions/distribution.py:432" at index 3 (0-based). Wrong number of dimensions: expected 1, got 0 with shape ().

Any suggestions as to what I might be doing wrong?

pstjohn commented 5 years ago

It's got to be in here somewhere :) My guess is that sample_prior_predictive expects 1d variables somewhere, but the e matrix is built up out of 0d Uniform distributions? It's tough to say.

You could try the quick hack (note the shape=(1,) kwargs)

    e_measured = theano.shared( np.ones( enzyme_measurements.shape ) )
    for i, strain in enumerate(enzyme_measurements.index):
        for j, enzyme in enumerate(enzyme_measurements.columns):
            if enzyme_measurements.loc[reference_state, enzyme] == 1 and enzyme_measurements.loc[strain,enzyme] != 1:
                e_measured = T.set_subtensor( e_measured[i,j], 
                                              pm.Uniform( 'e_{}_{}'.format( strain, enzyme ),
                                                          lower = 0, upper=1, shape=(1,)) )
            elif enzyme_measurements.loc[reference_state,enzyme] != 1 and enzyme_measurements.loc[strain, enzyme] == 1:
                e_measured = T.set_subtensor( e_measured[i,j], 
                                              pm.Uniform( 'e_{}_{}'.format( strain, enzyme ),
                                                          lower = 1, upper=10, shape=(1,)) )

    e_unmeasured =  T.ones((n_exp, len(e_unmeasured_inds)))
    e_uncatalyzed = T.ones((n_exp, len(e_uncatalyzed_inds)))

    en_t = T.concatenate(
        [e_measured, e_unmeasured,
         e_uncatalyzed], axis=1)[:, e_indexer]

    pm.Deterministic('en_t', en_t)

the e matrix is supposed to be (n_exp, nr) in shape. (number of experiments by number of reactions). Did you try a closer version to my contador example?

I would do something like this:

    e_hat = T.ones((n_exp, nr))
    e_hat_overexpressed = pm.Uniform('e_hat_entries', lower=0, upper=1, shape=(num_overexpressed_enzymes,))
    e_hat_underexpressed = pm.Uniform('e_hat_entries', lower=1, upper=10, shape=(num_underexpressed_enzymes,))

    for i, strain in enumerate(enzyme_measurements.index):
        for j, enzyme in enumerate(enzyme_measurements.columns):
            if enzyme_measurements.loc[reference_state, enzyme] == 1 and enzyme_measurements.loc[strain,enzyme] != 1:
                T.set_subtensor( e_hat[i,j], e_hat_underexpressed[j] )
            elif enzyme_measurements.loc[reference_state,enzyme] != 1 and enzyme_measurements.loc[strain, enzyme] == 1:
                T.set_subtensor( e_hat[i,j], e_hat_overexpressed[j] )

    # Store the fitted e_hat in the trace object
    e_hat = pm.Deterministic('e_hat', e_hat)

(this code probably isn't correct, but hopefully is close enough to convey the idea) this would then assume that enzymes overexpressed relative to WT are overexpressed by a constant amount in all the strains, which is fewer degrees of freedom. You could certainly add those back in if you needed too, but I'd start with the more constrained example.

The main difference would be specifying all the Uniforms in a single vector, rather than initializing them separately.

djinnome commented 4 years ago

Actually, the problem appeared to be that I had not set the Ey matrix correctly. I still have some semantic confusion about which reactions/metabolites belong in Ex and which belong in Ey, which don't belong in either. Is there an intuitive explanation you can provide to guide my modeling decisions?

pstjohn commented 4 years ago

So all reactions have to "be" in both: Ex is (nxm) (where n is number of reactions, m is number of internal metabolites) and Ey is (nxp) (where p is number of external metabolites).

But then it comes down to whether a metabolite is "internal" or "external". These distinctions are similar to the ones from MCA, but it comes down to whether metabolite concentrations are something that can be independently controlled (i.e., media concentrations) or something that's the result of steady-state fluxes and rate rules. They're essentially the boundary conditions for the overall pathway.

The Ey matrix (and external metabolites) gives you a way to change flux through the model without changes in enzyme expression. But if the media is consistent across all the experiments, you don't really need to include external metabolites, since they'll just be zero (log y/y*) for all conditions.

pstjohn / emll

pm.sample_prior_predictive() fails with TypeError #4