stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

compound functions for logistic regression (and other GLMs) #493

Closed bob-carpenter closed 6 years ago

bob-carpenter commented 7 years ago

Summary:

Add a logistic_regression function that behaves as if it were defined (up in the Stan language) by:

real logistic_regression(int[] y, matrix x, vector beta, real alpha) {
  return bernoulli_logit_lpmf(y | x * beta + alpha);
}

but with analytic gradients. The data will be of this shape:

int y[N];
matrix[N, K] x;

and parameters

real alpha;
vector[K] beta;

But it will need to work for all argument types.
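For reference (these are standard logistic-regression results, not stated explicitly in the issue), with y_hat = inv_logit(x * beta + alpha), the analytic gradients of the summed log likelihood are

    d/d(beta)  : x' * (y - y_hat)
    d/d(alpha) : sum(y - y_hat)

which is exactly what a single compound node needs to store.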

I could be convinced that this should be written as bernoulli_logit_lpmf(y | x, beta), but I don't like that x on the right-hand side because it doesn't feel like a parameter in the model (though I guess it would be if there's missing data or a measurement-error model behind the scenes, in which case the savings in this regime are even greater).

Another alternative would be

real bernoulli_logit_lp(int[] y, matrix x, vector beta, real alpha) {
  y ~ bernoulli_logit(x * beta + alpha);
}

then you could just write bernoulli_logit_lp(...) in the model block and it would automatically drop any unneeded constants (nothing here, actually, but that could be useful in other cases). The downside is that it wouldn't be usable outside the model block this way.

Description:

The direct implementation requires

  • N + K nodes, N * (K + 1) edges in the expression graph

whereas the compound one will require

  • 1 node, K edges

Recall that each edge corresponds to a virtual function call and a lot of memory management.
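For example, with N = 1000 observations and K = 10 predictors, that is 1010 nodes and 11,000 edges for the direct version versus a single node with 10 edges for the compound one.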

Up in Stan, this will let us replace

y ~ bernoulli_logit(x * beta);

with

target += logistic_regression(y, x, beta);

or maybe with

target += logistic_regression_lpmf(y | x, beta);

which would then also support

y ~ logistic_regression(x, beta);

It could even make sense to add the prior in as well, but the data's going to dominate the computation, so it's not such a big deal. That'd look like ridge_logistic_regression(y, x, beta, sigma) where sigma is the prior scale on beta (mean assumed to be zero).
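Presumably (this isn't spelled out in the issue), ridge_logistic_regression(y, x, beta, sigma) would behave as if defined by adding normal_lpdf(beta | 0, sigma) to the bernoulli_logit log density above.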

Current Version:

v2.14.0

bgoodri commented 7 years ago

This seems like a promising idea, but I would like to see the constant carved out of x in the signature. In rstanarm, the x matrix has column means of zero and the intercept is separate, which is usually a good thing in terms of specifying a reasonable prior and getting better geometry.


bob-carpenter commented 7 years ago

I'd thought about that, but it adds significant complexity to the function given our need to allow all the instantiations. It's definitely doable, though.

It doesn't actually give you more functionality, because you can always put a column of 1s in the first column of the data matrix and use different priors, e.g.,

beta[1] ~ normal(0, 20);
beta[2: ] ~ normal(0, 1);

but the column of 1s is itself a bit awkward and inefficient, though in this case it's all in the matrix computations.

Let's discuss on Thursday at the meeting if we have time. I'm thinking we want to add a bunch of GLMs like this. It should make an enormous difference in run time and memory usage.

P.S. No idea why it doesn't take Markdown from email and now won't take it from here. Grr. Arg.

jgabry commented 7 years ago

Let's discuss on Thursday at the meeting if we have time. I'm thinking we want to add a bunch of GLMs like this. It should make an enormous difference in run time and memory usage.

Sounds good. It would be great to have a number of these.

P.S. No idea why it doesn't take Markdown from email and now won't take it from here. Grr. Arg.

Yup, that's happened to me several times. It's annoying. Also mysterious.

avehtari commented 7 years ago

If the plan is to have a number of these, could they be named like y ~ bernoulli_logit_regression(x, beta); or with some other keyword (like reg, glm, linear, or lin)? It would then be easy to come up with names for other GLMs, like y ~ poisson_log_regression(x, beta);

bob-carpenter commented 7 years ago

Thanks! That makes the GLM connection much clearer.

So should the function be bernoulli_logit_regression_ll() (for "log likelihood"), or should we allow it to be used with sampling, in which case it should be called bernoulli_logit_regression_lpmf()?

And yes to poisson_log and related link/sampling distributions.

betanalpha commented 7 years ago

Can we cut “regression” to “regr”? I’d stick with lpmf.


avehtari commented 7 years ago

I'd prefer lpmf and the other usual endings. For loo we would use log_lik[n] = bernoulli_logit_regr_lpmf(y[n] | x[n], beta); and for censored data (like in survival models) target += bernoulli_logit_regr_lccdf(ycen | xcen, beta);

bob-carpenter commented 7 years ago

I'd prefer lpmf and the other usual endings. For loo we would use log_lik[n] = bernoulli_logit_regr_lpmf(y[n] | x[n], beta);

That's a lot more signatures. I was thinking of just

real bernoulli_logit_regress_lpmf(int[] | matrix, vector);

What you're asking for above is

real bernoulli_logit_regress_lpmf(int | row_vector, vector);

But what you wanted before was the split output rather than summed output, so the matrix signature would return vector or real[].

and for censored data (like in survival models) target += bernoulli_logit_regr_lccdf(ycen | xcen, beta);

I'll probably stage the implementations of these.

I also haven't forgotten about versions of the probability functions that return vectors for each of the input elements.

avehtari commented 7 years ago

I'm fine with fewer signatures. As the generated quantities block doesn't affect the expression tree, it's not a problem for me to use log_lik[n] = bernoulli_logit_lpmf(y[n] | x[n]*beta);

bob-carpenter commented 7 years ago

Great---that one already exists. What we need to do is give you vectorized versions of these as you've been asking for for ages (I haven't forgotten---we also need the propto versions to use with target += to complete everything on the table so far).

VMatthijs commented 7 years ago

Hi all,

I'm currently having a stab at fixing this. I am running into the issue, though, of dealing with derivatives w.r.t. the matrix argument x. Is this something you've tried before? Is operands_and_partials set up to deal with matrix arguments?

In particular, I am trying to do the following, which raises an error:

ops_partials.edge1_.partials_[n] += theta_derivative * beta_vec;

(here, argument 1 is the matrix x)

Is there a simple fix that you see, or do I need to abandon operands_and_partials altogether?

Thanks!

Matthijs

bob-carpenter commented 7 years ago

You can look at some of the functions in rev/mat to see how we deal with memory allocation for vectors. The idea is that you want to allocate memory using memalloc_.allocate_array() (I think I got that right), then wrap it in Eigen::Map to use as matrices. That way, the memory stays in our arena. This is described (albeit tersely) in the arXiv paper. Please do not use matrix objects in your vari implementations (it's possible with the third stack of things that do get their destructors called, but I'm trying to get rid of that and got stuck at the LDLT usage).
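In code, the pattern described above looks roughly like this (a minimal sketch, not library code; it borrows the memalloc_.alloc_array spelling from the example below and assumes K, x, and a double vector y_minus_y_hat are already in scope):

    // Allocate K doubles in the autodiff arena, so no destructor ever
    // needs to run, then view them as an ordinary Eigen vector.
    double* raw = memalloc_.alloc_array<double>(K);
    Eigen::Map<Eigen::VectorXd> partials(raw, K);
    partials.setZero();
    // Use the Map like any other Eigen vector; the data stays in the arena.
    partials += x.transpose() * y_minus_y_hat;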

I should've sent you my simple implementation for one aspect of this issue, which just uses stored_gradient_vari. This only handles one signature, and we need to allow the two continuous parameters to be instantiated with double or var scalars (and fvar for forward mode, but those can just use templated versions for double and rely on autodiff internally).

    var logistic_regression_lpmf(const vector<int>& y,
                                 const Eigen::Matrix<double, -1, -1>& x,
                                 const Eigen::Matrix<var, -1, 1>& beta) {
      assert(x.cols() == beta.rows());
      assert(x.rows() == y.size());
      int N = x.rows();
      int K = x.cols();
      VectorXd beta_d(K);
      for (int k = 0; k < K; ++k) beta_d(k) = beta(k).val();
      VectorXd logit_y_hat = x * beta_d;
      VectorXd y_hat = inv_logit(logit_y_hat);
      double val = 0;
      VectorXd partials = VectorXd::Zero(K);
      for (int n = 0; n < N; ++n) {
        val += y[n] ? log(y_hat[n]) : log1m(y_hat[n]);
        // x.row(n) is 1 x K; transpose to accumulate into the K-vector
        partials += x.row(n).transpose() * (y[n] - y_hat[n]);
      }
      vari** operands = memalloc_.alloc_array<vari*>(K);
      double* parts = memalloc_.alloc_array<double>(K);
      for (int k = 0; k < K; ++k) {
        operands[k] = beta(k).vi_;
        parts[k] = partials[k];
      }
      return var(new stored_gradient_vari(val, K, operands, parts));
    }

I think it should be called bernoulli_logit_glm --- following the likelihood and link function terminology of GLMs.

It could be made quite a bit faster if the y values could be sorted into 0 and 1 outcomes and handled separately---it eliminates tests and opens up a route to more vectorization. Because the y are integers, there's no good way to vectorize partials += x.row(n) * (y[n] - y_hat[n]). It's all just double values at this point, so probably not worth worrying too much about.

It'd also be nice if there were a function to pull out the vector of vari from a vector of var.

And the final copy to parts from partials is redundant. We can probably just work on the parts all along using Eigen::Map<VectorXd> constructed over parts.

seantalts commented 7 years ago

@bob-carpenter Is there a reason you're avoiding operands_and_partials here? It should support matrices.

Separately, is there a reason you're hardcoding the types in the function signature? Wouldn't we normally template them and try to accept a variety?

seantalts commented 7 years ago

I think I don't fully understand the comments about making sure to allocate memory manually and using Eigen::Map on top... Might need to think about this further, sorry.

bob-carpenter commented 7 years ago

@seantalts: I wasn't avoiding it so much as it didn't exist in that form when I wrote that example. If you can do it more easily with operands_and_partials, then by all means do it. There's no extra overhead.

@seantalts: You can't have member variables in vari implementations that allocate memory on the heap, such as std::vector or Eigen::Matrix, because the destructors aren't called. I went over this in the arXiv paper on autodiff, I think.

VMatthijs commented 7 years ago

My attempt at a solution can be found here: https://github.com/VMatthijs/math/tree/feature/493-bernoulli_logic_regress_lpmf .

My tests throw an error, however, which seems to be caused by the line ops_partials.edge1_.partials_[n] += theta_derivative * beta_dbl;. The idea here is that argument 1 is an NxM matrix x. Therefore, I would expect ops_partials.edge1_.partials_[n] to be a length-M vector. The right-hand side is a length-M vector as well. However, this addition throws an error. From some experimentation, I get the feeling that the left-hand side is a number. Why does operands_and_partials return a vector if we take the derivative with respect to a matrix? Any suggestions on how to proceed from here?

bob-carpenter commented 7 years ago

@VMatthijs: If x is a matrix, then x[m] is a row_vector (for the m-th row), not a (column) vector.

It helps to actually see the error rather than an interpretation of it.

VMatthijs commented 7 years ago

Thanks, @bob-carpenter! I wonder if this distinction between row and column vector is relevant here. :-) I hadn't realised that was what happened if you indexed an Eigen Matrix.

For completeness, I'm including the errors thrown by my tests below.

In file included from test/unit/math/prim/mat/prob/bernoulli_logit_glm_lpmf_test.cpp:1:
In file included from ./stan/math/prim/mat.hpp:298:
In file included from ./stan/math/prim/arr.hpp:40:
In file included from ./stan/math/prim/scal.hpp:181:
./stan/math/prim/scal/prob/bernoulli_logit_glm_lpmf.hpp:131:40: error: invalid operands to binary expression ('double' and 'typename internal::enable_if<true, const
      CwiseBinaryOp<internal::scalar_product_op<typename internal::promote_scalar_arg<Scalar, double, (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<double,
      Scalar, Eigen::internal::scalar_product_op<double, Scalar> > >::value)>::type, typename internal::traits<Matrix<double, -1, 1, 0, -1, 1> >::Scalar>, const typename
      internal::plain_constant_type<Matrix<double, -1, 1, 0, -1, 1>, typename internal::promote_scalar_arg<Scalar, double,
      (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<double, Scalar, Eigen::internal::scalar_product_op<double, Scalar> > >::value)>::type>::type, const
      Matrix<double, -1, 1, 0, -1, 1> > >::type' (aka 'const Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double, double>, const
      Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, 1, 0, -1, 1> >, const Eigen::Matrix<double, -1, 1, 0, -1, 1> >'))
                    ops_partials.edge1_.partials_[n] +=   theta_derivative * beta_dbl; // for some reason the LHS is a number, rather than a vector... this is very strange
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/unit/math/prim/mat/prob/bernoulli_logit_glm_lpmf_test.cpp:23:32: note: in instantiation of function template specialization 'stan::math::bernoulli_logit_glm_lpmf<true,
      Eigen::Matrix<int, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0, -1, 1> >'
      requested here
                  (stan::math::bernoulli_logit_glm_lpmf<true>(n, x, beta, alpha)));
                               ^
lib/gtest_1.7.0/include/gtest/gtest.h:2084:33: note: expanded from macro 'EXPECT_FLOAT_EQ'
                      expected, actual)
                                ^
lib/gtest_1.7.0/include/gtest/gtest_pred_impl.h:162:40: note: expanded from macro 'EXPECT_PRED_FORMAT2'
  GTEST_PRED_FORMAT2_(pred_format, v1, v2, GTEST_NONFATAL_FAILURE_)
                                       ^
lib/gtest_1.7.0/include/gtest/gtest_pred_impl.h:147:43: note: expanded from macro 'GTEST_PRED_FORMAT2_'
  GTEST_ASSERT_(pred_format(#v1, #v2, v1, v2), \
                                          ^
lib/gtest_1.7.0/include/gtest/gtest_pred_impl.h:77:52: note: expanded from macro 'GTEST_ASSERT_'
  if (const ::testing::AssertionResult gtest_ar = (expression)) \
                                                   ^
In file included from test/unit/math/prim/mat/prob/bernoulli_logit_glm_lpmf_test.cpp:1:
In file included from ./stan/math/prim/mat.hpp:298:
In file included from ./stan/math/prim/arr.hpp:40:
In file included from ./stan/math/prim/scal.hpp:181:
./stan/math/prim/scal/prob/bernoulli_logit_glm_lpmf.hpp:131:40: error: invalid operands to binary expression ('double' and 'typename internal::enable_if<true, const
      CwiseBinaryOp<internal::scalar_product_op<typename internal::promote_scalar_arg<Scalar, double, (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<double,
      Scalar, Eigen::internal::scalar_product_op<double, Scalar> > >::value)>::type, typename internal::traits<Matrix<double, -1, 1, 0, -1, 1> >::Scalar>, const typename
      internal::plain_constant_type<Matrix<double, -1, 1, 0, -1, 1>, typename internal::promote_scalar_arg<Scalar, double,
      (Eigen::internal::has_ReturnType<Eigen::ScalarBinaryOpTraits<double, Scalar, Eigen::internal::scalar_product_op<double, Scalar> > >::value)>::type>::type, const
      Matrix<double, -1, 1, 0, -1, 1> > >::type' (aka 'const Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double, double>, const
      Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, 1, 0, -1, 1> >, const Eigen::Matrix<double, -1, 1, 0, -1, 1> >'))
                    ops_partials.edge1_.partials_[n] +=   theta_derivative * beta_dbl; // for some reason the LHS is a number, rather than a vector... this is very strange
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/unit/math/prim/mat/prob/bernoulli_logit_glm_lpmf_test.cpp:25:32: note: in instantiation of function template specialization
      'stan::math::bernoulli_logit_glm_lpmf<false, Eigen::Matrix<int, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>,
      Eigen::Matrix<double, -1, 1, 0, -1, 1> >' requested here
                  (stan::math::bernoulli_logit_glm_lpmf<false>(n, x, beta, alpha)));
                               ^
lib/gtest_1.7.0/include/gtest/gtest.h:2084:33: note: expanded from macro 'EXPECT_FLOAT_EQ'
                      expected, actual)
                                ^
lib/gtest_1.7.0/include/gtest/gtest_pred_impl.h:162:40: note: expanded from macro 'EXPECT_PRED_FORMAT2'
  GTEST_PRED_FORMAT2_(pred_format, v1, v2, GTEST_NONFATAL_FAILURE_)
                                       ^
lib/gtest_1.7.0/include/gtest/gtest_pred_impl.h:147:43: note: expanded from macro 'GTEST_PRED_FORMAT2_'
  GTEST_ASSERT_(pred_format(#v1, #v2, v1, v2), \
                                          ^
lib/gtest_1.7.0/include/gtest/gtest_pred_impl.h:77:52: note: expanded from macro 'GTEST_ASSERT_'
  if (const ::testing::AssertionResult gtest_ar = (expression)) \
                                                   ^
2 errors generated.

EDIT: I checked, and transposing beta does not solve the problem.

bob-carpenter commented 7 years ago

Thanks, @bob-carpenter! I wonder if this distinction between row and column vector is relevant here. :-) I hadn't realised that was what happened if you indexed an Eigen Matrix.

I don't think you can just index an Eigen::Matrix with one index and get a row vector---that's a Stan thing.

bob-carpenter commented 7 years ago

I can't follow what's going on here. There's some very complex type involving an enable_if and a double. Looks like ops_partials.edge1_.partials_[n] is a scalar and you weren't expecting that. Hopefully @seantalts can help you figure this one out.

seantalts commented 7 years ago

ops_partials.edge1_.partials_ is an Eigen vector, because the first arg to the ops_partials constructor is x, defined as Matrix<double, Dynamic, 1> x(3, 2);. A matrix would be Matrix<double, Dynamic, Dynamic> x(3, 2);.

Also, you can't use [] to index into a matrix - you have to use ().

See this for how to assign to a row or column.
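For reference, here is a self-contained sketch of the Eigen indexing rules at issue (plain Eigen, nothing Stan-specific assumed):

    #include <Eigen/Dense>

    int main() {
      Eigen::MatrixXd m = Eigen::MatrixXd::Zero(3, 2);    // 3 x 2 matrix
      Eigen::RowVectorXd r = Eigen::RowVectorXd::Ones(2);
      m(1, 0) = 2.0;    // element access on a matrix uses operator()
      m.row(1) += r;    // accumulate into a whole row
      // m[1] does not compile: Eigen defines operator[] for vectors only.
      Eigen::VectorXd v = m.col(0);
      return v[1] > 0 ? 0 : 1;    // operator[] is fine on a vector
    }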

VMatthijs commented 7 years ago

Thanks for spotting the typo, @seantalts! Still, it doesn't fix things. I think I may have found out what is going on, though. I suspect that ops_partials.edge1_.partials_ is a vector even when argument 1 is a matrix, where we traverse the derivatives with respect to the matrix elements left-to-right, top-to-bottom. Could that be? I need to leave for dinner now, but will continue testing this later.

By the way, I am wondering what would be the most efficient way for me to get familiar with the code base. I also don't want to keep taking up your time. Do you have any suggestions on how to get more familiar with it myself?

seantalts commented 7 years ago

Can you post the new error and code? When I fixed that on my machine, I got new problems relating to the stuff I sent you in the email before, this section:

  • scalar_seq_view doesn't support any operations besides [] right now. This means writing linear algebra with it is kind of fucked. I'm not sure what to do about that - for now I'd either write it with for loops or maybe write some overloaded functions that deal with scalar vs eigen types. Or what you might really want is to coerce into an Eigen type (or leave it as a scalar) manually - something that works with Eigen's operator overloads.

I think diving into a problem and struggling through it (with hearty googling and eigen docs and questions to me) is the fastest way that I know of.
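For the coercion option mentioned in the bullet above, here is a hedged sketch (the helper name as_column_vector is invented for illustration; it is not existing library code):

    #include <Eigen/Dense>

    // Broadcast a scalar to a length-n column vector so downstream linear
    // algebra has a single code path; pass Eigen vectors through as-is.
    inline Eigen::VectorXd as_column_vector(double a, int n) {
      return Eigen::VectorXd::Constant(n, a);
    }
    inline const Eigen::VectorXd& as_column_vector(const Eigen::VectorXd& v,
                                                   int /*n*/) {
      return v;
    }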

seantalts commented 7 years ago

JTBC - Don't feel bad about asking me questions! I love helping people, and I think it's the fastest way to get people up to speed so I'm more than happy to help out. Keep 'em coming :)

VMatthijs commented 7 years ago

Thanks, @seantalts! I'm currently debugging my code using the very high-tech method of print statements. Haha. Will definitely ask you when I have a better idea of where the error originates.

bob-carpenter commented 7 years ago

+1 to Sean's comments on our being willing to help.

If you want to collect material as you go that you think other people would find helpful, that'd be great. But we don't have an overall code tour anywhere.

bob-carpenter commented 7 years ago

Thanks, @seantalts! I'm currently debugging my code using the very high-tech method of print statements.

In roughly 35 years of coding, I've never found a better way to debug than careful instrumentation with print statements. Your mileage may vary depending on how good you are with stepwise debuggers---I've usually found them more trouble than they're worth and have only pulled one out a few times when I couldn't solve a problem any other way.

VMatthijs commented 7 years ago

So, my code is now passing the tests I wrote, which check that it returns the same answer as the existing logistic regression. However, the tests only cover constant values (doubles) for the arguments, not the situation where the arguments are parameters. In fact, I am sure that my code for the derivative w.r.t. the data matrix x is still wrong. I'm really struggling to debug that. From my current testing, the derivative w.r.t. x is a 1-dimensional vector, while I'd expect it to be a matrix. Still, my best hope is that it's a vectorised version of the matrix of derivatives that I'd expect it to be (i.e. traversing the matrix elements left-to-right, top-to-bottom). Would that be possible? Any suggestions on how I could proceed from this point?

EDIT: Also, @bob-carpenter, good to hear my method isn't entirely crazy. When I was writing C# at Microsoft, I did it in Visual Studio, which actually has a fantastic debugger that made life so much easier. Generally, I hate to admit it, but writing C# in VS is a lovely experience. Best Microsoft product I've ever used. I still need to set up my environment for the Stan code a bit better. Now it's Notepad++ and bash. Haha.

VMatthijs commented 7 years ago

Hi @bob-carpenter and @seantalts , I've now tested my function and it seems to work fine as long as the data matrix x is not a parameter. From browsing through the operands_and_partials source code, I get the feeling that it is simply not set up to deal with matrix arguments. Any suggestions on how to proceed? My feeling is that it would be good to have some flexible reasonably high-level primitives for writing this sort of code, like operands_and_partials. I'm just not sure if I'm well-versed enough in your code yet to generalise operands_and_partials to matrix operands, myself.

seantalts commented 7 years ago

What makes you feel that way? It has specializations for matrices in the /mat/ directories - here is a test for reverse mode with a matrix operand and a matrix partials_ member: https://github.com/stan-dev/math/blob/develop/test/unit/math/rev/mat/meta/operands_and_partials_test.cpp#L104

VMatthijs commented 7 years ago

I think that's for computing the Jacobian of a vector valued function with vector arguments, rather than a scalar valued function with matrix arguments. Or am I confused?

One thing that makes me think that is section 15.1 in the arxiv paper on the Stan math library (autodiff) as well as the comment

* This class builds partial derivatives with respect to a set of
 * operands. There are two reason for the generality of this
 * class. The first is to handle vector and scalar arguments
 * without needing to write additional code. The second is to use
 * this class for writing probability distributions that handle
 * primitives, reverse mode, and forward mode variables
 * seamlessly.

in https://github.com/stan-dev/math/blob/develop/stan/math/rev/scal/meta/operands_and_partials.hpp .

EDIT: Also, in my testing, the member partials_vec_ does not exist. The member partials_ does, and the compiler thinks it's an array of doubles, but it complains once I try to access any element in the array. It throws the error:

stan/lib/stan_math/lib/eigen_3.3.3/Eigen/src/Core/DenseCoeffsBase.h:407:27: error: no member named
      'THE_BRACKET_OPERATOR_IS_ONLY_FOR_VECTORS__USE_THE_PARENTHESIS_OPERATOR_INSTEAD' in
      'Eigen::internal::static_assertion<false>'
      THE_BRACKET_OPERATOR_IS_ONLY_FOR_VECTORS__USE_THE_PARENTHESIS_OPERATOR_INSTEAD)
      ^
stan/lib/stan_math/lib/eigen_3.3.3/Eigen/src/Core/util/StaticAssert.h:123:78: note: expanded from macro 'EIGEN_STATIC_ASSERT'
    if (Eigen::internal::static_assertion<static_cast<bool>(CONDITION)>::MSG) {}
                                                                             ^
stan/lib/stan_math/stan/math/prim/mat/prob/bernoulli_logit_glm_lpmf.hpp:128:7: note: in instantiation of member function
      'Eigen::DenseCoeffsBase<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1>::operator[]' requested here
      ops_partials.edge1_.partials_[0] += 1;// theta_derivative * beta_dbl.transpose(); // for som...
      ^
examples/bernoulli_logit_glm/bernoulli_logit_glm_test.hpp:245:28: note: in instantiation of function template specialization
      'stan::math::bernoulli_logit_glm_lpmf<true, std::vector<int, std::allocator<int> >,
      Eigen::Matrix<stan::math::var, -1, -1, 0, -1, -1>, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>,
      Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1> >' requested here
            lp_accum__.add(bernoulli_logit_glm_lpmf<propto__>(y, x, beta, alpha));
                           ^
stan/src/stan/model/log_prob_grad.hpp:44:28: note: in instantiation of function template specialization
      'bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model::log_prob<true, true, stan::math::var>' requested here
            = model.template log_prob<propto, jacobian_adjust_transform>
              ^
stan/src/stan/services/util/initialize.hpp:147:37: note: in instantiation of function template specialization
      'stan::model::log_prob_grad<true, true, bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model>' requested here
            log_prob = stan::model::log_prob_grad<true, true>
                       ^
stan/src/stan/services/diagnose/diagnose.hpp:54:19: note: in instantiation of function template specialization
      'stan::services::util::initialize<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model,
      boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>,
      boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> > >' requested here
          = util::initialize(model, init, rng, init_radius,
            ^
src/cmdstan/command.hpp:135:49: note: in instantiation of function template specialization
      'stan::services::diagnose::diagnose<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model>' requested here
        return_code = stan::services::diagnose::diagnose(model,
                                                ^
src/cmdstan/main.cpp:8:21: note: in instantiation of function template specialization
      'cmdstan::command<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model>' requested here
    return cmdstan::command<stan_model>(argc,argv);
                    ^

seantalts commented 7 years ago

Both of those are old - since they were published I've updated operands_and_partials to support matrix operands, though the tests are fairly basic so let me know if you're having trouble with it (with some code I can run to see the problem if you don't mind). operands_and_partials is most often used in the case where the function is scalar-valued - e.g. all of the existing distributions. But none of the existing distributions that I'm aware of allow for matrix operands, so this will be the first time we test that feature of operands_and_partials.

(Also, we should update the docstring - I took that part verbatim from the old operands_and_partials when I was not yet sure I'd figure out how to make matrices work in the new framework. I forgot it specified what types it would work with.)

VMatthijs commented 7 years ago

Oh, that's very exciting! Could you perhaps give me some suggested syntax on what to replace this line of code with? ops_partials.edge1_.partials_[n] += theta_derivative * beta_dbl.transpose(); The intended behaviour is that argument 1 is an Eigen (Dynamic, Dynamic) matrix (say NxM). Therefore, I would expect ops_partials.edge1_.partials_ to be an NxM matrix as well. Now, I would really like to add the length-M vector represented by the right-hand side to the n-th row of this matrix of derivatives. Any idea what the syntax should be?

seantalts commented 7 years ago

I sent this above: "See this for how to assign to a row or column." as an attempt to answer that question :P

You can't use [n] to index into matrices - you'd need (n) to get out the nth element, but to access the nth row you'd need the syntax at the link: ops_partials.edge1_.partials_.row(n) += theta_derivative * beta_dbl.transpose();

VMatthijs commented 7 years ago

Ah, right. I did try this a while back, but it throws an error:

stan/lib/stan_math/stan/math/prim/mat/prob/bernoulli_logit_glm_lpmf.hpp:128:37: error: no member named 'row' in
      'stan::math::internal::empty_broadcast_array'
    ops_partials.edge1_.partials_.row(n) += theta_derivative * beta_dbl.transpose(); // for som...

examples/bernoulli_logit_glm/bernoulli_logit_glm_test.hpp:245:28: note: in instantiation of function template
      specialization 'stan::math::bernoulli_logit_glm_lpmf<false, std::vector<int, std::allocator<int> >,
      Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0,
      -1, 1> >' requested here
            lp_accum__.add(bernoulli_logit_glm_lpmf<propto__>(y, x, beta, alpha));
                           ^
stan/src/stan/services/util/initialize.hpp:114:39: note: in instantiation of function template specialization
      'bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model::log_prob<false, true, double>'
      requested here
            log_prob = model.template log_prob<false, true>
                                      ^
stan/src/stan/services/diagnose/diagnose.hpp:54:19: note: in instantiation of function template specialization
      'stan::services::util::initialize<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model,
      boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0,
      2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> > >' requested here
          = util::initialize(model, init, rng, init_radius,
                  ^
src/cmdstan/command.hpp:135:49: note: in instantiation of function template specialization
      'stan::services::diagnose::diagnose<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model>'
      requested here
        return_code = stan::services::diagnose::diagnose(model,
                                                ^
src/cmdstan/main.cpp:8:21: note: in instantiation of function template specialization
      'cmdstan::command<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model>' requested here
    return cmdstan::command<stan_model>(argc,argv);
                    ^
In file included from <built-in>:318:
In file included from <command line>:7:
examples/bernoulli_logit_glm/bernoulli_logit_glm_test.hpp:245:28: error: no matching function for call to
      'bernoulli_logit_glm_lpmf'
            lp_accum__.add(bernoulli_logit_glm_lpmf<propto__>(y, x, beta, alpha));
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stan/src/stan/services/optimize/newton.hpp:61:31: note: in instantiation of function template specialization
      'bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model::log_prob<false, false, double>'
      requested here
          lp = model.template log_prob<false, false>(cont_vector, disc_vector,
                              ^
src/cmdstan/command.hpp:151:49: note: in instantiation of function template specialization
      'stan::services::optimize::newton<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model>'
      requested here
        return_code = stan::services::optimize::newton(model,
                                                ^
src/cmdstan/main.cpp:8:21: note: in instantiation of function template specialization
      'cmdstan::command<bernoulli_logit_glm_test_model_namespace::bernoulli_logit_glm_test_model>' requested here
    return cmdstan::command<stan_model>(argc,argv);
                    ^
stan/lib/stan_math/stan/math/prim/mat/prob/bernoulli_logit_glm_lpmf.hpp:47:5: note: candidate template ignored:
      substitution failure [with propto = false, T_n = std::vector<int, std::allocator<int> >, T_x =
      Eigen::Matrix<double, -1, -1, 0, -1, -1>, T_beta = Eigen::Matrix<double, -1, 1, 0, -1, 1>, T_alpha =
      Eigen::Matrix<double, -1, 1, 0, -1, 1>]
    bernoulli_logit_glm_lpmf(const T_n& n, const T_x& x, const T_beta& beta, const T_alpha& alpha) {
    ^
stan/lib/stan_math/stan/math/prim/mat/prob/bernoulli_logit_glm_lpmf.hpp:153:5: note: candidate template ignored: invalid
      explicitly-specified argument for template parameter 'T_n'
    bernoulli_logit_glm_lpmf(const T_n& n,
    ^
2 errors generated.
seantalts commented 7 years ago

I think from an organizational perspective we should switch this conversation over to a pull request from your branch. Can you make one and just put [WIP] in front of the title?

Meanwhile, the error is showcasing a tricky thing: even in cases where there are no partials to keep track of, the code needs to compile as if there were (i.e. you would need to create a stub method row() on the empty_broadcast_array). But in this case I would actually say you shouldn't allow the first operand to be anything besides an Eigen type. This constraint would mostly take the form of a mention in the doc string and then curation of the types exposed in the Stan language (where we add function signatures for each supported type individually in that file we looked at the other day). And then if someone tried to use it from just the math library in this way (perhaps because they didn't read the doc string), they would end up with the error message you pasted above.
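For concreteness, the stub described (but argued against) above might look roughly like this; this is purely illustrative and does not reproduce empty_broadcast_array's actual template parameters or members:

    // Hypothetical no-op row proxy: lets expressions like
    //   partials_.row(n) += theta_derivative * beta_dbl.transpose();
    // compile and vanish when an operand carries no partials.
    struct empty_row_stub {
      template <typename T>
      void operator+=(const T& /*ignored*/) {}
    };
    // ...and on the hypothetical empty_broadcast_array itself:
    //   empty_row_stub row(int /*n*/) const { return {}; }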