stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Meta Issue for Static Matrices #1875

Open SteveBronder opened 4 years ago

SteveBronder commented 4 years ago

Description

This is a meta issue to list out the steps needed for static matrices (the list is in no particular order). If I'm missing anything, please leave a comment and I'll update the list.

Current Version:

v3.2.0

bbbales2 commented 4 years ago

C. Write a script that replaces vari and var with templated versions in the math library.
D. Figure out what stuff can be reused in rev

I think this is where my head is at on this stuff. I don't think I'll know how to judge progress until we've identified all the things that can be converted. Which is a matter of going through rev/fun. Maybe I'll do it later today, but if someone else beats me to it, I'm not complaining.

F. Refactor the AD test suite to work on static matrices

I think we could just go through each argument and copy every var<Eigen::Matrix<double, R, C>> to an Eigen::Matrix<var, R, C>, and that would work as a first hack.

I don't know how this works with higher-order autodiff. Probably the same way as anything else does, so I guess we just gotta try it and see what blows up.

bbbales2 commented 4 years ago

So I went through the functions in prim and rev (manually, so error prone in that way).

So the status of this is that we need any function that took a matrix before to possibly take a var_type<MatrixXd> thing now, right? And that even affects things in prim, because we expect all that stuff to be autodiffable?

Is there any lazy fallback we can or should do for any of this? Like is there a way to add a constructor to Eigen::Matrix<var, -1, -1> so it knows how to cast from var_type<Eigen::MatrixXd>?

Here are the lists of functions:

  1. There are functions in rev that should have new versions for the new autodiff types (here)
  2. There are the apply_scalar_unary and apply_vector_unary functions that should have new versions for the new autodiff types (here)
  3. There are tons of functions in prim that just need to be verified for compatibility (here)

I was thinking though, since we want our whole matrix types to work everywhere a matrix type works, including mixes of whole matrices and regular matrices, maybe it would be good to start with the test framework.

So now if the test framework gets an Eigen::MatrixXd argument, it will call the function with Eigen::MatrixXd and Eigen::Matrix<var, -1, -1> arguments. It should also call it with var_type<Eigen::MatrixXd> arguments now.

Similarly, when fvar<Eigen::Matrix<var, -1, -1>> gets tested, so too should fvar<var<Eigen::MatrixXd>>.
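
As a rough sketch of what that expansion could look like (the helper name call_with_all_matrix_types is made up, and it assumes the static matrix type ends up spelled stan::math::var_value<Eigen::MatrixXd> and is constructible from an Eigen::MatrixXd):

  #include <stan/math.hpp>

  // Hypothetical test helper (not part of the existing AD test framework):
  // call a functor with every autodiff flavour of a double matrix argument.
  template <typename F>
  void call_with_all_matrix_types(F&& f, const Eigen::MatrixXd& x) {
    using stan::math::var;

    f(x);  // plain double matrix

    Eigen::Matrix<var, -1, -1> x_matvar(x.rows(), x.cols());
    for (Eigen::Index j = 0; j < x.cols(); ++j)
      for (Eigen::Index i = 0; i < x.rows(); ++i)
        x_matvar(i, j) = x(i, j);  // matrix of vars
    f(x_matvar);

    stan::math::var_value<Eigen::MatrixXd> x_varmat(x);  // "whole"/static matrix
    f(x_varmat);

    stan::math::recover_memory();  // free the vars created above
  }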

If a function gets a whole matrix, should it return a whole matrix or just a regular matrix?

For the case of a binary function, we could do:

f(Eigen::MatrixXd, var<Eigen::MatrixXd>) -> var_type<Eigen::MatrixXd>

or

f(Eigen::MatrixXd, var<Eigen::MatrixXd>) -> Eigen::Matrix<var, -1, -1>

We should do the first, right?

To make the mixes of Eigen::MatrixXd, Eigen::Matrix<var, -1, -1>, and var_type<Eigen::MatrixXd> feasible, I think we should polish off adj_jac_apply and update it for the new stuff. Presumably it can be programmed more efficiently now.

I didn't go through the distributions yet.

SteveBronder commented 4 years ago

Thanks for looking through all those!

So the status of this is that we need any function that took a matrix before to possibly take a var_type thing now, right? And that even affects things in prim, because we expect all that stuff to be autodiffable?

Yeah, we'd want everything that can work with a matrix type right now.

Is there any lazy fallback we can or should do for any of this? Like is there a way to add a constructor to Eigen::Matrix<var, -1, -1> so it knows how to cast from var_type?

Yes, I'm pretty sure we can add a conversion operator to var_type<Eigen::MatrixXd> so we can go from var_type<Eigen::MatrixXd> -> Eigen::Matrix<var, -1, -1>, and another conversion for the reverse direction, Eigen::Matrix<var, -1, -1> -> var_type<Eigen::MatrixXd>, using the custom Eigen include thing we do now.
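
To make that mechanism concrete, here is a minimal toy sketch of an implicit conversion operator; toy_var and toy_var_matrix are simplified stand-ins for illustration, not the real var/var_type classes:

  #include <Eigen/Dense>

  // Simplified stand-ins for illustration only. The real types share storage
  // through vari pointers and the arena allocator.
  struct toy_var {
    double val{0};
    toy_var() = default;
    toy_var(double v) : val(v) {}
  };

  struct toy_var_matrix {  // plays the role of var_type<Eigen::MatrixXd>
    Eigen::MatrixXd val;   // one value block for the whole matrix

    // Implicit conversion operator: lets a static matrix be used wherever an
    // Eigen::Matrix<toy_var, -1, -1> is expected (the "lazy fallback" above).
    operator Eigen::Matrix<toy_var, -1, -1>() const {
      Eigen::Matrix<toy_var, -1, -1> out(val.rows(), val.cols());
      for (Eigen::Index j = 0; j < val.cols(); ++j)
        for (Eigen::Index i = 0; i < val.rows(); ++i)
          out(i, j) = toy_var(val(i, j));  // one new element per entry
      return out;
    }
  };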

So now if the test framework gets an Eigen::MatrixXd argument, it will call the function with Eigen::MatrixXd and Eigen::Matrix<var, -1, -1> arguments. It should also call it with var_type arguments now.

Should we get the basic operator derivatives up first then tackle this so we have more to work with?

If a function gets a whole matrix, should it return a whole matrix or just a regular matrix?

I've been mulling over this question today. It's weird because it depends on how the return type is used after. If the return value then gets assigned into subslices, we need to assign to a normal dynamic matrix, but if it doesn't, we can return a whole/static matrix. One thing we could do is have it return a whole matrix, and then at the stan repo level, when we do an assign() from static to dynamic, we just make N*M new vari and fill them in with the static matrix values.

To make the mixes of Eigen::MatrixXd, Eigen::Matrix<var, -1, -1>, and var_type feasible, I think we should polish off adj_jac_apply and update it for the new stuff. Presumably it can be programmed more efficiently now.

How about:

  1. (Eigen::MatrixXd, Eigen::Matrix<var>) -> Eigen::Matrix<var>
  2. (Eigen::MatrixXd, var_value<Eigen::MatrixXd>) -> var_value<Eigen::MatrixXd>
  3. (Eigen::Matrix<var>, var_value<Eigen::MatrixXd>) -> Eigen::Matrix<var>

Then in cases for (2) where we are assigning to an Eigen::Matrix<var>, we do the assignment thing above.
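
Spelled out as overload declarations, that proposal would look roughly like the following (signatures only, assuming the static matrix type is spelled var_value<Eigen::MatrixXd>; f is a placeholder name, not an actual Stan Math function):

  #include <stan/math.hpp>

  // Sketch of the proposed overload set for a binary function f.
  using matvar = Eigen::Matrix<stan::math::var, -1, -1>;
  using varmat = stan::math::var_value<Eigen::MatrixXd>;

  matvar f(const Eigen::MatrixXd& a, const matvar& b);  // case 1
  varmat f(const Eigen::MatrixXd& a, const varmat& b);  // case 2
  matvar f(const matvar& a, const varmat& b);           // case 3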

bbbales2 commented 4 years ago

Should we get the basic operator derivatives up first then tackle this so we have more to work with?

Well if we just add the tests, then the tests will fail until we've implemented all the things we need to implement lol. Which seems nice.

It's weird because it depends on how the return type is used after.

Yeah

One thing we could do is have it return a whole matrix, and then at the stan repo level, when we do an assign() from static to dynamic, we just make N*M new vari and fill them in with the static matrix values.

Yes this makes sense.

I think I'm leaning (Eigen::Matrix<var, -1, -1>, var_value<Eigen::MatrixXd>) -> var_value<Eigen::MatrixXd> now because:

  1. If the return is used in another function, it's faster if it's a var_value<MatrixXd>

  2. If the return value is assigned to a matrix (not a whole matrix), then the downside is we have to copy N^2 adjoints and values (which is what you said with assign).

  3. If the program has an N^2 matrix of autodiff variables in it, maybe nobody will notice that extra copy we did lol.

Man this is gonna be confusing to the users if we don't have clear error messages.

I guess the advantage of doing (Eigen::Matrix<var, -1, -1>, var_value<Eigen::MatrixXd>) -> Eigen::Matrix<var, -1, -1> is that everything will just work, though things might be unnecessarily pokey. I suppose the compiler could probably throw warnings about using matrices in a slow way? Though these could be annoying.

t4c1 commented 4 years ago

Yes, I'm pretty sure we can add a conversion operator to var_type so we can go from var_type -> Eigen::Matrix<var, -1, -1>, and another conversion for the reverse direction, Eigen::Matrix<var, -1, -1> -> var_type, using the custom Eigen include thing we do now.

We should prefer the pair of a conversion constructor and a conversion assignment operator to a conversion operator, so we can reuse memory. We probably need to make these conversions implicit.
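
A minimal toy sketch of that pattern, with simplified stand-in types rather than the actual Stan Math classes:

  #include <Eigen/Dense>

  struct toy_var {
    double val{0};
    toy_var() = default;
    toy_var(double v) : val(v) {}
  };

  struct toy_static_matrix {  // stand-in for the static/whole matrix type
    Eigen::MatrixXd val;
  };

  struct toy_dynamic_matrix {  // stand-in for Eigen::Matrix<var, -1, -1>
    Eigen::Matrix<toy_var, -1, -1> m;

    // Conversion constructor: builds a fresh matrix of vars from the static one.
    toy_dynamic_matrix(const toy_static_matrix& s)
        : m(s.val.rows(), s.val.cols()) {
      copy_values(s);
    }

    // Conversion assignment: reuses the existing allocation when shapes match,
    // which is the memory-reuse advantage over a plain conversion operator.
    toy_dynamic_matrix& operator=(const toy_static_matrix& s) {
      if (m.rows() != s.val.rows() || m.cols() != s.val.cols()) {
        m.resize(s.val.rows(), s.val.cols());
      }
      copy_values(s);
      return *this;
    }

   private:
    void copy_values(const toy_static_matrix& s) {
      for (Eigen::Index j = 0; j < s.val.cols(); ++j)
        for (Eigen::Index i = 0; i < s.val.rows(); ++i)
          m(i, j) = toy_var(s.val(i, j));
    }
  };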

If a function gets a whole matrix, should it return a whole matrix or just a regular matrix?

We should rely on neither. It should return an Eigen expression (which is implicitly convertible to both types). What a function actually returns then depends only on what is most convenient to do in the implementation of the function.

bob-carpenter commented 4 years ago

If a function gets a whole matrix, should it return a whole matrix or just a regular matrix?

We should rely on neither.

How does this interact with typing in the Stan language? I think we'll need to infer the rvalue vs. lvalue status of variables there (I like @t4c1's terminology for the variable types).

t4c1 commented 4 years ago

How does this interact with typing in the Stan language?

We (mostly I) are already working on making all functions accept, and later return, general Eigen expressions. I think it should not require any changes in the Stan language. As for matrix and static matrix in the Stan language, we can say they are assignable to one another.

SteveBronder commented 4 years ago

We should prefer the pair of a conversion constructor and a conversion assignment operator to a conversion operator, so we can reuse memory. We probably need to make these conversions implicit.

I agree with this

We should rely on neither. It should return an Eigen expression (which is implicitly convertible to both types). What a function actually returns then depends only on what is most convenient to do in the implementation of the function.

I'm not sure if I understand this part

t4c1 commented 4 years ago

I'm not sure if I understand this part

It's best not to change what existing functions are returning. Right now that is Eigen matrices, but soon it will be Eigen expressions. New overloads can also return either option. We only need to add implicit conversions between the types, so any Eigen expression (including plain matrices) can be converted into a static matrix, and static matrices can be converted into Eigen matrices.
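
A small illustration of the "return an Eigen expression and let the caller's assignment pick the concrete type" idea (scale_by is a made-up example, not a Stan Math function):

  #include <Eigen/Dense>

  // Returns an unevaluated Eigen expression; the caller's assignment decides
  // which concrete type it materialises into. Note the expression refers to
  // the caller's m, so it must be consumed while m is still alive.
  template <typename Derived>
  auto scale_by(const Eigen::MatrixBase<Derived>& m, double c) {
    return c * m.derived();
  }

  // Usage: the same call can land in a plain matrix today, or (given implicit
  // conversions) in a static-matrix type later.
  // Eigen::MatrixXd a = Eigen::MatrixXd::Random(3, 3);
  // Eigen::MatrixXd b = scale_by(a, 2.0);  // evaluates the expression here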

bbbales2 commented 4 years ago

Well, for any function with custom autodiff, there will be two implementations, func(matrix) and func(whole matrix). If we call func(a) where a is an expression that can be converted to either a matrix or a whole matrix, I don't think we've solved the problem of what a should be.

t4c1 commented 4 years ago

Many functions already accept Eigen expressions. I am still working on the remaining ones. So no conversion is needed when calling a function with an Eigen expression as an argument.

bbbales2 commented 4 years ago

Right, but the implementations of func(matrix) and func(whole matrix) are actually different, because the inputs and outputs are stored differently with vars vs. var_type<MatrixXd>. I'm not sure they can be one function.

Unless they could be if we program things through .adj() accessors and stuff? I guess I'd have to look at a few implementations and think through this.

t4c1 commented 4 years ago

I believe at least some functions will need separate implementations. func(matrix) does/will soon accept expressions, so there will be no ambiguity.

bob-carpenter commented 4 years ago

If something accepts generic expression template type arguments, there won't be any implicit conversion.

As for implementation, I'd think it'd make sense to implement for whole matrices using adjoint-Jacobian techniques, then convert non-whole matrices to whole and delegate. We need to pull out the values and adjoints into matrices of doubles (Matrix<double, R, C>) anyway, at least for any operations more complex than addition. That probably wouldn't return expression templates, though, unless all of a sudden vari<T> can take expression template types and we can somehow implement proper covariance for the T here (if A is assignable to B, then vari<A> is assignable to vari<B>).
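
As a sketch of the "pull out the values and adjoints" step for a matrix of vars (values_and_adjoints is an illustrative helper name, not an existing Stan Math function):

  #include <stan/math.hpp>
  #include <utility>

  // Gather the values and adjoints of an Eigen::Matrix<var, -1, -1> into plain
  // double matrices, as the first step of an adjoint-Jacobian style implementation.
  inline std::pair<Eigen::MatrixXd, Eigen::MatrixXd> values_and_adjoints(
      const Eigen::Matrix<stan::math::var, -1, -1>& m) {
    Eigen::MatrixXd vals(m.rows(), m.cols());
    Eigen::MatrixXd adjs(m.rows(), m.cols());
    for (Eigen::Index j = 0; j < m.cols(); ++j) {
      for (Eigen::Index i = 0; i < m.rows(); ++i) {
        vals(i, j) = m(i, j).val();
        adjs(i, j) = m(i, j).adj();
      }
    }
    return {vals, adjs};
  }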

bbbales2 commented 4 years ago

As for implementation, I'd think it'd make sense to implement for whole matrices using adjoint-Jacobian techniques, then convert non-whole matrices to whole and delegate. We need to pull out the values and adjoints into matrices of doubles (Matrix<double, R, C>) anyway.

I suppose that probably makes sense for reverse mode. This way the rule would be: "if it can be a whole matrix, make it a whole matrix".

I'm still a bit nervous it might make small vector operations slower, or things like indexing submatrices could become slow, but that might be unsubstantiated. Every time we read something, presumably we're going to operate on it. And our autodiff speedup needs to cover this.

Does that still make sense for everything in prim? That every time we call a function that is not element-wise-assign, we copy to a whole matrix first?

I guess what @tadej is saying is prim should be written in terms of generics so it doesn't matter, and so the answer is no.

t4c1 commented 4 years ago

If something accepts generic expression template type arguments, there won't be any implicit conversion.

Right. We only need conversions for functions returning expressions, so we can do var<MatrixXd> x = func(something);.

That probably wouldn't return expression templates, though, unless all of a sudden vari can take expression template types and we can somehow implement proper covariance for the T here (if A is assignable to B, then vari<A> is assignable to vari<B>).

This is a good idea in my opinion.

Does that still make sense for everything in prim? That every time we call a function that is not element-wise-assign, we copy to a whole matrix first?

For simple operations such as matrix addition it does not. But for most functions it does.

SteveBronder commented 4 years ago

For the probability distributions right now, like in normal, we have code like:

  scalar_seq_view<T_y> y_vec(y);
  scalar_seq_view<T_loc> mu_vec(mu);
  scalar_seq_view<T_scale> sigma_vec(sigma);
  size_t N = max_size(y, mu, sigma);

  VectorBuilder<true, T_partials_return, T_scale> inv_sigma(size(sigma));
  VectorBuilder<include_summand<propto, T_scale>::value, T_partials_return,
                T_scale>
      log_sigma(size(sigma));
  for (size_t i = 0; i < stan::math::size(sigma); i++) {
    inv_sigma[i] = 1.0 / value_of(sigma_vec[i]);
    if (include_summand<propto, T_scale>::value) {
      log_sigma[i] = log(value_of(sigma_vec[i]));
    }
  }

  for (size_t n = 0; n < N; n++) {
    const T_partials_return y_dbl = value_of(y_vec[n]);
    const T_partials_return mu_dbl = value_of(mu_vec[n]);

    const T_partials_return y_minus_mu_over_sigma
        = (y_dbl - mu_dbl) * inv_sigma[n];
    const T_partials_return y_minus_mu_over_sigma_squared
        = y_minus_mu_over_sigma * y_minus_mu_over_sigma;

    static double NEGATIVE_HALF = -0.5;

    if (include_summand<propto>::value) {
      logp += NEG_LOG_SQRT_TWO_PI;
    }
    if (include_summand<propto, T_scale>::value) {
      logp -= log_sigma[n];
    }
    logp += NEGATIVE_HALF * y_minus_mu_over_sigma_squared;
    T_partials_return scaled_diff = inv_sigma[n] * y_minus_mu_over_sigma;
    // ... (rest of the loop body omitted)
  }

Should we write a val(i) method for scalar_seq_view and VectorBuilder? Right now, with how I'm writing the views in #2024 for var<mat>, the line mu_dbl = value_of(mu_vec[n]); will make a temporary var for var<mat>, but if we could do mu_vec.val(n), then we could just access the array of values directly. We could then take the loop for the sigma components

  for (size_t i = 0; i < stan::math::size(sigma); i++) {
    inv_sigma[i] = 1.0 / value_of(sigma_vec[i]);
    if (include_summand<propto, T_scale>::value) {
      log_sigma[i] = log(value_of(sigma_vec[i]));
    }
  }

and do

    inv_sigma = divide(1.0, sigma_vec.val());
    if (include_summand<propto, T_scale>::value) {
      log_sigma = log(sigma_vec.val());
    }

bbbales2 commented 4 years ago

The new pull requests change this so that there aren't even loops: https://github.com/stan-dev/math/pull/2022. Dunno how that affects this, just wanted to say that's happening.

Should we write a val(i) method for scalar_seq_view and VectorBuilder

I don't see how this would hurt if it is useful (Edit: that came out weird -- if it's useful, do it imo, cuz I don't see obvious drawbacks)

t4c1 commented 4 years ago

The new pull requests change this so that there aren't even loops:

Not for all functions, though. Some still use scalar_seq_view and loops.