stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Variadic argument test framework #993

Closed: bbbales2 closed this issue 5 years ago

bbbales2 commented 6 years ago

Edit (@syclik): 4/24/2019. Reducing the issue to just reverse mode. We can create new issues for forward and mix later.

Description

The goal here is to make a nice, generic testing framework built with the new C++14 stuff. This would be used from within Google Test.

Generically, the goal is to automatically test the consistency of the prim and var implementations of a function with any number of inputs. This would make development of the Math library easier.

What is meant by consistency in all these cases is that, for a given set of inputs,

Other requirements:

Design

A possible interface looks like:

// the function under test
auto test_function = [](auto x, auto y, auto z) { 
  return stan::math::foo(x, y, z);
};

// the testing functions
template <typename F, typename... Targs>
void test_reverse_mode(F f, const Targs&... args);

template <typename F, typename E, typename... Targs>
void test_reverse_mode_exception(F f, E e, const Targs&... args);

// usage
test_reverse_mode(test_function, 1.0, 2.0, 3.0);
test_reverse_mode_exception(test_function, std::domain_error("error message"), 1.0, 2.0, 3.0);

This interface will also be able to handle std::vector<double> and std::vector<int> arguments.

NB: I had originally wanted the interface below, but I couldn't see a way to implement it. There might be a way, but I couldn't figure it out.

test_reverse_mode<stan::math::foo>(1.0, 2.0, 3.0);
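
A guess at why this is hard (my reading, not stated in the thread): a name like stan::math::foo typically refers to a function template or overload set, which can't be bound to a single template parameter. A generic lambda defers overload resolution to the call site, which is why the wrapper style above works. A minimal sketch, using stan::math::exp as a stand-in:

#include <stan/math.hpp>
#include <utility>

// stan::math::exp is an overload set, so test_reverse_mode<stan::math::exp>
// has no single function to bind to. A generic lambda that forwards to it
// can be passed as an ordinary function argument instead.
auto exp_wrapper = [](auto&&... args) {
  return stan::math::exp(std::forward<decltype(args)>(args)...);
};

// usage: test_reverse_mode(exp_wrapper, 1.0);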

Implementation

Technical things to make this possible:

Instead of computing Jacobians of multiple-input/non-scalar-output functions, I'm more interested in dotting outputs with random vectors and testing gradients in random directions a few times (so tests are always on scalars). It might be premature optimization, but I'm not fond of the idea of test complexity scaling with the number of inputs/outputs, and I think this would be just as good.

@bbbales2's implementation got close, but it promoted all the types and then ran the gradient() function, which destroys the autodiff stack. We have to promote lazily. In other words, for a univariate function, this is how it should behave:

The user calls test_reverse_mode(test_function, 1.0);. From within test_reverse_mode:

  1. compute the double value as a reference: double expected_result = test_function(1.0);
  2. promote the argument: var x = to_var(1.0);
  3. compute the var result value: var result = test_function(x);
  4. compute the gradients: std::vector<var> vars = {x}; std::vector<double> grad; result.grad(vars, grad);
  5. compare the values: EXPECT_FLOAT_EQ(expected_result, result.val());
  6. compare the gradient: EXPECT_FLOAT_EQ(finite_diff(test_function, 1.0), grad[0]); (we need some way to compute the finite diff)
  7. clean up the autodiff stack: stan::math::recover_memory();

When this is done with two or more arguments, we'll have to run steps 2-7 in a loop, recursively walking through the possible combinations of var / double for each argument.
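
A minimal sketch of steps 1-7 for the univariate case (my own illustration; finite_diff here is a simple central difference, not an existing library function, and a real framework would want EXPECT_NEAR with a tolerance for step 6):

#include <stan/math.hpp>
#include <gtest/gtest.h>
#include <vector>

// Central-difference approximation to f'(x); illustration only.
template <typename F>
double finite_diff(F f, double x, double h = 1e-6) {
  return (f(x + h) - f(x - h)) / (2 * h);
}

template <typename F>
void test_reverse_mode(F f, double arg) {
  double expected_result = f(arg);                 // 1. reference value
  stan::math::var x = arg;                         // 2. promote the argument
  stan::math::var result = f(x);                   // 3. var result
  std::vector<stan::math::var> vars = {x};
  std::vector<double> grad;
  result.grad(vars, grad);                         // 4. gradients
  EXPECT_FLOAT_EQ(expected_result, result.val());  // 5. compare values
  EXPECT_FLOAT_EQ(finite_diff(f, arg), grad[0]);   // 6. compare gradient
  stan::math::recover_memory();                    // 7. clean the stack
}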

Additional Information

The exact features are unclear because I don't fully understand what is easily doable.

Basically I need an issue to track this stuff.

This is building on stuff from a few testing frameworks that are already in Stan's unit tests:

Current Math Version

v2.18.0

bob-carpenter commented 6 years ago

I'm OK with the proposed directional derivative tests instead of exhaustive input and output tests. @syclik --- do you have an opinion on this?

I'm not sure why "test all combinations of inputs for prim/rev" is called out separately.

It should be possible to use or borrow from the general testing framework (the one used for `operator_multiplication_test`).

I'm OK dropping the vectorization tests that are there now and relying on a few explicit instantiations of each in the new framework.

The trick is going to be to build up functionality in small, understandable PRs.

syclik commented 6 years ago

On Mon, Aug 20, 2018 at 5:48 AM Bob Carpenter notifications@github.com wrote:

> I'm OK with the proposed directional derivative tests instead of exhaustive input and output tests. @syclik --- do you have an opinion on this?

I'm not sure what the alternative (exhaustive input and output tests) is.

For simplicity, I think the overall description / design is good.

I think the implementation should do a little more than just check the directional derivatives. Here's what I'm thinking: for anything vectorized, this sort of test gives a false sense of correctness. There are two cases I think we should cover, because we've introduced bugs this way before and I think we'll continue to be at risk (it's tricky): one where the vectorized code is called with the same var used multiple times, and one where all the elements are independent vars.
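
For concreteness, a sketch of the two constructions (my illustration; the names and the example reduction are placeholders):

#include <stan/math.hpp>
#include <vector>

// A hand-written derivative that forgets to accumulate adjoints will
// typically pass the independent case but fail the repeated case.
void two_vectorization_cases() {
  using stan::math::var;

  // Case 1: the same var used in every element; its adjoint must
  // accumulate one contribution per element.
  var a = 1.5;
  std::vector<var> repeated = {a, a, a};

  // Case 2: independent vars; one adjoint contribution each.
  std::vector<var> independent = {var(1.5), var(1.5), var(1.5)};

  // e.g., run stan::math::log_sum_exp(repeated).grad() and check that
  // a.adj() equals the sum of the three adjoints from the independent case.
}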

We won't have this sort of testing requirement when using autodiffed operations. We get into this problem when we start specializing derivatives, and there isn't really a lower level at which to test it.

So... I know it adds complexity, but maybe a flag to test just the directional derivatives when it's a simply written function, plus something that includes a simple construction of whatever vectorized version under both conditions? I'm not looking for that to be too crazy or for full coverage. As long as the value of the derivative isn't trivial (say, 0), it should be pretty easy to tell if this mistake was made.

I think with that, the tests become something that we can trust. If not, I think the person writing the function should add more tests to demonstrate that the vectorized code is correct.

Btw, I don't mean we have to test every derivative when it's vectorized.

Oh... and happy to discuss; perhaps there's a different way to prevent that sort of behavior.

> I'm not sure why test all combinations of inputs for prim/rev is called out separately.

> It should be possible to use or borrow from the general testing framework (the one used for `operator_multiplication_test`).

> I'm OK dropping the vectorization tests there now and relying on a few explicit instantiations for each in the new framework.

> The trick is going to be to build up functionality in small, understandable PRs.

Agreed. Perhaps the first would be the simplest test with the interface defined. Then other PRs could add more tests incrementally?


syclik commented 6 years ago

I should have asked this before: is this supposed to be a function with one return or multiple return values?


bob-carpenter commented 6 years ago

@syclik: Which function are you asking about? It's supposed to be a test framework for all of our differentiable functions.

Suppose we have a function f : R^N -> R^M.

The exhaustive strategy is to test each of the N * M first-order, N^2 * M second-order, and N^3 * M third-order derivatives.

The directional derivative strategy is to choose a few M-vectors, say y1, ..., yK, and test the function lambda x. f(x).transpose() * yk, which is R^N -> R. That reduces the testing load by a factor of M without seeming to lose any information, as long as the yk aren't regular.
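
That reduction is easy to express as a wrapper (my sketch, not code from the thread; make_directional is a made-up name):

#include <stan/math.hpp>
#include <Eigen/Dense>
#include <type_traits>

// Wrap f : R^N -> R^M as the scalar function x |-> y_k^T f(x) for a
// fixed direction y_k, so the framework only ever tests scalar outputs.
template <typename F>
auto make_directional(F f, const Eigen::VectorXd& y_k) {
  return [f, y_k](const auto& x) {
    auto fx = f(x);  // M-vector of doubles or vars
    typename std::decay<decltype(fx(0))>::type sum = 0;
    for (int m = 0; m < fx.size(); ++m)
      sum += y_k(m) * fx(m);
    return sum;  // scalar, same scalar type as fx
  };
}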

In general, this is going to work to test arbitrary functions of the form f : (R^N1 x ... x R^Nj) -> R^M. The vectorized functions all have this form, and my suggestion was that we just test them like all the unvectorized functions. So far, vectorization is pretty much limited to distributions and unary functions. I was just suggesting that we not test the 10,000 possible instantiations of student-t, say, or the arbitrary number of instantiations of the vectorized functions (which work on arrays of arbitrary dimension), but make a strategic selection of types to test.

So I don't see how this depends on automatic vs. hand-written derivatives or vectorized vs. unvectorized functions.

syclik commented 6 years ago

@bob-carpenter, thanks for the clarification! That helps a lot.

So I'm definitely with you that we don't need to do exhaustive testing. I'm trying to write out what I think are common, yet hard to debug errors in writing custom gradient code.

What I'm thinking is that we're covered if we do this: f : (R^2 x R^2 x ... x R^2) -> R^M, where we test both the case where each of the args are the same vars and the case where each of the args are independent vars.

I think we'd still be down to R^(2N) -> R, but maybe I'm mistaken.


bbbales2 commented 6 years ago

So with the lingo so far, f : R^N -> R^M, the thing I was talking about testing is the product w^T * J * v, where J is the Jacobian of f, w is a random column vector with M rows, and v is a random column vector with N rows.

Whether you're testing with fvars or vars, it's one forward or one reverse pass to get the number you want to test. If you kept doing them, you could solve for J. The finite difference stencils and complex step check code would be simplified as well.
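
In reverse mode, that's one call to grad(): after the pass, the input adjoints hold w^T J, which we then dot with v. A sketch (my illustration; a real test would compare this number against a finite-difference version of the scalar function x |-> w^T f(x)):

#include <stan/math.hpp>
#include <Eigen/Dense>

// Compute w^T J v for f : R^N -> R^M with a single reverse pass.
template <typename F>
double wt_jacobian_v(F f, const Eigen::VectorXd& x,
                     const Eigen::VectorXd& w, const Eigen::VectorXd& v) {
  using stan::math::var;
  Eigen::Matrix<var, Eigen::Dynamic, 1> x_var = stan::math::to_var(x);
  Eigen::Matrix<var, Eigen::Dynamic, 1> fx = f(x_var);
  var wtf = 0;  // scalar w^T f(x)
  for (int m = 0; m < fx.size(); ++m)
    wtf += w(m) * fx(m);
  wtf.grad();  // one reverse pass: x_var(n).adj() is now (w^T J)(n)
  double result = 0;
  for (int n = 0; n < x_var.size(); ++n)
    result += x_var(n).adj() * v(n);  // dot with v
  stan::math::recover_memory();
  return result;
}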

bob-carpenter commented 6 years ago

On Aug 20, 2018, at 7:41 PM, Ben Bales notifications@github.com wrote:

> So with the lingo so far, f : R^N -> R^M, the thing I was talking about testing is the product w^T J v, where J is the Jacobian of f, w is a random column vector with M rows, and v is a random column vector with N rows.

Cool. I missed the final multiply by v.

> Whether you're testing with fvars or vars, it's one forward or one reverse pass to get the number you want to test.

That last multiply by v is another O(m) reduction in testing load.

> If you kept doing them, you could solve for J.

Would that require M passes?

I don't think this is necessary.

If there are special inputs, we can test for those in the same way.

> The finite difference stencils and complex step check code would be simplified as well.

I'm not sure what you mean by "stencil", but the functionals for finite diffs should stay the same. We can add a functional to compute directional derivatives with reverse mode, but I don't know that you'd use it in testing.

syclik commented 5 years ago

I want to make sure we don't lose track of @bbbales2's effort. His original PR was here: #1021.

We identified a couple of problems. The first was that the variables weren't persisting on the stack, so we were dealing with undefined behavior. We can fix this by lazily promoting the argument types to stan::math::var.

There's a really cool thing that @bbbales2 did inside his PR that we should take note of. His call_all_argument_combos expands the arguments into the exhaustive list of tuples of arguments, using some neat recursion. We'll want to do the same using types instead of instantiated objects; see the sketch below.
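
In rough outline, the eager expansion can be built like this (my reconstruction for scalar arguments only, not the PR's actual code; prepend_to_each and all_argument_combos are made-up names). Each argument branches into a double and a var version, and the recursion takes the cross product. Note this is the eager version; as described above, the real fix needs the same expansion done lazily, over types:

#include <stan/math.hpp>
#include <tuple>
#include <utility>

using stan::math::var;

// Prepend x to every tuple in a tuple-of-tuples.
template <typename T, typename... Tuples, std::size_t... Is>
auto prepend_impl(const T& x, const std::tuple<Tuples...>& combos,
                  std::index_sequence<Is...>) {
  return std::make_tuple(
      std::tuple_cat(std::make_tuple(x), std::get<Is>(combos))...);
}

template <typename T, typename... Tuples>
auto prepend_to_each(const T& x, const std::tuple<Tuples...>& combos) {
  return prepend_impl(x, combos, std::index_sequence_for<Tuples...>{});
}

// Base case: a single empty combination.
inline auto all_argument_combos() { return std::make_tuple(std::tuple<>()); }

// Recursive case: branch the first argument into double and var versions
// and prepend each to every combination of the remaining arguments.
template <typename T, typename... Rest>
auto all_argument_combos(const T& x, const Rest&... rest) {
  auto tail = all_argument_combos(rest...);
  return std::tuple_cat(prepend_to_each(x, tail),
                        prepend_to_each(var(x), tail));
}

// all_argument_combos(1.0, 2.0) yields four tuples:
// (1.0, 2.0), (1.0, var(2.0)), (var(1.0), 2.0), (var(1.0), var(2.0))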

syclik commented 5 years ago

I updated the issue description with more information. I've been slowly working on this... I'm closer, but still not complete. Would love help if anyone is inclined.

ghost commented 5 years ago

I'm interested. What might be involved?

syclik commented 5 years ago

> I'm interested. What might be involved?

Great question. First off, a lot of variadic template stuff. (I'm still getting fluent in it; assume C++14.)

If you want to know exactly what I'm thinking about, @bbbales2 put together something called call_all_argument_combos (or something like that) on his branch. It essentially walks through each of the arguments, figures out what needs to be promoted (from double to var; it handles things like std::vector<double> too), then builds a typelist with the full combinatorial expansion of the arguments, AND it also carries instances of those types, which contain the promoted vars. He cleverly built that recursively using a bunch of templated C++.

That was almost correct. Instead of promoting all the vars eagerly, we want to promote lazily. That means we need to do the same thing he did with the typelist, but with the promotion happening later. That way we can be safe about usage: add vars to the stack, run the gradient code, then recover memory, multiple times over. So the thing we need to do is that combinatorial expansion with types, not implicitly with instances. Once we figure that out, we can do the rest. I was having trouble manipulating types to build that list... it's a lot easier for me to think about building a binary tree recursively with objects, but it is definitely possible to do with types. A sketch of what I mean follows.
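
A sketch of the type-level expansion (my illustration, scalars only; typelist, concat, prepend, and all_combos are made-up names). The output is a list of tuple types; a test harness can then instantiate one tuple at a time, promote, run the gradient, and recover memory before moving to the next:

#include <stan/math.hpp>
#include <tuple>
#include <utility>

using stan::math::var;

template <typename... Ts>
struct typelist {};

// Concatenate two typelists.
template <typename A, typename B>
struct concat;
template <typename... As, typename... Bs>
struct concat<typelist<As...>, typelist<Bs...>> {
  using type = typelist<As..., Bs...>;
};

// Prepend T to every tuple type in a typelist of tuples.
template <typename T, typename L>
struct prepend;
template <typename T, typename... Tuples>
struct prepend<T, typelist<Tuples...>> {
  using type = typelist<decltype(std::tuple_cat(
      std::declval<std::tuple<T>>(), std::declval<Tuples>()))...>;
};

// All double/var combinations of the given argument types, as tuple types.
template <typename... Ts>
struct all_combos;
template <>
struct all_combos<> {
  using type = typelist<std::tuple<>>;
};
template <typename T, typename... Rest>
struct all_combos<T, Rest...> {
  using tail = typename all_combos<Rest...>::type;
  using type = typename concat<typename prepend<T, tail>::type,
                               typename prepend<var, tail>::type>::type;
};

// all_combos<double, double>::type is
// typelist<std::tuple<double, double>, std::tuple<double, var>,
//          std::tuple<var, double>, std::tuple<var, var>>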

I don't know how much of that is understandable, but that's the problem I'm dealing with right now. If you want to help with this part, I can try to write a proper spec for what I'm doing.

ghost commented 5 years ago

Where is the code you were using to manipulate the list of types?

I noticed that an old branch cited in Implementation, feature/issue-993-testing, has been deleted or is otherwise missing.

syclik commented 5 years ago

Whoops. Wrong branch. It's this one: feature/automatic-autodiff-testing

syclik commented 5 years ago

I updated the PR description with the correct branch.

rok-cesnovar commented 5 years ago

I figure this is also a non-issue with the new AD testing framework. Closing. If I missed something, please reopen.