headmyshoulder / odeint-v2

odeint - solving ordinary differential equations in c++ v2
http://headmyshoulder.github.com/odeint-v2/

dense_output_runge_kutta can run into infinite loop #173

Closed wds15 closed 8 years ago

wds15 commented 8 years ago

Hi!

We are using odeint within a parameter estimation problem which occasionally tries out extreme parameter values. Hence, the integrator must throw an exception when the parameter values are too extreme. This works nicely with the controlled error stepper. However, the code

  typedef runge_kutta_dopri5< state_type > dopri5_type;
  typedef controlled_runge_kutta< dopri5_type > controlled_dopri5_type;
  typedef dense_output_runge_kutta< controlled_dopri5_type, explicit_controlled_stepper_fsal_tag> dense_output_dopri5_type;

  dense_output_dopri5_type dopri5 = make_dense_output<dopri5_type>(absolute_tolerance, relative_tolerance);

  integrate_times(dopri5,
                  coupled_system,
                  initial_coupled_state,
                  boost::begin(ts_vec), boost::end(ts_vec),
                  step_size,
                  observer);

runs into an infinite loop whenever a "bad" parameter draw is tried. I think this is caused by dense_output_runge_kutta< controlled_dopri5_type, explicit_controlled_stepper_fsal_tag > forwarding the calc_state call directly to the underlying stepper. This bypasses the do_step defined in dense_output_runge_kutta< controlled_dopri5_type, explicit_controlled_stepper_fsal_tag > (which enforces a limit of 1000 tries to find an appropriate step size).

So it looks to me that the bypassing of do_step via calc_state, as used in integrate_times, makes the system fail, i.e. the program hangs instead of rejecting the extreme parameter value. Only the controlled error stepper scheme seems to help. Hence two questions:

mariomulansky commented 8 years ago

Thanks for reporting this issue. An infinite loop is certainly a bad thing that needs to be fixed. However, I'm having trouble understanding what exactly is happening. Maybe you can identify a specific problem/parameter set where this happens and provide a minimal example that shows this behavior. Then I can look at exactly where things go wrong in integrate_times.

Using calc_state is fundamental for a dense output stepper; avoiding it would mean using a controlled_stepper only, which would result in additional calls of the rhs and hence significantly lower performance.

wds15 commented 8 years ago

An example is difficult to put together, but replacing the above lines of code with an approach that uses the controlled error stepper makes things work, i.e. an exception is thrown after 1000 tries, which never happens with the dense_output scheme and integrate_times. If you look into

stepper/dense_output_runge_kutta.hpp, line 332

then there the do_step function also gives up after 1000 tries. BUT the integrate_times function uses the calc_state function, and that calc_state call is passed on to the stepper in line 346. The stepper itself does not protect against extreme parameter draws, i.e. there is no protection against more than 1000 tries of adjusting the step size. Hence, this runs into an infinite loop; at least this is my understanding.

mariomulansky commented 8 years ago

calc_state is only called to obtain an intermediate value within the range of the last do_step. Hence I'm having difficulty figuring out where exactly the problem occurs without a precise example...

wds15 commented 8 years ago

Apologies for having trouble with a toy example. In the integrate_times function defined in

integrate/detail/integrate_times.hpp

for the dense_output_tag, the function calc_state is always used (line 145). This calc_state call is then dispatched to a function which apparently does not give up after 1000 tries.

mariomulansky commented 8 years ago

It is not always used; it is used as long as we are interested in the trajectory at a time before the time point we have currently approximated to. Otherwise do_step is called (line 155). calc_state is guaranteed to give a result, as it is just a straightforward interpolation, so I don't see how it causes the problem. Hence I would like to look into some example...

wds15 commented 8 years ago

I can try to compile an example, but this will take me some time. In a nutshell, my example becomes very stiff for these extreme draws while it is usually non-stiff. Maybe I can modify one of your examples.

As I said, a workaround is to use the controlled error stepper scheme. Is this a good workaround? Or do I get some hard performance drawbacks by using the controlled error stepper instead of the dense output? Many thanks!

mariomulansky commented 8 years ago

Using the controlled stepper reduces the performance significantly if your observation intervals (i.e. the time differences in your ts_vec) are of the same order as your typical step sizes. If the step sizes become much smaller (as often happens for stiff systems), then the performance impact becomes negligible.

wds15 commented 8 years ago

So, here is an example which tickles the bug:

    #include <iostream>
    #include <vector>
    #include <boost/numeric/odeint.hpp>
    #include <boost/phoenix/core.hpp>
    #include <boost/phoenix/operator.hpp>

    using namespace std;
    using namespace boost::numeric::odeint;
    namespace phoenix = boost::phoenix;

    //[ stiff_system_definition
    typedef boost::numeric::ublas::vector< double > vector_type;
    typedef boost::numeric::ublas::matrix< double > matrix_type;

    struct stiff_system
    {
        void operator()( const vector_type &x , vector_type &dxdt , double /* t */ )
        {
            dxdt[ 0 ] = -1000001.0 * x[ 0 ] - 1000000.0 * x[ 1 ];
            dxdt[ 1 ] = x[ 0 ];
        }
    };

    void write_cout( const double &x , const double t )
    {
        cout << t << '\t' << x << endl;
    }

    const double dt = 1000;

    int main( int argc , char **argv )
    {
        vector_type x2( 2 , 1.0 );

        // create a vector with observation time points
        std::vector<double> times( 91 );
        for( size_t i=0 ; i<times.size() ; ++i )
            times[i] = 1.0 + dt*i;

        size_t num_of_steps2 = integrate_times( make_dense_output<runge_kutta_dopri5< vector_type > >( 1E-15 , 1E-15 ) , stiff_system() ,
                            x2 , times , dt , cout << phoenix::arg_names::arg2 << " " << phoenix::arg_names::arg1[0] << "\n" );

        clog << num_of_steps2 << endl;

        return 0;
    }

mariomulansky commented 8 years ago

great, thanks! I will look into that now.

wds15 commented 8 years ago

Sorry for the weird formatting; I'm just not yet familiar with the markdown syntax. But the idea is simple: integrate a system that is so stiff that more than 1000 tries are needed to adjust the step size. This is what happens above.

mariomulansky commented 8 years ago

No, for me your example runs fine. It just takes very long to compute. If you change const double dt = 1000; to const double dt = 1.0; you will see ongoing output showing progress....

wds15 commented 8 years ago

Yes, that's right; the problem is ill-posed, but this is what happens when you do that in a parameter estimation program. odeint then must throw an exception.

Leave it at dt = 1000; but use this:

size_t num_of_steps2 = integrate_times( make_controlled<runge_kutta_dopri5< vector_type >  >( 1E-15 , 1E-15) , stiff_system() ,
                    x2 , times , dt, cout << phoenix::arg_names::arg2 << " " << phoenix::arg_names::arg1[0] << "\n" );

Then odeint throws an exception which allows the outer program to deal with that situation. For me, odeint freezes when using the dense output scheme, but from my understanding the correct thing would be to throw an exception here as well.

mariomulansky commented 8 years ago

Ah ok, I see the inconsistency. However, looking at the code, it seems to me that the version with the controlled stepper is wrong, not the dense output version. I think I will commit a fix that will make the controlled version behave like the dense output version, i.e. both will not throw an exception in your case.

Also, I think your understanding of the problem is not entirely correct. The program does not freeze; it just takes very long to execute due to the very small step size coming from the stiffness of the equation. Relying on the odeint exception to identify such a case is inherently unsafe and cannot be recommended. You should implement some external stiffness test and use it to identify "bad" parameter values. Checking stiffness usually works by looking at the eigenvalues of the Jacobian, which you could do before the integrate_times call or within the observer calls.

Your current approach is by no means guaranteed to work. I will fix the inconsistency, but not to the result you are aiming for.

wds15 commented 8 years ago

Well, I would have opted for a different fix. To me the program effectively freezes, as it takes forever for odeint to come back with a solution for a parameter draw which gets rejected anyway.

What would be the recommended way to use odeint if I want to have an exception thrown whenever too much work is needed to solve the problem? The maxiter checks aren't there for nothing...

Is there a strict definition of stiffness based on the eigenvalues of the Jacobian? I guess an extreme ratio of largest to smallest eigenvalue, but then what counts as extreme? Also, I am not so sure the eigenvalues are always real. Any suggestions for what should work?

Giving up when things are numerically too hard is the approach taken by CVODE for example and this would do the job just fine here...

mariomulansky commented 8 years ago

I understand, and indeed there is currently no such thing as a TOO_MUCH_WORK exception. The test that worked for you did so by accident and is not intended for this purpose. However, this can be achieved with a simple hack in your rhs implementation: just count the number of calls there and throw an exception if the count exceeds some enormous value.

Nevertheless, feel free to open another issue regarding such a TOO_MUCH_WORK exception; this might be a reasonable addition to odeint.

wds15 commented 8 years ago

I can't count the number of calls of the rhs in the program this is intended for. Not possible.

A too much work exception is what is needed. But for what reason then is the

"Integrate adaptive : Maximal number of iterations reached. A step size could not be found."

thrown right now? I don't quite understand why this is different from a too much work exception.

mariomulansky commented 8 years ago

Because throwing this exception is based on the number of subsequent failed attempts within a controlled stepper, i.e. when gradually reducing the step size. It does not count successful steps taken with a tiny step size, which is what you would want, as far as I understand.

wds15 commented 8 years ago

Yup, it is not exactly the same, but it solved all my problems. Of course, what you have in mind would even be better, I agree.

mariomulansky commented 8 years ago

Well, it solved your problems because it had a bug that made it a bit closer to what you want. However, your request doesn't seem too difficult. Let me look into that.

wds15 commented 8 years ago

Great. Many thanks!

wds15 commented 8 years ago

One more question: would calling integrate_adaptive between each step be to some degree close to what I want? This would not be ideal, as the step size has to be adjusted over and over again, but it would force odeint to throw an exception.

mariomulansky commented 8 years ago

No, this will not work, because I'm afraid the exception will not fire in your case.

wds15 commented 8 years ago

Even if I use it as a controlled stepper? The code in integrate_adaptive.hpp under integrate/detail (line 108) looks to me as if an exception is thrown just as I want.

wds15 commented 8 years ago

Hi!

I wanted to try your suggestion to use a counter inside the system functor which throws an exception after an excessive number of trials. But this does not work, because the copy constructor is called many times whenever a try is rejected. Here is what I tried - is there another way of doing it?

//[ stiff_system_definition
typedef boost::numeric::ublas::vector< double > vector_type;
typedef boost::numeric::ublas::matrix< double > matrix_type;

struct stiff_system
{
  std::size_t evals_;
  stiff_system() : evals_(0) {
    std::cout << "Creating stiff_system" << std::endl;
  }
  stiff_system(const stiff_system& o) : evals_(o.evals_) {
    std::cout << "Copying stiff_system" << std::endl;
  }
  void operator()( const vector_type &x , vector_type &dxdt , double /* t */ )
  {
    if( ++evals_ == 100000 )
      throw( std::overflow_error("too much work") );
    std::cout << "Evals = " << evals_ << std::endl;

    dxdt[ 0 ] = -1000001.0 * x[ 0 ] - 1000000.0 * x[ 1 ];
    dxdt[ 1 ] = x[ 0 ];
  }
};

const double dt = 10000.0;

int main( int argc , char **argv )
{

  vector_type x1( 2 , 1.0 );
  vector_type x2( 2 , 1.0 );

    // create a vector with observation time points
    std::vector<double> times( 91 );
    for( size_t i=0 ; i<times.size() ; ++i )
        times[i] = 1.0 + dt*i;

    /**/
    size_t num_of_steps2 = integrate_times( make_dense_output<runge_kutta_dopri5< vector_type >  >( 1E-15 , 1E-15) , stiff_system() ,
                        x1 , times , dt, cout << phoenix::arg_names::arg2 << " " << phoenix::arg_names::arg1[0] << "\n" );
    /**/

    clog << num_of_steps2 << endl;

    size_t num_of_steps3 = integrate_times( make_controlled<runge_kutta_dopri5< vector_type >  >( 1E-15 , 1E-15) , stiff_system() ,
                        x2 , times , dt, cout << phoenix::arg_names::arg2 << " " << phoenix::arg_names::arg1[0] << "\n" );

    clog << num_of_steps3 << endl;

    return 0;
}

mariomulansky commented 8 years ago

odeint takes the rhs system by value, as this often gives better performance due to better inlining possibilities. To make sure the rhs is passed by reference, you can use boost::ref:

stiff_system sys;
integrate_times(... , boost::ref(sys), ...);

also described here: http://headmyshoulder.github.io/odeint-v2/doc/boost_numeric_odeint/odeint_in_detail/using_boost__ref.html

wds15 commented 8 years ago

Thanks a lot. Now things work the way they should for my application, I think. I figured out a way to count the number of calls to my system functor; whenever that count exceeds 1E5, I throw an exception. Would you consider this an appropriate way of doing it?

mariomulansky commented 8 years ago

I cannot comment on your threshold choice (1e5), as this depends on many other circumstances, but in general I find this a valid approach.

wds15 commented 8 years ago

Great. Integrating this into odeint would be even better, rather than requiring users to protect against "too much work" themselves. Many thanks again for the really quick turnaround here.

mariomulansky commented 8 years ago

I consider this issue closed then. Further discussion on overflow exceptions can continue in #174.