Closed: ds283 closed this issue 9 years ago.
Yes, vector_space_reduce is not yet implemented for VexCL, so it will not work with the controlled and dense output steppers, since they require vector_space_reduce. We will put this on our ToDo list. Maybe you can even implement this specialization of vector_space_reduce yourself? Nevertheless, I think a dense output stepper is not much better than a stepper like rk5 or dopri5. The dense output stepper wraps a controlled stepper, which performs the step size control on the whole ensemble.
It seems that vector_space_reduce is no longer used in the github version of odeint (commit c3ef77dff54dfda0c4d9933950349003a6902f0f suggests that, and I also grepped for vector_space_reduce). Apparently vector_space_norm_inf is used for the purpose now. Karsten, if that is correct, I could make a pull request with a specialization of vector_space_norm_inf for VexCL types.
As for the boost version of odeint, I am not sure if it is possible to implement vector_space_reduce efficiently. It seems that Op here takes its arguments by value, which would result in unnecessary copies of VexCL vectors.
@ds283: Can you check if your code is now working correctly?
Thanks for your helpful responses, which are very much appreciated.
With the current versions of odeint and VexCL pulled from github, compilation still fails due to:
```
In file included from /Users/ds283/Documents/Code/odeint-v2/boost/numeric/odeint.hpp:35:
/Users/ds283/Documents/Code/odeint-v2/boost/numeric/odeint/stepper/controlled_runge_kutta.hpp:86:24: error: no matching member function for call to 'norm_inf'
        return algebra.norm_inf( x_err );
               ~~~~~~~~^~~~~~~~
```
Although from looking at the source, it isn't clear to me why that doesn't work.
Did you include boost/numeric/odeint/external/vexcl/vexcl_norm_inf.hpp?
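A sketch of the includes involved, assuming default header locations; the key point is pulling in the vexcl_norm_inf.hpp bindings alongside the main headers:

```cpp
#include <vexcl/vexcl.hpp>
#include <boost/numeric/odeint.hpp>
// Supplies norm_inf for VexCL state types, which the controlled
// stepper's step size control calls:
#include <boost/numeric/odeint/external/vexcl/vexcl_norm_inf.hpp>
```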
I see. It does compile if I include that file. The integration is slow, but that is a different issue.
You'll probably need an ensemble of size 1e5 -- 1e6 ODEs and a decent GPU to get some acceleration from OpenCL. OpenCL has high initialization overhead (see e.g. http://arxiv.org/abs/1212.6326), so it requires large problems to hide that.
Thanks for all your helpful responses, which are very much appreciated.
I had looked at your arXiv paper. Currently, I'm simply experimenting to work out whether GPU methods can help us. It may eventually run on GPU nodes on a HPC cluster, at which point it probably would involve an ensemble of ODEs of the size you quote.
I had understood that the overhead came from compiling the OpenCL kernels at runtime. But at the moment I am finding it much slower during the course of the integration too, when I would have naively expected that all data would be available on the GPU and there would be no further overheads. Should I also have anticipated this? (The solution is initially rapidly oscillatory and the integrator probably needs to shorten its stride; I was wondering whether there is any data transfer between the GPU and CPU while adjusting the step size.)
There are other overheads in OpenCL (and in CUDA, for that matter) besides initialization. If you look at the plots in the paper, you'll notice that Thrust on CPU (which is OpenMP in disguise) shows much better performance than either the OpenCL or CUDA solutions for smaller problems. You really need to saturate your GPU to get any performance benefit from it.
VexCL allows you to create a monolithic kernel for an odeint stepper (see the slides from Meeting C++ 2012, section "Kernel builder": https://speakerdeck.com/ddemidov/vexcl-at-meeting-c-plus-plus-2012). This should perform about an order of magnitude faster, but the option is only available for steppers with constant time steps.
Apologies if this is a dumb question.
I am attempting to use odeint to solve an ensemble of ODEs with VexCL. To do so, I'd like to use a dense output stepper with a VexCL object as the state type.
However, I am struggling to make this work. I believe I ought to be able to write something like the following, where for my application state_type would be a vex::multivector. But when I do this, clang complains.
Making an instance of runge_kutta_dopri5 parametrized by state_type works perfectly well, as in the VexCL example in the documentation, except that it does not adjust the step size. However, looking at the corresponding CUDA example makes me believe it ought to be possible to wrap it as a controlled or dense-output stepper. Am I doing something wrong, or is this not possible?
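For concreteness, the kind of construction in question can be sketched like this (the tolerances are placeholders, not taken from my actual code, and state_type stands for the vex::multivector type):

```cpp
using namespace boost::numeric::odeint;

// Basic dopri5 stepper over a vector-space state:
typedef runge_kutta_dopri5<
    state_type, double, state_type, double,
    vector_space_algebra > stepper_type;

// Wrapped as a dense output stepper with absolute/relative tolerances:
auto stepper = make_dense_output( 1.0e-6, 1.0e-6, stepper_type() );
```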