Closed by Tuhin-Ghosh 1 year ago
In short: it's tricky to get a speed-up in many cases. You'll need to find out what the bottleneck is. For example, try turning off collisions. I have not used MERCURIUS with OpenMP myself, and there could be various places that are not fully optimized. You could try something simple (leapfrog) just for testing. But it will be a major task to get a simulation like this to run efficiently on many cores.
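One low-tech way to start looking for the bottleneck is to time the separate phases of a run by hand and compare the serial and OpenMP builds. The sketch below is only an illustration: `wall_time`, `struct phase_timer`, `phase_start`, `phase_stop` and `phase_report` are hypothetical helper names, not part of REBOUND, and the only system call assumed is POSIX `gettimeofday` from `<sys/time.h>`.

```c
#include <stdio.h>
#include <sys/time.h>

// Wall-clock time in seconds. CPU time (e.g. clock()) typically sums over
// all threads and would hide an OpenMP speed-up, so wall-clock time is used.
double wall_time(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1e-6 * (double)tv.tv_usec;
}

// Hypothetical accumulator for one phase of the run (illustrative only).
struct phase_timer {
    const char* label; // name printed in the report
    double total;      // accumulated seconds over all start/stop pairs
    double started;    // wall-clock time of the last phase_start() call
};

// Mark the beginning of a timed phase.
void phase_start(struct phase_timer* t) {
    t->started = wall_time();
}

// Mark the end of a timed phase and add its duration to the total.
void phase_stop(struct phase_timer* t) {
    t->total += wall_time() - t->started;
}

// Print the accumulated time and its share of the whole run.
void phase_report(const struct phase_timer* t, double run_total) {
    printf("%-12s %8.3f s (%5.1f%%)\n",
           t->label, t->total, 100.0 * t->total / run_total);
}
```

Wrapping `phase_start`/`phase_stop` around the integration call and around the heartbeat output, once in a serial build and once in an OpenMP build, should show which phase fails to get faster. A real profiler will give finer detail, but this is often enough to narrow things down.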
Even after setting "REB_COLLISION_NONE" and "REB_BOUNDARY_NONE", I am getting:
"OpenMP speed-up: 0.685x (perfect scaling would give 8x)"
I also tested by setting the integrator to MERCURIUS in your OpenMP example, and there I do get a speed-up (with 500 particles):
"OpenMP speed-up: 3.399x (perfect scaling would give 8x)".
I also used the LEAPFROG integrator as you suggested. Using 10k particles, I get:
"OpenMP speed-up: 0.942x (perfect scaling would give 8x)"
But now when I look at my CPU usage I see only one core is being utilized all the time.
And after setting "REB_COLLISION_NONE" and "REB_BOUNDARY_NONE", I get:
"OpenMP speed-up: 1.074x (perfect scaling would give 8x)"
This time, however, I see that all CPU cores are working. So I could not figure out what the bottleneck is.
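To put the reported numbers in context, a measured speed-up can be turned into a parallel efficiency and, under Amdahl's law, into a rough estimate of how much of the runtime actually ran in parallel. This is a sketch under the assumption that Amdahl's simple model applies (no threading overhead, fixed problem size); the function names are made up for illustration.

```c
#include <stdio.h>

// Parallel efficiency: measured speed-up divided by the thread count.
// A value of 1.0 corresponds to perfect scaling.
double parallel_efficiency(double speedup, int n_threads) {
    return speedup / (double)n_threads;
}

// Amdahl's law: S = 1 / ((1 - p) + p / N), where p is the fraction of the
// runtime that is parallelised and N is the number of threads.
// Solving for p gives p = (1 - 1/S) / (1 - 1/N).
// Requires n_threads > 1. A negative result (S < 1) means the model does
// not apply: threading overhead outweighs any gain.
double amdahl_parallel_fraction(double speedup, int n_threads) {
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / (double)n_threads);
}
```

For the 3.399x result on 8 threads, this gives an efficiency of about 0.42 and a parallel fraction of about 0.81, so roughly a fifth of the runtime behaves as serial in this model. For the results below 1x the formula returns a negative value, which suggests synchronization or thread-management overhead rather than merely a large serial part.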
I'm afraid I don't have an easy solution for you. The way forward would be to use a profiler to see where the bottleneck is, and then try to remove it.
Hi, I am getting a slowdown if I use OpenMP. I am attaching the code below.
#include
#include
#include
#include
#include
#include <sys/time.h>
#include
#include "rebound.h"
#include "tools.h"
#include "output.h"
void run_sim();
void heartbeat(struct reb_simulation* r);
double a_to_P(double a, double m_star, double m_planet);
double P_to_a(double P, double m_star, double m_planet);
int reb_collision_resolve_merge_pass_through(struct reb_simulation* const r, struct reb_collision c);
double E0;
int main(int argc, char* argv[]) {
    // Get the number of processors
    int np = omp_get_num_procs();
    // Set the number of OpenMP threads to be the number of processors
    omp_set_num_threads(np);
}
void run_sim() {
}
int reb_collision_resolve_merge_pass_through(struct reb_simulation* const r, struct reb_collision c) {
    // This function passes the collision to the default merging routine.
    // If a merger occurred, that routine will return a value other than 0.
    // This function then outputs some information about the merger.
    int result = reb_collision_resolve_merge(r, c);
    if (result != 0) {
        printf("A merger occurred! Particles involved: %d, %d.\n", c.p1, c.p2);
    }
    return result;
}
void heartbeat(struct reb_simulation* r) {
    if (reb_output_check(r, 2. * M_PI)) {
}
double a_to_P(double a, double m_star, double m_planet) {
    return sqrt(a * a * a / (m_star + m_planet));
}
double P_to_a(double P, double m_star, double m_planet) {
    return pow((m_star + m_planet) * P * P, 1.0 / 3.0);
}
Note: I do see a speed-up while running the OpenMP example. I just don't understand what I am doing wrong in the above code. I am using the same Makefile as the one in the OpenMP example. When I look at CPU usage, I see all cores are being utilized, but the speed-up is always below 1.