hannorein / rebound

💫 An open-source multi-purpose N-body code.
https://rebound.readthedocs.io/
GNU General Public License v3.0

Support for ARM 64 (Apple Silicon) #681

Closed Francyrad closed 1 year ago

Francyrad commented 1 year ago

Dear developers, I've been able to successfully run some examples that don't require MPI. However, when I try to build a simulation that requires MPI (in this case "self_gravity_disc_mpi"), the program fails to compile. I attach the error output, which I suspect is caused by an incompatibility with the arm64 architecture. I hope you can solve this.

Francesco

rebound_error_mpi.txt

hannorein commented 1 year ago

The errors are not related to ARM. The MPI part of REBOUND simply hasn't kept up with a lot of changes, mainly because I don't have a good problem where MPI would be helpful (OpenMP is often enough). It would take some effort to get MPI working again, and the details will depend a lot on the specific problem you're trying to solve.

Francyrad commented 1 year ago

Thank you for your response, professor! REBOUND may actually be the only program that can help me. I want to simulate the formation of the Moon starting from a disk around a proto-Earth, as in this article, whose authors used REBOUND:

"FORMATION OF MULTIPLE-SATELLITE SYSTEMS FROM LOW-MASS CIRCUMPLANETARY PARTICLE DISKS"

To account for the particle interactions once my moon has formed, I will need all my processors, using MPI. I'm currently using a very good program called OpenSPH that can do this kind of work, but once aggregation happens the code can only run on 1 core, so my simulation took 3 months to finish with only 10k particles. I'd therefore like to test REBOUND with many more particles.

I hope you can help me solve this problem and let me know whether I really need MPI to use all the processors of my computer or of a supercomputer.

Thank you for your help,
Francesco

hannorein commented 1 year ago

10k is not very large. You might not get the performance from MPI that you're hoping for. I'd recommend using OpenMP for shared memory parallelization. Most computing clusters will have nodes that can run tens of threads in parallel. With that you effectively have a few hundred particles per core. MPI on top of that will most likely not help.

Francyrad commented 1 year ago

Yes, it is not very large, but it is when I have to run it on only 1 core with OpenSPH. Will OpenMP help me run the simulation with 100k to 1M particles or more?

hannorein commented 1 year ago

I can't give you a straight answer. You need to try it out for your specific setup.

Francyrad commented 1 year ago

I will try to design the simulation and let you know. Thank you for your help!

Francyrad commented 1 year ago

Dear professor @hannorein, I've successfully designed my simulation and it runs...

Screenshot 2023-04-28 at 14 48 23

except that OpenMP doesn't work for some reason, and it runs on only 1 core:

Screenshot 2023-04-28 at 14 59 04

Changing the number of particles doesn't help; it becomes impossibly slow (terminal output):

N_tot=  10001     t=  2990.000000  dt=  1.000000  cpu=  3.597678 [s]  t/tmax=  0
N_tot=  10001     t=  3000.000000  dt=  1.000000  cpu=  3.649657 [s]  t/tmax=  0.00%

2023-04-28 14:47:38.844 rebound[20266:3701616] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_bundleIdentifierWithReply:) block performed very slowly (2.49 secs).

N_tot=  10001     t=  3010.000000  dt=  1.000000  cpu=  3.675214 [s]  t/tmax=  0
N_tot=  10001     t=  3020.000000  dt=  1.000000  cpu=  3.625312 [s]  t/tmax=  0
N_tot=  10001     t=  3030.000000  dt=  1.000000  cpu=  3.659524 [s]  t/tmax=  0
N_tot=  10001     t=  3040.000000  dt=  1.000000  cpu=  3.607195 [s]  t/tmax=  0
N_tot=  10001     t=  3050.000000  dt=  1.000000  cpu=  3.600862 [s]  t/tmax=  0
N_tot=  10001     t=  3060.000000  dt=  1.000000  cpu=  3.610075 [s]  t/tmax=  0
N_tot=  10001     t=  3070.000000  dt=  1.000000  cpu=  3.608410 [s]  t/tmax=  0
N_tot=  10001     t=  3080.000000  dt=  1.000000  cpu=  3.590928 [s]  t/tmax=  0
N_tot=  10001     t=  3090.000000  dt=  1.000000  cpu=  3.599406 [s]  t/tmax=  0
N_tot=  10001     t=  3100.000000  dt=  1.000000  cpu=  3.614358 [s]  t/tmax=  0
N_tot=  10001     t=  3110.000000  dt=  1.000000  cpu=  3.610601 [s]  t/tmax=  0
N_tot=  10001     t=  3120.000000  dt=  1.000000  cpu=  3.631834 [s]  t/tmax=  0
N_tot=  10001     t=  3130.000000  dt=  1.000000  cpu=  3.628529 [s]  t/tmax=  0
N_tot=  10001     t=  3140.000000  dt=  1.000000  cpu=  3.619742 [s]  t/tmax=  0
N_tot=  10001     t=  3150.000000  dt=  1.000000  cpu=  3.639600 [s]  t/tmax=  0
N_tot=  10001     t=  3160.000000  dt=  1.000000  cpu=  3.611394 [s]  t/tmax=  0.00%
zsh: terminated  ./rebound
francyrad@MacBook-Pro-di-Francesco moon % make
Compiling shared library librebound.so ...
/Library/Developer/CommandLineTools/usr/bin/make -C ../../src/
make[1]: Nothing to be done for `all'.
fatal: not a git repository (or any of the parent directories): .git

Compiling problem file ...
gcc -I../../src/ -Wl,-rpath,./ -std=c99 -Wpointer-arith -D_GNU_SOURCE -O3  -I/usr/local/include -Wall -g  -I/opt/homebrew/include -L/opt/homebrew/lib -fopenmp -D_APPLE -DOPENGL -DOPENMP -DGITHASH=0000000000gitnotfound0000000000000000000 problem.c -L. -lrebound -L/usr/local/lib -lglfw -framework Cocoa -framework OpenGL -framework IOKit -framework CoreVideo -fopenmp -o rebound

REBOUND compiled successfully.
francyrad@MacBook-Pro-di-Francesco moon % ./rebound
N_tot=  100001    t=  0.000000  dt=  1.000000  cpu=  0.000000 [s]  t/tmax=  0.00
N_tot=  100001    t=  10.000000  dt=  1.000000  cpu=  627.292950 [s]  t/tmax=  0.00%

The user manual says that enabling OpenMP only requires compiling with the right Makefile settings. I'm on macOS Ventura with libomp installed via Homebrew. This is the Makefile that I use:

export CC=gcc

ifeq ($(shell $(CC) -v 2>&1 | grep -c "clang"), 1)
export OPENMPCLANG=1
else
export OPENMP=1
endif

export OPENGL=1
include ../../src/Makefile.defs

all: librebound
    @echo ""
    @echo "Compiling problem file ..."
    $(CC) -I../../src/ -Wl,-rpath,./ $(OPT) $(PREDEF) problem.c -L. -lrebound $(LIB) -o rebound
    @echo ""
    @echo "REBOUND compiled successfully."

librebound: 
    @echo "Compiling shared library librebound.so ..."
    $(MAKE) -C ../../src/
    @-rm -f librebound.so
    @ln -s ../../src/librebound.so .

clean:
    @echo "Cleaning up shared library librebound.so ..."
    @-rm -f librebound.so
    $(MAKE) -C ../../src/ clean
    @echo "Cleaning up local directory ..."
    @-rm -vf rebound

It compiles with export OPENMP=1. When I try to force compilation with Clang, the terminal gives me the following error:

francyrad@MacBook-Pro-di-Francesco moon % make     
Compiling shared library librebound.so ...
/Library/Developer/CommandLineTools/usr/bin/make -C ../../src/
make[1]: Nothing to be done for `all'.
fatal: not a git repository (or any of the parent directories): .git

Compiling problem file ...
gcc -I../../src/ -Wl,-rpath,./ -std=c99 -Wpointer-arith -D_GNU_SOURCE -O3  -I/usr/local/include -Wall -g  -I/opt/homebrew/include -L/opt/homebrew/lib -I/include -Xpreprocessor -fopenmp -D_APPLE -DOPENGL -DOPENMP -DGITHASH=0000000000gitnotfound0000000000000000000 problem.c -L. -lrebound -L/usr/local/lib -lglfw -framework Cocoa -framework OpenGL -framework IOKit -framework CoreVideo -lomp -o rebound
ld: library not found for -lomp
collect2: error: ld returned 1 exit status
make: *** [all] Error 1

Is there a way to solve this problem? OpenMP is correctly installed and compiled with Homebrew. My processor is an M1 Pro with 10 cores. Let me know, and thank you in advance.

Francesco

hannorein commented 1 year ago

I'm not sure. I've just tested it on an M1. If I compile and run examples/openmp without any changes, it works for me. For that specific example (20k particles), I get a speed-up of 4.5x on 8 cores.

You probably have not set up your environment variables (or, alternatively, the Makefile) correctly. Specifically, you need to point the compiler to your OpenMP header and library somewhere, e.g.

 export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
 export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"

Hanno

Francyrad commented 1 year ago

Thank you for your answer! Honestly, I have no idea what is going on. I already had these paths in my bash configuration (exactly the same lines) for some other program that I used (and that I honestly do not remember).

I ran the example as you did, with 20k particles. This is my output:

francyrad@MacBook-Pro-di-Francesco openmp % make
Compiling shared library librebound.so ...
/Library/Developer/CommandLineTools/usr/bin/make -C ../../src/
make[1]: Nothing to be done for `all'.
fatal: not a git repository (or any of the parent directories): .git

Compiling problem file ...
gcc -I../../src/ -Wl,-rpath,./ -std=c99 -Wpointer-arith -D_GNU_SOURCE -O3  -I/usr/local/include -Wall -g  -fopenmp -D_APPLE -DOPENMP -DGITHASH=0000000000gitnotfound0000000000000000000 problem.c -L. -lrebound -L/usr/local/lib -fopenmp -o rebound

REBOUND compiled successfully.
francyrad@MacBook-Pro-di-Francesco openmp % ./rebound                      
2023-04-29 20:07:58.220 rebound[76316:5838142] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_bundleIdentifierWithReply:) block performed very slowly (5.72 secs).

OpenMP speed-up: 0.999x (perfect scaling would give 10x)

Honestly, I have no idea what is going on, because the executable compiles perfectly. I also tried compiling with Clang and with Homebrew. Both compile, but the problem is the same.

This is also my (incomplete) code. I don't think my code is at fault, but could you please check on the MacBook what the output says about OpenMP? It would help. Thank you for your support.

hannorein commented 1 year ago

I can't run your entire code for you. It's too long (and you call it incomplete). Try to simplify your code. Take out everything that's not important to narrow down the problem. Compare it with the example. Figure out whether any of the differences matter. Note that not all parts of REBOUND are parallelized.

Francyrad commented 1 year ago

Sure, I'll try to take out all the unnecessary parts and compare it with the example.

Francyrad commented 1 year ago

I tried the OpenMP example many times and only once got a scaling of 2.7 (out of a maximum of 10). So OpenMP works, but I don't understand why I can't get the scaling, and I'm not running other operations in the background. What could be the problem? This is why I'd love to have MPI for REBOUND: I could then be sure that I can use all my processors. At least that has been my experience with other programs.

hannorein commented 1 year ago

Compare it with the example that works. Figure out what the differences are. Note that not all parts in REBOUND are parallelized.

Francyrad commented 1 year ago

I tried running the code on my old Intel computer and it scales, including my own code. I did some research and, as far as I can see, it's a problem with some ARM64 chips that prevents OpenMP from scaling. Other people have run into this problem too.

The chips involved are the M1 Pro (mine), M1 Ultra, and M1 Max. Only the M1 avoids the issue because the chip is older, which is why you got the scaling.

Unless the OpenMP developers fix this, it is not possible to run the code in parallel. This is why I'd love MPI support for REBOUND in the future. In my experience, I have never had this problem with other programs.

Thank you for your support, professor. I'll close the issue when the OpenMP problem is solved.

Francesco

hannorein commented 1 year ago

If that's the case, it's not a REBOUND issue. So I'm closing this.