teuben / nemo

a Stellar Dynamics Toolbox (Not Everybody Must Observe)
https://astronemo.readthedocs.io
GNU General Public License v2.0

Running example commands produces error for "halo.c" #104

Closed ealione closed 1 year ago

ealione commented 1 year ago

I have built the project on a "Linux 4.15.0-193-generic #204-Ubuntu" machine, following the exact instructions outlined in the README, without any apparent errors. My goal is to run perf on it and see which piece of code is the most active (the hot kernel).

I thought I should start with running a few of the examples, including:

and the Orbit example.

mkplummer p10.dat 10

I'm constantly getting this error:

`~/nemo$ mkplummer p10.dat 10

nemo Debug Info: [bodytrans_new: invoking cc +saving .o]

Fatal error [mkplummer]: bodytrans(): could not compile expr=r2

`

teuben commented 1 year ago

If you run:

  mkplummer p10.dat 10 debug-9

you should see a lot of output, and one of the lines I get to see is this:

### nemo Debug Info: [loadobjDL.c:41]: loadobj: /home/teuben/NEMO/nemo/obj/bodytrans/btr_r2.so

Perhaps during the install the btr_r2.c did not compile properly? Can you check this in the directory

 $NEMO/src/nbody/cores/bodysub/

and try

     make btr_r2.so

I suspect this failed for some reason.

One reason could be that you have a somewhat old system; the 4.15.0 kernel seems old to me. Is that Ubuntu 14 or 16? It should still work, though.

ealione commented 1 year ago

Good news and bad news. There was no Makefile in the directory you indicated, so I decided to recompile nemo. This time I'm getting some additional errors:

`~/nemo$ source nemo_start.sh

:~/nemo$ mkplummer p10.dat 10 debug-9

Fatal error [mkplummer]: getdparam(mlow=debug-9) parsing error -12, assumed 0.999

:~/nemo$ rotcurves name1=halo pars1=0,1,1 radii=0:8:0.1

Fatal error [rotcurves]: get_potential: no potential halo.c found

:~/nemo$ mkorbit - 1 0 0 0 1 0 potname=halo | orbint - - 10000 0.01 | orbplot -

nemo Debug Info: Dvel=-1

Fatal error [mkorbit]: get_potential: no potential halo.c found

Fatal error [orbint]: error in reading input orbit

nemo Debug Info: [bodytrans_new: invoking cc +saving .o]

Fatal error [orbplot]: bodytrans(): could not compile expr=x

`

teuben commented 1 year ago

Can you tell me which Ubuntu this is?

teuben commented 1 year ago

btw, there is a Makefile in $NEMO/src/nbody/cores/bodysub/ for sure. I suspect in that directory something didn't install right. You should do a "make clean install" in that directory and carefully watch any possible error messages. The procedure in that directory hasn't changed in 10+ years (legacy software!)

ealione commented 1 year ago

Hi Teuben, this is my exact version of Ubuntu

LSB Version: core-9.20170808ubuntu1-noarch:printing-9.20170808ubuntu1-noarch:security-9.20170808ubuntu1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic

After running ./configure I'm afraid this is all I see in the directory:

~$ cd nemo/src/nbody/cores/bodysub/
:~/nemo/src/nbody/cores/bodysub$ ls
BTclean bti_1.c bti_key.c btr_0.c btr_ar.c btr_ax.c btr_az.c btr_dens.c btr_eps.c btr_glat.c btr_i.c btr_jx.c btr_jz.c btr_m.c btr_mul.c btr_r2.c btr_r.c btr_v2.c btr_vp.c btr_vr.c btr_vt.c btr_vy.c btr_x.c btr_y.c btr_z.c bti_0.c bti_i.c BTNAMES btr_1.c btr_aux.c btr_ay.c btr_dec.c btr_ekin.c btr_etot.c btr_glon.c btr_jtot.c btr_jy.c btr_key.c btr_mub.c btr_phi.c btr_ra.c btr_t.c btr_v.c btr_vr2.c btr_vt2.c btr_vx.c btr_vz.c btr_xsky.c btr_ysky.c Makefile

teuben commented 1 year ago

After you have done the 'make build', there will be a lot of .so files in that directory. If not, look at install.log.

Are there no complaints during the configure step?

ealione commented 1 year ago

It seems that things indeed are not going as expected. I had a look at install.log.

I am executing the following steps:

pasted here to save some space: https://pastebin.com/eCeeR89z

Then, looking at install.log (a fairly large file), I do indeed see a few errors.

stored here due to size: https://drive.google.com/file/d/1hQf4yUmontufjHFXkHyPfE37ffLMXu8P/view?usp=sharing

teuben commented 1 year ago

The output of configure looks OK, but from the install.log file I can see that it cannot find the command ldso.

This is a script that gets installed from $NEMO/src/scripts. You can manually go into that directory and type

     make install

then go into $NEMO/src/nbody/cores/bodysub and do the same

    make install

and now you should see no more complaints that ldso cannot be found.

After this, for sanity, just reinstall the whole system with this

    cd $NEMO

    make rebuild

and then

    make check

and hopefully most entries are now "OK".

ealione commented 1 year ago

I am not sure what the reason for the issues is; maybe there is something different about my system. But it seems that, most probably, the other issues stem from this one:

~/nemo/src/scripts$ make install
Makefile:3: /makedefs: No such file or directory
make: *** No rule to make target '/makedefs'.  Stop.

ealione commented 1 year ago

Not entirely sure why, but running the provided installation script, as described on the project's readthedocs page, instead of performing the process manually, worked fine.

teuben commented 1 year ago

There is also an abbreviated version of the install on the README.md page in the main repo. Which procedure did you follow the first time? Perhaps you forgot a step. It might be good to see if you can reproduce it. I do this so often that I know it works, but it's easy to forget the "source nemo_start.sh" line if you want to use NEMO in the shell, instead of the installation via the Makefile.

ealione commented 1 year ago

What I tried executing was the following:

./configure --with-yapp=pgplot --without-csh
 make build check bench5
 source nemo_start.sh

btw, what would be a long-running example? As I said, I want to profile the application and see which function is the most important (most frequently called). Most of the examples I had a chance to test were extremely fast, though.

teuben commented 1 year ago

You really should install tcsh (sudo apt install tcsh), otherwise you cannot use the "mknemo" command to conveniently recompile programs. For example, wherever you are, the command

      mknemo tsf

would recompile that program.

A good test for a longer run (how long you want) is this:

   cd $NEMO/scripts/csh

and look at the comments about a benchmark at the start of the file. One takes about 2 minutes on my laptop, the other 5.

ealione commented 1 year ago

Yup, I realized that tcsh is a must in the end. OK thank you, I will have a look at the benchmark you suggested.

teuben commented 1 year ago

What was the result of your

make bench5

ealione commented 1 year ago

Here you can find the complete output during my latest build: https://pastebin.com/Aa1M3qXU

or do you mean the output from my unsuccessful attempt?

teuben commented 1 year ago

That looks quite all right now.

teuben commented 1 year ago

Now that NEMO is running for you, what would you like to do with it? If you're looking for a challenge, I have some interesting issues to discuss.

ealione commented 1 year ago

For the most part I am interested in seeing if I can offload parts of nemo to a hardware accelerator as a benchmark for a toy compiler I am building. I am more than interested to hear what you might have in mind, but I'm afraid that my "astro"physics knowledge will disappoint.

teuben commented 1 year ago

The mysterious (what I think is a) compiler error is described in #98.

teuben commented 1 year ago

I've been playing with OpenMP but have not had a lot of luck speeding things up significantly.

ealione commented 1 year ago

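So the way I see it, the function void potential_double (ndim, pos, acc, pot, time) is the most costly for operations like mkplummer with a large number of bodies.

image: https://user-images.githubusercontent.com/5569899/195951775-dfab9229-49ce-47cd-a44c-bbe962a333ea.png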

Because of the time variable it needs to run in a serial manner for each body so can't be parallelized there. But there is nothing stopping it from being run in parallel for all bodies, right?

There seem to be a whole lot of potential_double functions so I'm trying to see which one is used in this case and where, in order to check if there is any possibility for parallelizing the calls for each body.
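
The distinction raised here, serial in time but independent across bodies, can be sketched in C. This is a minimal illustration under stated assumptions, not NEMO code: toy_accel is a hypothetical stand-in for a potential routine (a harmonic potential, chosen only for simplicity), integrate_orbits and the flat array layout are invented for the example, and a leapfrog step is used rather than whatever integrator a given NEMO program uses. It applies only when the bodies are test particles that do not interact with each other.

```c
#include <stddef.h>

/* Hypothetical stand-in for a potential routine: acceleration in a
 * harmonic potential, acc = -pos. Not NEMO's potential_double. */
static void toy_accel(const double pos[3], double acc[3])
{
    for (int k = 0; k < 3; k++)
        acc[k] = -pos[k];
}

/* Advance n independent test particles by nsteps leapfrog
 * (kick-drift-kick) steps of size dt.
 * pos and vel are flat arrays of length 3*n. */
void integrate_orbits(double *pos, double *vel, size_t n, int nsteps, double dt)
{
    /* Bodies do not interact, so the outer loop can be split across
     * threads. Without -fopenmp the pragma is ignored and the loop
     * runs serially with the same result. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        double *x = pos + 3 * i;
        double *v = vel + 3 * i;
        double a[3];

        /* Time steps stay serial: step s+1 needs the state from step s. */
        for (int s = 0; s < nsteps; s++) {
            toy_accel(x, a);
            for (int k = 0; k < 3; k++) v[k] += 0.5 * dt * a[k];
            for (int k = 0; k < 3; k++) x[k] += dt * v[k];
            toy_accel(x, a);
            for (int k = 0; k < 3; k++) v[k] += 0.5 * dt * a[k];
        }
    }
}
```

Built with something like cc -fopenmp, the outer loop is shared among threads. With a single orbit (n = 1) there is nothing to share out, so the run stays serial.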

teuben commented 1 year ago

I think this is the orbint program, not mkplummer. So in this case only 1 particle is integrated. But integrate_rk4 is probably calling it, and it cannot run in parallel, since each step needs to wait for the previous one before it can continue.

An example of something that should speed up is snapscale. Each particle is scaled (in pos, vel, or whatever the user chooses), and all N particles are independent.
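
The kind of loop described here could look roughly like the following C sketch. The particle struct and scale_particles are hypothetical names made up for illustration; NEMO's actual body structure and the snapscale source are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical particle layout, for illustration only. */
typedef struct {
    double mass;
    double pos[3];
    double vel[3];
} particle;

/* Scale every particle's position by rscale and velocity by vscale.
 * Each iteration touches only p[i], so the iterations are independent
 * and can be divided among threads. Without -fopenmp the pragma is
 * ignored and the loop runs serially with the same result. */
void scale_particles(particle *p, size_t n, double rscale, double vscale)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        for (int k = 0; k < 3; k++) {
            p[i].pos[k] *= rscale;
            p[i].vel[k] *= vscale;
        }
    }
}
```

A loop this simple does very little arithmetic per byte it touches, so any gain from threading may be limited by memory bandwidth and by the time spent reading and writing the snapshot.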

ealione commented 1 year ago

Apologies, you are correct, it was obviously orbint. I tested a bunch of stuff. I'll have a look at snapscale.

teuben commented 1 year ago

NEMO is often run as follows:

    p1 | p2 | p3 | p4

where the programs are connected in a Unix pipe. Even if p3 can be made parallel, if p2 cannot, the von Neumann bottleneck will kill the performance of this pipe :-(
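
The size of this effect can be estimated with Amdahl's law. The C sketch below is a toy model, not a measurement: it assumes the stages run one after another so that their times simply add, and the function names and any timings fed to it are hypothetical.

```c
#include <stddef.h>

/* Total run time, assuming the stages run one after another
 * so that their times add up (a simplifying assumption). */
static double total_time(const double *t, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum;
}

/* Overall speedup of the whole pipe when stage `which` alone is made
 * `factor` times faster and every other stage is left unchanged. */
double pipe_speedup(const double *t, size_t n, size_t which, double factor)
{
    double before = total_time(t, n);
    double after = before - t[which] + t[which] / factor;
    return before / after;
}
```

With made-up stage times of 1, 4, 4 and 1 seconds, making p3 four times faster gives an overall speedup of only 10/7, about 1.43, and no speedup of p3 alone can push it past 10/6, about 1.67. Stages in a real pipe can overlap, in which case the slowest stage sets the pace instead, but the conclusion is similar.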

ealione commented 1 year ago

I don't seem to be able to find "snapscale" in the codebase. Additionally, yes, I had a look at potential and it definitely is not parallelizable!

teuben commented 1 year ago

Try the command mknemo snapscale; it will recompile the program and tell you where it is.
