uwsampa / grappa

Grappa: scaling irregular applications on commodity clusters
grappa.io
BSD 3-Clause "New" or "Revised" License

Build problem with MPI library #198

Open rfvander opened 9 years ago

rfvander commented 9 years ago

I downloaded Grappa and am now trying to build it, but the instructions are a bit sparse. If I define the symbols CC and CXX to resolve to the Intel compilers icc and icpc, respectively, I get the error message below. Obviously, my installed MPI cannot be found. I tried to fix that by setting “export MPICC=mpiicc”, but that did not work, nor did “export MPI_C=mpiicc”. There is no reference to MPI in “configure” or in “FindPackageHandleStandardArgs.cmake”. Do you have any suggestions? By the way, I also have GASNet installed, so if that is the better communication layer, I'll use that--if I can get some instructions on how to do that. Thanks.

Rob

[rfvander@bar1 grappa]$ export CC=icc
[rfvander@bar1 grappa]$ export CXX=icpc
[rfvander@bar1 grappa]$ ./configure --gen=Make --mode=Release
cmake /lustre/home/rfvander/grappa -G"Unix Makefiles" -DSHMMAX=33554432 -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DBASE_C_COMPILER=icc -DBASE_CXX_COMPILER=icpc -DCMAKE_BUILD_TYPE=RelWithDebInfo
-- The C compiler identification is Intel 15.0.0.20140723
-- The CXX compiler identification is Intel 15.0.0.20140723
-- Check for working C compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icc
-- Check for working C compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icpc
-- Check for working CXX compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icpc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Boost found: 1.53.0
-- /usr
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
  Could NOT find MPI_C (missing: MPI_C_LIBRARIES)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindMPI.cmake:587 (find_package_handle_standard_args)
  CMakeLists.txt:205 (find_package)

-- Configuring incomplete, errors occurred!
See also "/lustre/home/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeOutput.log".
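
One workaround sketch (not verified here): Grappa's configure forwards anything after a bare -- straight to CMake, as used later in this thread, so FindMPI can be pointed at the Intel MPI wrappers explicitly. The wrapper names mpiicc/mpiicpc are assumed to be on the PATH; adjust them to whatever your Intel MPI install provides.

export CC=icc
export CXX=icpc
# -DMPI_C_COMPILER/-DMPI_CXX_COMPILER are standard FindMPI hints; mpiicc/mpiicpc are assumed here
./configure --gen=Make --mode=Release -- -DMPI_C_COMPILER=mpiicc -DMPI_CXX_COMPILER=mpiicpc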

rfvander commented 9 years ago

Hi Jacob,

Partial success, see below.

Rob

[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- applications/demos/hello_world.exe
W0109 15:10:28.329658 48050 Communicator.cpp:259] Adjusting to fit in target footprint: 367001 bytes
W0109 15:10:28.329823 48050 RDMAAggregator.cpp:284] Adjusting to fit in target footprint: 419430 bytes
W0109 15:10:28.329835 48050 Task.cpp:116] Adjusting to fit in target footprint: 445384 bytes
I0109 15:10:28.330062 48050 Grappa.cpp:326] Footprint estimates:

From: Jacob Nelson [mailto:notifications@github.com] Sent: Friday, January 09, 2015 3:04 PM To: uwsampa/grappa Cc: Van Der Wijngaart, Rob F Subject: Re: [grappa] Build problem with MPI library (#198)

I updated the grappa_run script to accept Intel MPI. If you do a pull from master, you should be able to run a command like

bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- applications/demos/hello_world.exe

and it will call mpiexec.hydra with the right flags.

rfvander commented 9 years ago

And a little less expected: twice as many ranks are reported as were specified.

[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 -- applications/demos/hello_world.exe
W0109 15:12:34.862522 48098 Communicator.cpp:259] Adjusting to fit in target footprint: 734003 bytes
W0109 15:12:34.862697 48098 RDMAAggregator.cpp:284] Adjusting to fit in target footprint: 838860 bytes
W0109 15:12:34.862715 48098 Task.cpp:116] Adjusting to fit in target footprint: 858261 bytes
I0109 15:12:34.863019 48098 Grappa.cpp:326] Footprint estimates:

[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=1 -- applications/demos/hello_world.exe
W0109 15:13:59.741778 48184 Communicator.cpp:259] Adjusting to fit in target footprint: 2936012 bytes
W0109 15:13:59.741951 48184 Task.cpp:116] Adjusting to fit in target footprint: 4580706 bytes
I0109 15:13:59.742089 48184 Grappa.cpp:326] Footprint estimates:

nelsonje commented 9 years ago

Curious. I'll look into it.

You can pass additional arguments to mpiexec.hydra after the bare --, like this:

bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- --hostfile hostfile.txt applications/demos/hello_world.exe

With mpiexec.hydra, you should also be okay to run like this:

. ../../util/env.sh
mpiexec.hydra -n 16 -ppn 4 -envall applications/demos/hello_world.exe

Some other MPIs don't make it easy to propagate environment variables to child processes, which is why we wrote the grappa_run script.
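
For reference, a quick sketch of the two ways Hydra can forward environment variables to the launched processes (SOME_VAR/some_value below are placeholders, not actual Grappa settings):

# forward the entire calling environment
mpiexec.hydra -n 16 -ppn 4 -envall applications/demos/hello_world.exe
# or forward selected variables one at a time
mpiexec.hydra -n 16 -ppn 4 -genv SOME_VAR some_value applications/demos/hello_world.exe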

nelsonje commented 9 years ago

Ah, the tasks == 2x nodes thing is us screwing up the math when --ppn is unspecified.

rfvander commented 9 years ago

Remember what your mother said about initializing all variables (in addition to that useless advice to eat your greens).

rfvander commented 9 years ago

Things actually get a little weirder when -ppn is specified. When I set -ppn to 4, it appears to be ignored, but otherwise the results look reasonable. When I set -ppn to 1, an error occurs.

Rob

[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 4 -f barhosts -envall applications/demos/hello_world.exe
W0109 15:38:43.697203 64304 Communicator.cpp:259] Adjusting to fit in target footprint: 1468006 bytes
W0109 15:38:43.697376 64304 RDMAAggregator.cpp:284] Adjusting to fit in target footprint: 1677721 bytes
W0109 15:38:43.697387 64304 Task.cpp:116] Adjusting to fit in target footprint: 1684015 bytes
I0109 15:38:43.697545 64304 Grappa.cpp:326] Footprint estimates:

nelsonje commented 9 years ago

Two things:

First, when you call mpiexec.hydra directly, -n is the total number of tasks/processes in the job. -ppn sets the number of tasks per node (which may mean "entry in hosts file"; I'm not sure). So if you want 16 processes with 4 processes per node, put four nodes in your host file, set -n to 16, and -ppn to 4.
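
As a concrete sketch (the host names are made up), a 16-process run with 4 processes on each of 4 nodes would look something like this:

$ cat hostfile.txt
node01
node02
node03
node04
$ mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe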

nelsonje commented 9 years ago

Second, it looks like Grappa thinks it has very little shared memory available. This could be a property of your node configuration, or it could be a problem with the build. Could you run this command and let me know the result?

sysctl kernel.shmmax

On many machines it's configured to 0.5*DRAM size, so for our 24GB nodes, I get the result

kernel.shmmax = 12884901888
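
For reference, on machines where you have root and are allowed to change it, the limit can be raised with standard sysctl commands (the value below is just the 12 GB example above; pick whatever suits your nodes):

# takes effect immediately, but is lost on reboot
sudo sysctl -w kernel.shmmax=12884901888
# persist the setting across reboots
echo 'kernel.shmmax = 12884901888' | sudo tee -a /etc/sysctl.conf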

rfvander commented 9 years ago

Sigh, and I’ve only used -ppn 50K times before. Thanks, Jacob.

Rob

nelsonje commented 9 years ago

I recommend eating your greens, too!

rfvander commented 9 years ago

Hi Jacob,

This is what I get:

[rfvander@bar1 Stencil]$ sysctl kernel.shmmax
kernel.shmmax = 33554432

So, indeed, much less than on your nodes.

Rob

rfvander commented 9 years ago

Preaching to the choir, I’m a vegetarian ☺.

rfvander commented 9 years ago

Hi Jacob,

Let’s follow up on two critical issues:

· Insufficient shared memory: as I mentioned earlier, my system is configured with far less than yours, and unfortunately I am not allowed to change that, so I should probably move to another system. What worries me, though, is that I ran into issues when running hello_world; that should not require 300 MB of shared memory.

· Since I need to move to another system where I will be behind a firewall, the Grappa build process will no longer be able to access git during the build. Of course, you want to be able to build without external dependencies anyway. How difficult is it to change the build procedure to do that? Thanks!

Rob

bholt commented 9 years ago

See https://github.com/uwsampa/grappa/issues/199.

rfvander commented 9 years ago

Thanks!

nelsonje commented 9 years ago

I'm tracking the shared memory issue in #202. In the long term this will go away; in the short term I may be able to work around it.

Don't read too much into the 300 MB for hello_world. Currently we've sized a bunch of default allocations for running larger jobs with thousands of threads per node on ~100 nodes. As we work to make our current alpha-quality research code more usable, one of the things we'll be working on is scaling down to a single node, or even a laptop. That's tracked in #164, and is another one of my projects for the next few months.

nelsonje commented 9 years ago

As for the subject of this ticket:

When we last talked we had two problems:

  1. CMake couldn't find your MPI install. This appeared to have something to do with the way your Lustre shared filesystem was set up.
  2. Once we explicitly specified the MPI dependencies, we ran into another problem, where Make couldn't seem to find /usr/lib64/libpthread.so.

It appears that you've made progress on one or both of these? What happened?

I would like to figure out more of what was going wrong with MPI discovery so I can file a bug with the CMake folks.

rfvander commented 9 years ago

Actually, applying Jeff’s solution of using the MPI wrapper scripts for both the base C/C++ compilers and the MPI compilers did the trick.

rfvander commented 9 years ago

Hi Jacob,

Progress! But then this isn’t the expected behavior, is it?

Rob

[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- applications/demos/hello_world.exe

I0120 11:41:15.513974 116400 Grappa.cpp:587]

Shared memory breakdown:
  node total: 125.712 GB
  locale shared heap total: 62.856 GB
  locale shared heap per core: 3.9285 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 0.982124 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 0.982125 GB
  free per locale: 44.8516 GB
  free per core: 2.80322 GB

I0120 11:41:15.700206 116400 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 11:41:15.700273 116403 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 11:41:15.700273 116404 hello_world.cpp:34] Hello world from locale 0 core 4
I0120 11:41:15.700361 116401 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 11:41:15.700291 116402 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 11:41:15.700294 116405 hello_world.cpp:34] Hello world from locale 0 core 5
I0120 11:41:15.700399 116406 hello_world.cpp:34] Hello world from locale 0 core 6
I0120 11:41:15.700278 116407 hello_world.cpp:34] Hello world from locale 0 core 7
I0120 11:41:15.700284 116408 hello_world.cpp:34] Hello world from locale 0 core 8
I0120 11:41:15.700284 116409 hello_world.cpp:34] Hello world from locale 0 core 9
I0120 11:41:15.700285 116410 hello_world.cpp:34] Hello world from locale 0 core 10
I0120 11:41:15.700284 116411 hello_world.cpp:34] Hello world from locale 0 core 11
I0120 11:41:15.700284 116412 hello_world.cpp:34] Hello world from locale 0 core 12
I0120 11:41:15.700284 116414 hello_world.cpp:34] Hello world from locale 0 core 14
I0120 11:41:15.700285 116415 hello_world.cpp:34] Hello world from locale 0 core 15
I0120 11:41:15.700798 116413 hello_world.cpp:34] Hello world from locale 0 core 13

nelsonje commented 9 years ago

You're saying that because it looks like it's running 16 processes on a single locale, rather than 4 processes each on 4 locales? Yes indeed.

It looks like you didn't specify a hostfile, but you did in a previous command listed here; was that intended?

rfvander commented 9 years ago

Hi Jacob,

Yes, sorry for being so terse ☺. I omitted specifying a hostfile intentionally.

Rob

nelsonje commented 9 years ago

Please continue---still not quite sure I understand what you're asking.

Which machine is this on, your scheduler-less non-production cluster or the production one with LSF? What behavior are you expecting? If you ran the equivalent command with mpiexec would you expect it to distribute across multiple nodes?

(The grappa_run script is kind of a hack to make it easier for us to run experiments on a couple clusters with different schedulers, but the wide variety of scheduler and job launch configurations out there make it hard to present a consistent level of abstraction. I'm still trying to figure out what we should be providing.)

rfvander commented 9 years ago

Hi Jacob,

My point was indeed that with --nnode=4 I expected to see just four processes, independent of --ppn. I am running on the scheduler-less non-production cluster.

Rob

rfvander commented 9 years ago

And here are two attempts at multi-node runs. Note that the latter, using mpiexec.hydra, doesn’t produce any output.

Rob

[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- --hostfile barhosts applications/demos/hello_world.exe

I0120 13:00:03.839092 90526 Grappa.cpp:587]

Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.4287 GB
  free per core: 11.6072 GB

I0120 13:00:03.864444 90526 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:00:03.864507 90527 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 13:00:03.864765 90529 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:00:03.864955 90528 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:00:03.865277 45724 hello_world.cpp:34] Hello world from locale 3 core 12
I0120 13:00:03.863034 50605 hello_world.cpp:34] Hello world from locale 2 core 8
I0120 13:00:03.865530 45726 hello_world.cpp:34] Hello world from locale 3 core 14
I0120 13:00:03.869408 85006 hello_world.cpp:34] Hello world from locale 1 core 4
I0120 13:00:03.863055 50606 hello_world.cpp:34] Hello world from locale 2 core 9
I0120 13:00:03.865452 45727 hello_world.cpp:34] Hello world from locale 3 core 15
I0120 13:00:03.869412 85007 hello_world.cpp:34] Hello world from locale 1 core 5
I0120 13:00:03.863046 50607 hello_world.cpp:34] Hello world from locale 2 core 10
I0120 13:00:03.865536 45725 hello_world.cpp:34] Hello world from locale 3 core 13
I0120 13:00:03.869418 85008 hello_world.cpp:34] Hello world from locale 1 core 6
I0120 13:00:03.863041 50608 hello_world.cpp:34] Hello world from locale 2 core 11
I0120 13:00:03.869418 85009 hello_world.cpp:34] Hello world from locale 1 core 7
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 4 -hostfile barhosts applications/demos/hello_world.exe
[rfvander@bar1 Make+Release]$

rfvander commented 9 years ago

Hi Jacob,

While the scheduler issue gets sorted out, I want to report an issue I found when trying to use the Intel compilers (C, C++, and MPI). This does not show up with gcc.

Rob

compilation aborted for /lustre/home/rfvander/grappa/system/Grappa.cpp (code 2)
make[2]: *** [system/CMakeFiles/Grappa.dir/Grappa.cpp.o] Error 2
/lustre/home/rfvander/grappa/system/DelegateBase.hpp(105): error: expression must have a constant value
  static_assert(std::is_convertible< decltype(func()), T >(),
  ^
  detected during instantiation of "auto Grappa::impl::call(Grappa::impl::Core={int16_t={short}}, F)->decltype(()) [with F=lambda []()->GlobalAddress]" at line 91 of "/lustre/home/rfvander/grappa/system/GlobalAllocator.hpp"
compilation aborted for /lustre/home/rfvander/grappa/system/graph/TupleGraph.cpp (code 2)
make[2]: *** [system/CMakeFiles/Grappa.dir/graph/TupleGraph.cpp.o] Error 2
compilation aborted for /lustre/home/rfvander/grappa/system/GlobalHashMap.cpp (code 2)
make[2]: *** [system/CMakeFiles/Grappa.dir/GlobalHashMap.cpp.o] Error 2
make[1]: *** [system/CMakeFiles/Grappa.dir/all] Error 2
make: *** [all] Error 2

nelsonje commented 9 years ago

Okay, I get it now.

Right now Grappa depends on some environment variables being set to generate output (and do other things). When they're not, it logs to a file in /tmp. So the grappa_run script exists basically to make sure some environment variables are set on the client nodes before the job's processes start. (We may be able to get rid of this but haven't yet.)

Remember a week or two ago I told you that you can either use a hostfile with grappa_run like this:

bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- --hostfile hostfile.txt applications/demos/hello_world.exe

or use mpiexec.hydra directly like this:

source ../../util/env.sh
mpiexec.hydra  -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe

Unfortunately the weird --ppn behavior with grappa_run on your schedulerless cluster is hard to avoid---if you ask for 16 tasks with 4 per node, but only give it one node (perhaps by not providing a hostfile or a scheduler to get nodes from), it will oversubscribe that node. I'm going to have to look into what is possible with mpiexec.hydra---it may not be possible to restrict the job in the way that we want.

I suggest that you use mpiexec.hydra directly for now.

nelsonje commented 9 years ago

(and please open another issue for the Intel compiler problem so we don't clutter this one. Thanks!)

rfvander commented 9 years ago

Your suggestion to use mpiexec.hydra directly works, Jacob. I’ll use that from now on.

mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe

I0120 13:58:43.466963 92113 Grappa.cpp:587]

Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.4287 GB
  free per core: 11.6072 GB

I0120 13:58:43.492902 92113 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:58:43.492965 92116 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:58:43.493103 92114 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 13:58:43.493202 92115 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:58:43.494909 86502 hello_world.cpp:34] Hello world from locale 1 core 4
I0120 13:58:43.494940 86503 hello_world.cpp:34] Hello world from locale 1 core 5
I0120 13:58:43.494925 86504 hello_world.cpp:34] Hello world from locale 1 core 6
I0120 13:58:43.494918 86505 hello_world.cpp:34] Hello world from locale 1 core 7
I0120 13:58:43.490975 52095 hello_world.cpp:34] Hello world from locale 2 core 8
I0120 13:58:43.491000 52096 hello_world.cpp:34] Hello world from locale 2 core 9
I0120 13:58:43.490989 52097 hello_world.cpp:34] Hello world from locale 2 core 10
I0120 13:58:43.490996 52098 hello_world.cpp:34] Hello world from locale 2 core 11
I0120 13:58:43.491773 47312 hello_world.cpp:34] Hello world from locale 3 core 12
I0120 13:58:43.491777 47313 hello_world.cpp:34] Hello world from locale 3 core 13
I0120 13:58:43.491685 47314 hello_world.cpp:34] Hello world from locale 3 core 14
I0120 13:58:43.491685 47315 hello_world.cpp:34] Hello world from locale 3 core 15
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe

I0120 13:59:47.983866 92180 Grappa.cpp:587]

Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.6164 GB
  free per core: 11.6541 GB

I0120 13:59:48.008868 92180 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:59:48.008970 92182 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:59:48.008921 92183 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:59:48.009037 92181 hello_world.cpp:34] Hello world from locale 0 core 1

nelsonje commented 9 years ago

Great!

As a matter of curiosity, what does the output look like if you run that without the -ppn argument? i.e., mpiexec.hydra -n 16 -f hostfile.txt -envall applications/demos/hello_world.exe

rfvander commented 9 years ago

Then we have a bit of a problem, but -ppn 1 fixes that.

[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 16 -f hostfile.txt -envall applications/demos/hello_world.exe

I0120 14:14:24.428670 92567 Grappa.cpp:587]

Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 3.93241 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 0.983101 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 0.983102 GB
  free per locale: 44.8993 GB
  free per core: 2.80621 GB

I0120 14:14:24.598052 92567 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 14:14:24.598104 92578 hello_world.cpp:34] Hello world from locale 0 core 11
I0120 14:14:24.598104 92581 hello_world.cpp:34] Hello world from locale 0 core 14
I0120 14:14:24.598232 92575 hello_world.cpp:34] Hello world from locale 0 core 8
I0120 14:14:24.598458 92580 hello_world.cpp:34] Hello world from locale 0 core 13
I0120 14:14:24.598603 92571 hello_world.cpp:34] Hello world from locale 0 core 4
I0120 14:14:24.598901 92577 hello_world.cpp:34] Hello world from locale 0 core 10
I0120 14:14:24.599145 92573 hello_world.cpp:34] Hello world from locale 0 core 6
I0120 14:14:24.599889 92579 hello_world.cpp:34] Hello world from locale 0 core 12
I0120 14:14:24.600147 92582 hello_world.cpp:34] Hello world from locale 0 core 15
I0120 14:14:24.600280 92570 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 14:14:24.600425 92572 hello_world.cpp:34] Hello world from locale 0 core 5
I0120 14:14:24.600585 92576 hello_world.cpp:34] Hello world from locale 0 core 9
I0120 14:14:24.600649 92568 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 14:14:24.600754 92574 hello_world.cpp:34] Hello world from locale 0 core 7
I0120 14:14:24.600850 92569 hello_world.cpp:34] Hello world from locale 0 core 2
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -f hostfile.txt -envall applications/demos/hello_world.exe

I0120 14:14:36.273977 92621 Grappa.cpp:587]

Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.6164 GB
  free per core: 11.6541 GB

I0120 14:14:36.299289 92621 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 14:14:36.299342 92622 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 14:14:36.299350 92623 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 14:14:36.299350 92624 hello_world.cpp:34] Hello world from locale 0 core 3
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 1 -f hostfile.txt -envall applications/demos/hello_world.exe

I0120 14:16:05.243644 92676 Grappa.cpp:587]

Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 62.9185 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 15.7296 GB
  aggregator per core: 0.190094 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 15.7296 GB
  free per locale: 46.8581 GB
  free per core: 46.8581 GB

I0120 14:16:05.254698 92676 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 14:16:05.253454 47751 hello_world.cpp:34] Hello world from locale 3 core 3
I0120 14:16:05.258024 86928 hello_world.cpp:34] Hello world from locale 1 core 1
I0120 14:16:05.254014 52561 hello_world.cpp:34] Hello world from locale 2 core 2

rfvander commented 9 years ago

Hi Jacob,

I am now trying to build your implementation of synch_p2p using the uts example in the grappa repo as an example. However, uts as described in README-Grappa.md does not build.

  1.  There is no file called Makefile in the uts directory
  2.  When I ask to use Makefile.uts, which is present, the grappa target cannot be found

Then I tried another example, sort, which doesn’t have a Makefile at all. Next I looked at isopath, which has a grappa subdirectory with a Makefile. Typing make there produced the following:

[rfvander@bar1 grappa]$ make

Makefile:9: //include.mk: No such file or directory

Makefile:41: //system/Makefile: No such file or directory

Makefile:78: warning: overriding recipe for target `run'

Makefile:75: warning: ignoring old recipe for target `run'

make: *** No rule to make target `//system/Makefile'. Stop.

Perhaps it is time for a little primer on how to build a Grappa application? Thanks.

Rob
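
One observation on the isopath failure above: the doubled slashes in //include.mk and //system/Makefile usually mean a Makefile variable expanded to an empty string. A guess worth trying (the variable name GRAPPA_HOME is hypothetical; check what Makefile line 9 actually references) is to set that variable to the checkout root on the make command line:

# GRAPPA_HOME is a hypothetical name -- substitute whatever variable the Makefile puts in front of include.mk
make GRAPPA_HOME=/lustre/home/rfvander/grappa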

rfvander commented 9 years ago

Hello Jacob,

While the build problem is now resolved on my research cluster, I am having continued problems with building on my production cluster. It does not have access to the Internet, so I downloaded the third-party packages and built using --no-downloads. I also specify all the compilers in the same way as on my research cluster, but I keep getting error messages. As you can see (I added the environment variables that I set before building), CMake cannot find MPI_CXX or MPI_CXX_LIBRARIES, even though these variables are explicitly defined. Could you give me an idea how to work around this problem? Ultimately, I want to compare timings, and I won’t be able to do that on our research cluster. Thanks. BTW, I am a little puzzled that the build output says Boost could not be found and that it will download it. Probably it’s innocuous, but you may want to change that warning.

Rob

[rfvander@eln4 grappa]$ \rm -rf build/
[rfvander@eln4 grappa]$ ./configure --no-downloads
cmake /panfs/panfs3/users3/rfvander/grappa -G"Unix Makefiles" -DSHMMAX=91239737344 -DNO_DOWNLOADS=true -DCMAKE_C_COMPILER=mpigcc -DCMAKE_CXX_COMPILER=mpigxx -DBASE_C_COMPILER=mpigcc -DBASE_CXX_COMPILER=mpigxx -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBOOST_ROOT=/sampa/share/gcc-4.7.2/src/boost_1_51_0
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigcc
-- Check for working C compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigxx
-- Check for working CXX compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigxx -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Boost not found. !! Will download and build Boost, which may take a while.
-- Found MPI_C: /opt/intel/impi/5.0.1.035/intel64/bin/mpigcc
CMake Error at /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/FindPackageHandleStandardArgs.cmake:136 (message):
  Could NOT find MPI_CXX (missing: MPI_CXX_LIBRARIES)
Call Stack (most recent call first):
  /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/FindPackageHandleStandardArgs.cmake:343 (_FPHSA_FAILURE_MESSAGE)
  /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/FindMPI.cmake:611 (find_package_handle_standard_args)
  CMakeLists.txt:205 (find_package)

-- Configuring incomplete, errors occurred!
See also "/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeOutput.log".

[rfvander@eln4 grappa]$ history | grep export | tail -10
  879  export MPI_C_COMPILER=mpigcc
  952  export CC=mpigcc
  953  export CXX=mpigxx
  954  export MPI_C_COMPILER=mpigcc
  955  export MPI_CXX_COMPILER=mpigxx
  993  export CC=mpigcc; export CXX=mpigxx; export MPI_C_COMPILER=mpigcc; export MPI_CXX_COMPILER=mpigxx
 1002  h | grep export
 1020  export MPI_CXX_LIBRARIES=/opt/intel/impi/5.0.1.035/intel64/lib
 1052  export MPI_CXX=mpigxx
 1055  history | grep export | tail -10
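
One thing that stands out in the history above: MPI_CXX_COMPILER and MPI_CXX_LIBRARIES are CMake cache variables, and (at least with CMake 3.0's FindMPI) exporting them as shell environment variables generally has no effect. The more reliable route is to pass them as -D definitions through configure's -- forwarding; the library path below is built from the directory in your history and is only illustrative:

# illustrative only: point FindMPI at the wrapper and at a concrete MPI C++ library file, not a directory
./configure --no-downloads -- -DMPI_CXX_COMPILER=mpigxx -DMPI_CXX_LIBRARIES=/opt/intel/impi/5.0.1.035/intel64/lib/libmpicxx.so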

rfvander commented 9 years ago

This issue is still open for me, unfortunately. The only Grappa codes I have been able to build are those integrated in your package, and as such they are not a model for what an application developer would do. Could you send me a simple example: a tarball with just an example makefile and a source file? Thanks.

Rob

nelsonje commented 9 years ago

Hi Rob,

We're taking a moment to remove some complexity from our build system before updating the docs with details on adding new code. I'll get back to you shortly.

rfvander commented 9 years ago

Great, thanks, Jacob. I hope you’re not getting frustrated with all my questions, and hope that the result of all of this will be that Grappa will be easier to use for everybody.

Rob

nelsonje commented 9 years ago

Not at all! It's immensely helpful. I just hope I can make progress fast enough to keep you interested while not neglecting my other responsibilities. :-)

Can we schedule some screen-sharing time to debug the MPI problem?

rfvander commented 9 years ago

Absolutely! I’ll send an invite if you give me an indication of your availability. Thanks, Jacob.

nelsonje commented 9 years ago

After further debugging, we've determined that this MPI detection error is due to a bug in the Intel mpicc wrapper script---in versions prior to 5.0.2 it doesn't propagate errors from the underlying compiler, which confuses CMake's MPI detection script.

I see three ways to solve this now:

1) Use a newer version of Intel MPI; version 5.0.2 should work fine. (I was using 5.0.2.044.)

2) The CMake folks have also recently added code to solve this problem, which should be available in CMake version 3.2; here's the bug report: https://public.kitware.com/Bug/view.php?id=15182 That version of CMake is still in development; you could potentially try downloading and building from their trunk, but that would be a pain.

3) Since we want to use GCC with Intel MPI, it ought to work to point CMake at the GCC wrapper scripts directly like this:

CC=gcc CXX=g++ ; ./configure -- -DMPI_C_COMPILER=mpigcc -DMPI_CXX_COMPILER=mpigxx

This works for me when I hack my mpicc script and works for one of the users in the CMake bug report, but it could behave differently on your system if something else is also going on. Note that gcc/g++ here must be at least version 4.7.2.
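
A quick way to check whether a given wrapper has the error-propagation fix (a throwaway sketch; the source file is deliberately invalid) is to compile something broken and look at the exit status, since that status is what CMake's compiler checks key off:

echo 'this is not C' > /tmp/broken.c
mpigcc /tmp/broken.c -o /tmp/broken; echo "exit status: $?"
# a fixed wrapper reports a non-zero status here; the pre-5.0.2 ones return 0 despite the compile error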

rfvander commented 9 years ago

Thanks, Jacob. I could confirm that the proper error propagation does work for MPI version 5.0.2, and not for the version I was using earlier. The difference is in the mpigcc scripts, not mpicc. So I am pointing to the newer MPI now. I’d like to note, though, that we ultimately want to link with the Intel compilers, not GNU.

Rob

nelsonje commented 9 years ago

Great! When you're able to build and run a binary we can close the ticket.

As for using the Intel compiler, I'll track that in #205.

rfvander commented 9 years ago

Sadly, while configure now breezed through, the build failed. Here is the end of the build output.

Rob

common.copy /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/third-party/lib/libboost_prg_exec_monitor.a
gcc.compile.c++ bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/exception_safety.o
gcc.compile.c++ bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/interaction_based.o
gcc.compile.c++ bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/logged_expectations.o
gcc.archive bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/libboost_unit_test_framework.a
common.copy /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/third-party/lib/libboost_unit_test_framework.a
...updated 10706 targets...
[ 30%] No install step for 'third-party-boost'
[ 30%] Completed 'third-party-boost'
[ 30%] Built target third-party-boost
Scanning dependencies of target all-third-party
[ 30%] Built target all-third-party
Scanning dependencies of target graph500-generator
Scanning dependencies of target Communicator
[ 30%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/graph_generator.c.o
[ 35%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/make_graph.c.o
[ 35%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/splittable_mrg.c.o
[ 35%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/utils.c.o
[ 35%] Building CXX object system/CMakeFiles/Communicator.dir/Communicator.cpp.o
[ 38%] Building CXX object system/CMakeFiles/Communicator.dir/LocaleSharedMemory.cpp.o
cc1plus: error: unrecognized command line option "-std=c++11"
make[2]: *** [system/CMakeFiles/Communicator.dir/Communicator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1plus: error: unrecognized command line option "-std=c++11"
make[2]: *** [system/CMakeFiles/Communicator.dir/LocaleSharedMemory.cpp.o] Error 1
make[1]: *** [system/CMakeFiles/Communicator.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Linking C static library libgraph500-generator.a
[ 38%] Built target graph500-generator
make: *** [all] Error 2
[rfvander@eln4 Make+Release]$
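One hint in the output above: the Boost objects were built under bin.v2/libs/test/build/gcc-4.4.7/..., and GCC only accepts -std=c++11 from 4.7 onward (older releases only know -std=c++0x), so it looks like an old system compiler is being picked up. A quick check, assuming the gcc/g++ on PATH are the compilers CMake found:

gcc --version
echo 'int main(){}' | g++ -std=c++11 -x c++ -fsyntax-only - && echo 'c++11 accepted'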


nelsonje commented 9 years ago

Would you verify that your GCC version is >= 4.7.2 with gcc --version?

rfvander commented 9 years ago

It isn’t (I just checked), so I’ll move to GCC 4.9, which is available in a corner of my system.

Rob


rfvander commented 9 years ago

Sigh. This is what happens when I upgrade to GCC 4.9 and use the latest Intel MPI wrappers:

[rfvander@eln4 grappa]$ ./configure --no-downloads
cmake /panfs/panfs3/users3/rfvander/grappa -G"Unix Makefiles" -DSHMMAX=91239737344 -DNO_DOWNLOADS=true -DCMAKE_C_COMPILER=mpigcc -DCMAKE_CXX_COMPILER=mpigxx -DBASE_C_COMPILER=mpigcc -DBASE_CXX_COMPILER=mpigxx -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBOOST_ROOT=/sampa/share/gcc-4.7.2/src/boost_1_51_0
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
-- Check for working C compiler: /opt/intel/impi/5.0.2.044/intel64/bin/mpigcc
-- Check for working C compiler: /opt/intel/impi/5.0.2.044/intel64/bin/mpigcc -- broken
CMake Error at /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/CMakeTestCCompiler.cmake:61 (message):
  The C compiler "/opt/intel/impi/5.0.2.044/intel64/bin/mpigcc" is not able to compile a simple test program.
  It fails with the following output:

  Change Dir: /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp

  Run Build Command:"/usr/bin/gmake" "cmTryCompileExec2839858443/fast"
  /usr/bin/gmake -f CMakeFiles/cmTryCompileExec2839858443.dir/build.make CMakeFiles/cmTryCompileExec2839858443.dir/build
  gmake[1]: Entering directory `/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp'
  /opt/crtdc/cmake/3.0.2/bin/cmake -E cmake_progress_report /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp/CMakeFiles 1
  Building C object CMakeFiles/cmTryCompileExec2839858443.dir/testCCompiler.c.o
  /opt/intel/impi/5.0.2.044/intel64/bin/mpigcc -o CMakeFiles/cmTryCompileExec2839858443.dir/testCCompiler.c.o -c /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp/testCCompiler.c
  /opt/crtdc/gcc/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/cc1: error while loading shared libraries: libmpc.so.3: cannot open shared object file: No such file or directory
  gmake[1]: *** [CMakeFiles/cmTryCompileExec2839858443.dir/testCCompiler.c.o] Error 1
  gmake[1]: Leaving directory `/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp'
  gmake: *** [cmTryCompileExec2839858443/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:19 (project)

-- Configuring incomplete, errors occurred!
See also "/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeOutput.log".
See also "/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeError.log".


nelsonje commented 9 years ago

Would you verify that you can build a simple plain C program with GCC 4.9, and an MPI program with mpigcc and GCC 4.9? The library it's complaining about (libmpc) is a prerequisite of GCC itself, so if GCC's own support libraries can't be found at run time we would expect this sort of error.
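A minimal pair of checks along those lines (file names below are just placeholders); the MPI half assumes Intel MPI's mpigcc and mpiexec.hydra are on PATH:

cat > plain.c <<'EOF'
#include <stdio.h>
int main(void) { printf("plain gcc works\n"); return 0; }
EOF
gcc plain.c -o plain && ./plain

cat > mpi_hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("hello from rank %d\n", rank);
  MPI_Finalize();
  return 0;
}
EOF
mpigcc mpi_hello.c -o mpi_hello && mpiexec.hydra -n 2 ./mpi_hello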

nelsonje commented 9 years ago

Oh, and it looks like this git clone doesn't have the SHMMAX fix---you should do a pull to get the latest bits.
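A guess at the concrete steps, assuming the remote is called origin and tracks uwsampa/grappa master, and that the SHMMAX value configure bakes into the build comes from the kernel's shared-memory limit:

git pull origin master
cat /proc/sys/kernel/shmmax   # the limit configure appears to pick up for -DSHMMAX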

rfvander commented 9 years ago

Right, that’s the problem. I’ve poked around, but nothing compiles with this version of gcc on our system. I’m asking the admins to install a new version, or patch up the one we have.

Rob

[rfvander@eln4 Transpose]$ more test.c
int main(int argc, char**argv){
  int i;
}
[rfvander@eln4 Transpose]$ gcc test.c
/opt/crtdc/gcc/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/cc1: error while loading shared libraries: libmpc.so.3: cannot open shared object file: No such file or directory
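A possible workaround while waiting on the admins, assuming the MPC library was installed somewhere non-standard rather than missing entirely (the path below is a placeholder for whatever the find turns up):

find /opt /usr -name 'libmpc.so*' 2>/dev/null
export LD_LIBRARY_PATH=/path/to/mpc/lib:$LD_LIBRARY_PATH   # directory containing libmpc.so.3
gcc test.c                                                 # cc1 should now load its shared libraries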


rfvander commented 9 years ago

Will do, thanks.

Rob


rfvander commented 9 years ago

OK, Jacob, progress on my production cluster. It turns out that not all compiler dependencies were set. I won’t bore you with the details, but suffice it to say that after correcting that, and after pulling the new bits, I could configure and make Grappa (of course, I also needed to do the no-downloads hack). But I could not build hello_world. The relevant part of the error log is below.

[rfvander@eln4 Make+Release]$ make demo-hello_world 2> error.log
[ 10%] Built target third-party-gflags
[ 20%] Built target third-party-boost
[ 30%] Built target third-party-glog
[ 30%] Built target all-third-party
[ 35%] Built target graph500-generator
[ 97%] Built target Grappa
Linking CXX executable helloworld.exe
[rfvander@eln4 Make+Release]$ wc error.log
1656 10181 222660 error.log
[rfvander@eln4 Make+Release]$ grep -10 world.exe error.log
/panfs/panfs3/users3/rfvander/grappa/system/tasks/TaskingScheduler.hpp:471: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int)'
../../system/libGrappa.a(GlobalMemory.cpp.o): In function `Linear':
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:172: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int, google::CheckOpString const&)'
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:172: undefined reference to `google::LogMessageFatal::~LogMessageFatal()'
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:173: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int, google::CheckOpString const&)'
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:173: undefined reference to `google::LogMessageFatal::~LogMessageFatal()'
../../system/libGrappa.a(GlobalMemory.cpp.o): In function `Allocator':
/panfs/panfs3/users3/rfvander/grappa/system/Allocator.hpp:172: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int)'
/panfs/panfs3/users3/rfvander/grappa/system/Allocator.hpp:172: undefined reference to `google::LogMessageFatal::~LogMessageFatal()'
collect2: error: ld returned 1 exit status
make[3]: *** [applications/demos/hello_world.exe] Error 1
make[2]: *** [applications/demos/CMakeFiles/demo-hello_world.dir/all] Error 2
make[1]: *** [applications/demos/CMakeFiles/demo-hello_world.dir/rule] Error 2
make: *** [demo-hello_world] Error 2

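One first diagnostic to try here, assuming the bundled glog landed in the Make+Release third-party tree (paths are a guess): check whether the third-party glog library actually defines the symbols the linker reports as missing, and if it does, force the Grappa archive and the demo to relink from scratch.

nm -C third-party/lib/libglog.* 2>/dev/null | grep LogMessageFatal | head
rm -f system/libGrappa.a && make demo-hello_world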