rfvander opened 9 years ago
Hi Jacob,
Partial success, see below.
Rob
[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- applications/demos/hello_world.exe
W0109 15:10:28.329658 48050 Communicator.cpp:259] Adjusting to fit in target footprint: 367001 bytes
W0109 15:10:28.329823 48050 RDMAAggregator.cpp:284] Adjusting to fit in target footprint: 419430 bytes
W0109 15:10:28.329835 48050 Task.cpp:116] Adjusting to fit in target footprint: 445384 bytes
I0109 15:10:28.330062 48050 Grappa.cpp:326] Footprint estimates:
I0109 15:10:28.342118 48050 hello_world.cpp:34] Hello world from locale 0 core 0
I0109 15:10:28.342149 48057 hello_world.cpp:34] Hello world from locale 0 core 7
I0109 15:10:28.342371 48053 hello_world.cpp:34] Hello world from locale 0 core 3
I0109 15:10:28.342686 48059 hello_world.cpp:34] Hello world from locale 0 core 9
I0109 15:10:28.342653 48064 hello_world.cpp:34] Hello world from locale 0 core 14
I0109 15:10:28.343147 48061 hello_world.cpp:34] Hello world from locale 0 core 11
I0109 15:10:28.343169 48051 hello_world.cpp:34] Hello world from locale 0 core 1
I0109 15:10:28.343266 48065 hello_world.cpp:34] Hello world from locale 0 core 15
I0109 15:10:28.343286 48055 hello_world.cpp:34] Hello world from locale 0 core 5
I0109 15:10:28.343364 48054 hello_world.cpp:34] Hello world from locale 0 core 4
I0109 15:10:28.343552 48060 hello_world.cpp:34] Hello world from locale 0 core 10
I0109 15:10:28.343544 48062 hello_world.cpp:34] Hello world from locale 0 core 12
I0109 15:10:28.343652 48056 hello_world.cpp:34] Hello world from locale 0 core 6
I0109 15:10:28.343675 48052 hello_world.cpp:34] Hello world from locale 0 core 2
I0109 15:10:28.343652 48058 hello_world.cpp:34] Hello world from locale 0 core 8
I0109 15:10:28.343750 48063 hello_world.cpp:34] Hello world from locale 0 core 13
F0109 15:10:28.352632 48050 Communicator.hpp:219] Check failed: sizeof(Grappa::impl::Deserializer) + sizeof(f) <= c->size (9 vs. 0) Immediate buffer size to small to contain 8-byte deserializer + 1-byte lambda
*** Check failure stack trace: ***
*** Aborted at 1420845028 (unix time) try "date -d @1420845028" if you are using GNU date ***
PC: @ 0x7f8f0baba582 google::DumpStackTrace()
    @ 0x432641 Grappa::impl::failure_function()
    @ 0x7f8f0baad310 google::LogMessage::Fail()
    @ 0x7f8f0baad25d google::LogMessage::SendToLog()
    @ 0x7f8f0baacc7c google::LogMessage::Flush()
    @ 0x7f8f0baafb7d google::LogMessageFatal::~LogMessageFatal()
    @ 0x433037 Grappa_end_tasks()
    @ 0x42a150 _ZN6Grappa4implL18task_functor_proxyIZNS_3runIZ4mainEUlvE_EEvT_EUlvE_EEvmmm
    @ 0x452d53 Grappa::impl::workerLoop()
    @ 0x44f15e Grappa::impl::tramp()
    @ 0x4548ac (unknown)
I0109 15:10:28.358675 48050 Grappa.cpp:251] Exiting via failure function
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
From: Jacob Nelson [mailto:notifications@github.com] Sent: Friday, January 09, 2015 3:04 PM To: uwsampa/grappa Cc: Van Der Wijngaart, Rob F Subject: Re: [grappa] Build problem with MPI library (#198)
I updated the grappa_run script to accept Intel MPI. If you do a pull from master, you should be able to run a command like
bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- applications/demos/hello_world.exe
and it will call mpiexec.hydra with the right flags.
— Reply to this email directly or view it on GitHubhttps://github.com/uwsampa/grappa/issues/198#issuecomment-69415785.
And a little less expected: twice as many ranks are reported as are specified.
[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 -- applications/demos/hello_world.exe
W0109 15:12:34.862522 48098 Communicator.cpp:259] Adjusting to fit in target footprint: 734003 bytes
W0109 15:12:34.862697 48098 RDMAAggregator.cpp:284] Adjusting to fit in target footprint: 838860 bytes
W0109 15:12:34.862715 48098 Task.cpp:116] Adjusting to fit in target footprint: 858261 bytes
I0109 15:12:34.863019 48098 Grappa.cpp:326] Footprint estimates:
I0109 15:12:34.874004 48098 hello_world.cpp:34] Hello world from locale 0 core 0
I0109 15:12:34.874102 48100 hello_world.cpp:34] Hello world from locale 0 core 2
I0109 15:12:34.874066 48102 hello_world.cpp:34] Hello world from locale 0 core 4
I0109 15:12:34.874102 48103 hello_world.cpp:34] Hello world from locale 0 core 5
I0109 15:12:34.874094 48104 hello_world.cpp:34] Hello world from locale 0 core 6
I0109 15:12:34.874084 48105 hello_world.cpp:34] Hello world from locale 0 core 7
I0109 15:12:34.874090 48099 hello_world.cpp:34] Hello world from locale 0 core 1
I0109 15:12:34.874091 48101 hello_world.cpp:34] Hello world from locale 0 core 3
F0109 15:12:34.881544 48098 Communicator.hpp:219] Check failed: sizeof(Grappa::impl::Deserializer) + sizeof(f) <= c->size (9 vs. 0) Immediate buffer size to small to contain 8-byte deserializer + 1-byte lambda
*** Check failure stack trace: ***
*** Aborted at 1420845154 (unix time) try "date -d @1420845154" if you are using GNU date ***
PC: @ 0x7fc946624582 google::DumpStackTrace()
    @ 0x432641 Grappa::impl::failure_function()
    @ 0x7fc946617310 google::LogMessage::Fail()
    @ 0x7fc94661725d google::LogMessage::SendToLog()
    @ 0x7fc946616c7c google::LogMessage::Flush()
    @ 0x7fc946619b7d google::LogMessageFatal::~LogMessageFatal()
    @ 0x433037 Grappa_end_tasks()
    @ 0x42a150 _ZN6Grappa4implL18task_functor_proxyIZNS_3runIZ4mainEUlvE_EEvT_EUlvE_EEvmmm
    @ 0x452d53 Grappa::impl::workerLoop()
    @ 0x44f15e Grappa::impl::tramp()
    @ 0x4548ac (unknown)
I0109 15:12:34.888079 48098 Grappa.cpp:251] Exiting via failure function
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=1 -- applications/demos/hello_world.exe
W0109 15:13:59.741778 48184 Communicator.cpp:259] Adjusting to fit in target footprint: 2936012 bytes
W0109 15:13:59.741951 48184 Task.cpp:116] Adjusting to fit in target footprint: 4580706 bytes
I0109 15:13:59.742089 48184 Grappa.cpp:326] Footprint estimates:
I0109 15:13:59.751731 48184 hello_world.cpp:34] Hello world from locale 0 core 0
I0109 15:13:59.751822 48185 hello_world.cpp:34] Hello world from locale 0 core 1
Curious. I'll
You can pass additional arguments to mpiexec.hydra after the bare --, like this:
bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- --hostfile hostfile.txt applications/demos/hello_world.exe
With mpiexec.hydra, you should also be okay to run like this:
. ../../util/env.sh
mpiexec.hydra -n 16 -ppn 4 -envall applications/demos/hello_world.exe
Some other MPIs don't make it easy to propagate environment variables to child processes, which is why we wrote the grappa_run script.
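The propagation problem Jacob describes can be demonstrated without any MPI at all; here is a minimal sketch (MY_TEST_VAR is a made-up variable for illustration, not something Grappa reads):

```shell
# Export a variable in the launching shell.
export MY_TEST_VAR=hello

# -envall asks mpiexec.hydra to forward the whole environment to every rank:
#   mpiexec.hydra -n 4 -ppn 4 -envall ./a.out
# Launchers without an equivalent start child processes with a stripped
# environment, which is the gap grappa_run papers over.

# Simulate both cases locally: env -i strips the environment.
env -i /bin/sh -c 'echo "stripped: [$MY_TEST_VAR]"'   # variable absent
/bin/sh -c 'echo "inherited: [$MY_TEST_VAR]"'          # variable present
```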
Ah, the tasks == 2x nodes thing is us screwing up the math when --ppn is unspecified.
Remember what your mother said about initializing all variables (in addition to that useless advice to eat your greens).
Things actually get a little weirder when --ppn is specified. When I set --ppn to 4, it appears to be ignored, but otherwise the results look reasonable. When I set --ppn to 1, an error occurs.
Rob
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 4 -f barhosts -envall applications/demos/hello_world.exe
W0109 15:38:43.697203 64304 Communicator.cpp:259] Adjusting to fit in target footprint: 1468006 bytes
W0109 15:38:43.697376 64304 RDMAAggregator.cpp:284] Adjusting to fit in target footprint: 1677721 bytes
W0109 15:38:43.697387 64304 Task.cpp:116] Adjusting to fit in target footprint: 1684015 bytes
I0109 15:38:43.697545 64304 Grappa.cpp:326] Footprint estimates:
I0109 15:38:43.707406 64304 hello_world.cpp:34] Hello world from locale 0 core 0
I0109 15:38:43.707423 64305 hello_world.cpp:34] Hello world from locale 0 core 1
I0109 15:38:43.707474 64306 hello_world.cpp:34] Hello world from locale 0 core 2
I0109 15:38:43.707484 64307 hello_world.cpp:34] Hello world from locale 0 core 3
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 1 -ppn 4 -f barhosts -envall applications/demos/hello_world.exe
W0109 15:46:23.696423 64549 Communicator.cpp:259] Adjusting to fit in target footprint: 5872025 bytes
W0109 15:46:23.696615 64549 Task.cpp:116] Adjusting to fit in target footprint: 11291593 bytes
I0109 15:46:23.696770 64549 Grappa.cpp:326] Footprint estimates:
I0109 15:46:23.711109 64549 hello_world.cpp:34] Hello world from locale 0 core 0
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 1 -f barhosts -envall applications/demos/hello_world.exe
E0109 15:45:29.346119 64499 LocaleSharedMemory.cpp:201] Allocation of 524288 bytes with alignment 8 failed with 523056 free and 33030144 allocated locally
E0109 15:45:29.347411 130962 LocaleSharedMemory.cpp:201] Allocation of 524288 bytes with alignment 8 failed with 523056 free and 33030144 allocated locally
E0109 15:45:29.347049 38834 LocaleSharedMemory.cpp:201] Allocation of 524288 bytes with alignment 8 failed with 523056 free and 33030144 allocated locally
E0109 15:45:29.347663 60003 LocaleSharedMemory.cpp:201] Allocation of 524288 bytes with alignment 8 failed with 523056 free and 33030144 allocated locally
*** Aborted at 1420847129 (unix time) try "date -d @1420847129" if you are using GNU date ***
*** Aborted at 1420847129 (unix time) try "date -d @1420847129" if you are using GNU date ***
*** Aborted at 1420847129 (unix time) try "date -d @1420847129" if you are using GNU date ***
*** Aborted at 1420847129 (unix time) try "date -d @1420847129" if you are using GNU date ***
PC: @ 0x7f2325236582 google::DumpStackTrace()
PC: @ 0x7f0c2d7fc582 google::DumpStackTrace()
PC: @ 0x7fd541598582 google::DumpStackTrace()
PC: @ 0x7fd984734582 google::DumpStackTrace()
    @ 0x432641 Grappa::impl::failure_function()
    @ 0x432641 Grappa::impl::failure_function()
    @ 0x432641 Grappa::impl::failure_function()
    @ 0x432641 Grappa::impl::failure_function()
    @ 0x43679d Grappa::impl::LocaleSharedMemory::allocate_aligned()
    @ 0x43679d Grappa::impl::LocaleSharedMemory::allocate_aligned()
    @ 0x42e31d Communicator::activate()
    @ 0x43679d Grappa::impl::LocaleSharedMemory::allocate_aligned()
    @ 0x43679d Grappa::impl::LocaleSharedMemory::allocate_aligned()
    @ 0x42e31d Communicator::activate()
    @ 0x42e31d Communicator::activate()
    @ 0x4342a1 Grappa_activate()
    @ 0x42e31d Communicator::activate()
    @ 0x4342a1 Grappa_activate()
    @ 0x4342a1 Grappa_activate()
    @ 0x424fd4 main
    @ 0x4342a1 Grappa_activate()
    @ 0x3106821b45 (unknown)
    @ 0x424fd4 main
    @ 0x424fd4 main
    @ 0x3d64421b45 (unknown)
    @ 0x342b221b45 (unknown)
    @ 0x424fd4 main
    @ 0x3e12621b45 (unknown)
    @ 0x4299d1 (unknown)
I0109 15:45:29.352670 130962 Grappa.cpp:251] Exiting via failure function
    @ 0x4299d1 (unknown)
I0109 15:45:29.353027 38834 Grappa.cpp:251] Exiting via failure function
    @ 0x4299d1 (unknown)
I0109 15:45:29.353546 60003 Grappa.cpp:251] Exiting via failure function
    @ 0x4299d1 (unknown)
I0109 15:45:29.352552 64499 Grappa.cpp:251] Exiting via failure function
Two things:
First, when you call mpiexec.hydra directly, -n is the total number of tasks/processes in the job. -ppn sets the number of tasks per node (which may mean "entry in hosts file"; I'm not sure). So if you want 16 processes with 4 processes per node, put four nodes in your host file, set -n to 16, and -ppn to 4.
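The arithmetic is simply nodes = n / ppn, so a quick sanity check before launching can catch a too-short hostfile (hostfile.txt here is just an illustrative name):

```shell
# Desired layout: 16 total ranks, 4 per node -> 4 hosts needed.
N=16    # total tasks (-n)
PPN=4   # tasks per node (-ppn)
NODES=$(( (N + PPN - 1) / PPN ))   # hosts the hostfile must provide
echo "need $NODES hosts"           # prints: need 4 hosts

# Compare against the hostfile before launching:
# HAVE=$(wc -l < hostfile.txt)
# [ "$HAVE" -ge "$NODES" ] || echo "hostfile has $HAVE hosts, need $NODES"
```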
Second, it looks like Grappa thinks it has very little shared memory available. This could be a property of your node configuration, or it could be a problem with the build. Could you run this command and let me know the result?
sysctl kernel.shmmax
On many machines it's configured to 0.5*DRAM size, so for our 24GB nodes, I get the result
kernel.shmmax = 12884901888
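That value is bytes, so a one-liner confirms the 0.5*DRAM rule of thumb (the arithmetic below just re-derives the number quoted above; on a live system you would read it with sysctl -n instead):

```shell
# kernel.shmmax is reported in bytes; the 24 GB nodes report:
SHMMAX=12884901888
echo "$(( SHMMAX / 1024 / 1024 / 1024 )) GB"   # prints: 12 GB, i.e. 0.5 * 24 GB

# On a live system (value will differ per machine):
# SHMMAX=$(sysctl -n kernel.shmmax)
```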
Sigh, and I’ve only used --ppn 50K times before. Thanks, Jacob.
Rob
I recommend eating your greens, too!
Hi Jacob,
This is what I get:
[rfvander@bar1 Stencil]$ sysctl kernel.shmmax
kernel.shmmax = 33554432
So, indeed, much less than on your nodes.
Rob
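For comparison, that number converts as follows (pure arithmetic, no assumptions about Grappa itself):

```shell
SHMMAX=33554432                        # from sysctl kernel.shmmax above
echo "$(( SHMMAX / 1024 / 1024 )) MB"  # prints: 32 MB
# 32 MB is far below both the ~12 GB on the 24 GB nodes and the ~300 MB
# hello_world was observed to want, which is consistent with the
# LocaleSharedMemory allocation failures in the -ppn 1 run earlier.
```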
Preaching to the choir, I’m a vegetarian ☺.
Hi Jacob,
Let’s follow up on two critical issues:
- Insufficient shared memory: as I mentioned earlier, my system is configured with far less than you described. Unfortunately, I am not allowed to change that, so I should probably move to another system. What worries me, though, is that I ran into issues when running hello_world; that should not require 300 MB of shared memory.
- Since I need to move to another system where I will be behind a firewall, the Grappa build process will no longer be able to access git during the build. Of course, you want to be able to build without external dependencies anyway. How difficult is it to change the build procedure to do that? Thanks!
Rob
Thanks!
From: Brandon Holt [mailto:notifications@github.com] Sent: Tuesday, January 13, 2015 12:33 PM To: uwsampa/grappa Cc: Van Der Wijngaart, Rob F Subject: Re: [grappa] Build problem with MPI library (#198)
See #199https://github.com/uwsampa/grappa/issues/199.
I'm tracking the shared memory issue in #202. In the long term this will go away; in the short term I may be able to work around it.
Don't read too much into the 300MB for hello_world. Currently we've sized a bunch of default allocations for running larger jobs with thousands of threads per node on ~100 nodes. As we work to make our current alpha-quality research code more usable, one of the things we'll be working on is scaling down to a single node, or even a laptop. That's tracked in #164, and is another one of my projects for the next few months.
As for the subject of this ticket:
When we last talked we had two problems:
It appears that you've made progress on one or both of these? What happened?
I would like to figure out more of what was going wrong with MPI discovery so I can file a bug with the CMake folks.
Actually, applying Jeff’s solution of using the MPI wrappers for both the C and C++ compilers did the trick.
Hi Jacob,
Progress! But this still isn’t the expected behavior, is it?
Rob
[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- applications/demos/hello_world.exe
Shared memory breakdown:
  node total: 125.712 GB
  locale shared heap total: 62.856 GB
  locale shared heap per core: 3.9285 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 0.982124 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 0.982125 GB
  free per locale: 44.8516 GB
I0120 11:41:15.700206 116400 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 11:41:15.700273 116403 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 11:41:15.700273 116404 hello_world.cpp:34] Hello world from locale 0 core 4
I0120 11:41:15.700361 116401 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 11:41:15.700291 116402 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 11:41:15.700294 116405 hello_world.cpp:34] Hello world from locale 0 core 5
I0120 11:41:15.700399 116406 hello_world.cpp:34] Hello world from locale 0 core 6
I0120 11:41:15.700278 116407 hello_world.cpp:34] Hello world from locale 0 core 7
I0120 11:41:15.700284 116408 hello_world.cpp:34] Hello world from locale 0 core 8
I0120 11:41:15.700284 116409 hello_world.cpp:34] Hello world from locale 0 core 9
I0120 11:41:15.700285 116410 hello_world.cpp:34] Hello world from locale 0 core 10
I0120 11:41:15.700284 116411 hello_world.cpp:34] Hello world from locale 0 core 11
I0120 11:41:15.700284 116412 hello_world.cpp:34] Hello world from locale 0 core 12
I0120 11:41:15.700284 116414 hello_world.cpp:34] Hello world from locale 0 core 14
I0120 11:41:15.700285 116415 hello_world.cpp:34] Hello world from locale 0 core 15
I0120 11:41:15.700798 116413 hello_world.cpp:34] Hello world from locale 0 core 13
You're saying that because it looks like it's running 16 processes on a single locale, rather than 4 processes each on 4 locales? Yes indeed.
It looks like you didn't specify a hostfile, but you did in a previous command listed here; was that intended?
Hi Jacob,
Yes, sorry for being so terse ☺. I omitted specifying a hostfile intentionally.
Rob
Please continue---still not quite sure I understand what you're asking.
Which machine is this on, your scheduler-less non-production cluster or the production one with LSF? What behavior are you expecting? If you ran the equivalent command with mpiexec would you expect it to distribute across multiple nodes?
(The grappa_run script is kind of a hack to make it easier for us to run experiments on a couple clusters with different schedulers, but the wide variety of scheduler and job launch configurations out there make it hard to present a consistent level of abstraction. I'm still trying to figure out what we should be providing.)
Hi Jacob,
My point was indeed that with --nnode=4 I should just see four processes, independent of --ppn. I am running on the scheduler-less non-production cluster.
Rob
And here are two attempts at multi-node runs. Note that the latter, using mpiexec.hydra, doesn’t produce any output.
Rob
[rfvander@bar1 Make+Release]$ bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- --hostfile barhosts applications/demos/hello_world.exe
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.4287 GB
I0120 13:00:03.864444 90526 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:00:03.864507 90527 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 13:00:03.864765 90529 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:00:03.864955 90528 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:00:03.865277 45724 hello_world.cpp:34] Hello world from locale 3 core 12
I0120 13:00:03.863034 50605 hello_world.cpp:34] Hello world from locale 2 core 8
I0120 13:00:03.865530 45726 hello_world.cpp:34] Hello world from locale 3 core 14
I0120 13:00:03.869408 85006 hello_world.cpp:34] Hello world from locale 1 core 4
I0120 13:00:03.863055 50606 hello_world.cpp:34] Hello world from locale 2 core 9
I0120 13:00:03.865452 45727 hello_world.cpp:34] Hello world from locale 3 core 15
I0120 13:00:03.869412 85007 hello_world.cpp:34] Hello world from locale 1 core 5
I0120 13:00:03.863046 50607 hello_world.cpp:34] Hello world from locale 2 core 10
I0120 13:00:03.865536 45725 hello_world.cpp:34] Hello world from locale 3 core 13
I0120 13:00:03.869418 85008 hello_world.cpp:34] Hello world from locale 1 core 6
I0120 13:00:03.863041 50608 hello_world.cpp:34] Hello world from locale 2 core 11
I0120 13:00:03.869418 85009 hello_world.cpp:34] Hello world from locale 1 core 7
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 4 -hostfile barhosts applications/demos/hello_world.exe
[rfvander@bar1 Make+Release]$
Hi Jacob,
While the scheduler issue gets sorted out I want to report an issue I found when trying to use the Intel compiler (C,C++, and MPI). This does not show up with gcc.
Rob
compilation aborted for /lustre/home/rfvander/grappa/system/Grappa.cpp (code 2)
make[2]: *** [system/CMakeFiles/Grappa.dir/Grappa.cpp.o] Error 2
/lustre/home/rfvander/grappa/system/DelegateBase.hpp(105): error: expression must have a constant value
      static_assert(std::is_convertible< decltype(func()), T >(),
      ^
detected during instantiation of "auto Grappa::impl::call(Grappa::impl::Core={int16_t={short}}, F)->decltype((
Okay, I get it now.
Right now Grappa depends on some environment variables being set to generate output (and do other things). When they're not, it logs to a file in /tmp. So the grappa_run script exists basically to make sure some environment variables are set on the client nodes before the job's processes start. (We may be able to get rid of this but haven't yet.)
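The I/W/F-prefixed lines throughout this thread are google-glog output, and glog is steered by environment variables; a sketch of the relevant switches (that env.sh sets exactly these is an assumption on my part, not something confirmed here):

```shell
# glog writes log files under /tmp unless told otherwise:
export GLOG_logtostderr=1   # send all log messages to stderr instead of files
export GLOG_v=1             # optional: raise VLOG verbosity

# With -envall, mpiexec.hydra forwards these to every rank:
#   mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
```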
Remember a week or two ago I told you that you can either use a hostfile with grappa_run like this:
bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- --hostfile hostfile.txt applications/demos/hello_world.exe
or use mpiexec.hydra directly like this:
source ../../util/env.sh
mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
Unfortunately the weird --ppn behavior with grappa_run on your schedulerless cluster is hard to avoid---if you ask for 16 tasks with 4 per node, but only give it one node (perhaps by not providing a hostfile or a scheduler to get nodes from), it will oversubscribe that node. I'm going to have to look into what is possible with mpiexec.hydra---it may not be possible to restrict the job in the way that we want.
I suggest that you use mpiexec.hydra directly for now.
(and please open another issue for the Intel compiler problem so we don't clutter this one. Thanks!)
Your suggestion to use mpiexec.hydra directly works, Jacob. I’ll use that from now on.
mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.4287 GB
I0120 13:58:43.492902 92113 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:58:43.492965 92116 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:58:43.493103 92114 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 13:58:43.493202 92115 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:58:43.494909 86502 hello_world.cpp:34] Hello world from locale 1 core 4
I0120 13:58:43.494940 86503 hello_world.cpp:34] Hello world from locale 1 core 5
I0120 13:58:43.494925 86504 hello_world.cpp:34] Hello world from locale 1 core 6
I0120 13:58:43.494918 86505 hello_world.cpp:34] Hello world from locale 1 core 7
I0120 13:58:43.490975 52095 hello_world.cpp:34] Hello world from locale 2 core 8
I0120 13:58:43.491000 52096 hello_world.cpp:34] Hello world from locale 2 core 9
I0120 13:58:43.490989 52097 hello_world.cpp:34] Hello world from locale 2 core 10
I0120 13:58:43.490996 52098 hello_world.cpp:34] Hello world from locale 2 core 11
I0120 13:58:43.491773 47312 hello_world.cpp:34] Hello world from locale 3 core 12
I0120 13:58:43.491777 47313 hello_world.cpp:34] Hello world from locale 3 core 13
I0120 13:58:43.491685 47314 hello_world.cpp:34] Hello world from locale 3 core 14
I0120 13:58:43.491685 47315 hello_world.cpp:34] Hello world from locale 3 core 15
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.6164 GB
I0120 13:59:48.008868 92180 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:59:48.008970 92182 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:59:48.008921 92183 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:59:48.009037 92181 hello_world.cpp:34] Hello world from locale 0 core 1
From: Jacob Nelson [mailto:notifications@github.com] Sent: Tuesday, January 20, 2015 1:34 PM To: uwsampa/grappa Cc: Van Der Wijngaart, Rob F Subject: Re: [grappa] Build problem with MPI library (#198)
Okay, I get it now.
Right now Grappa depends on some environment variables being set to generate output (and do other things). When they're not, it logs to a file in /tmp. So the grappa_run script exists basically to make sure some environment variables are set on the client nodes before the job's processes start. (We may be able to get rid of this but haven't yet.)
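For what it's worth, the I/W/F-prefixed log lines throughout this thread are Google glog output (the earlier stack trace shows google::LogMessage), and writing to files under /tmp is glog's default. Assuming Grappa honors glog's standard environment variables, a plausible manual override looks like this sketch:

```shell
# Hedged guess: GLOG_logtostderr=1 is a standard glog environment variable
# that redirects logging from /tmp files to stderr; -envall then propagates
# it to the remote ranks. The mpiexec.hydra invocation mirrors the one used
# elsewhere in this thread.
export GLOG_logtostderr=1
mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
```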
Remember a week or two ago I told you that you can either use a hostfile with grappa_run like this:
bin/grappa_run --mpi=true --nnode=4 --ppn=4 -- --hostfile hostfile.txt applications/demos/hello_world.exe
or use mpiexec.hydra directly like this:
source ../../util/env.sh
mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
Unfortunately the weird --ppn behavior with grappa_run on your schedulerless cluster is hard to avoid---if you ask for 16 tasks with 4 per node, but only give it one node (perhaps by not providing a hostfile or a scheduler to get nodes from), it will oversubscribe that node. I'm going to have to look into what is possible with mpiexec.hydra---it may not be possible to restrict the job in the way that we want.
I suggest that you use mpiexec.hydra directly for now.
— Reply to this email directly or view it on GitHub: https://github.com/uwsampa/grappa/issues/198#issuecomment-70738340
Hi Jacob,
While the scheduler issue gets sorted out, I want to report an issue I found when trying to use the Intel compilers (C, C++, and MPI). It does not show up with gcc.
Rob
/lustre/home/rfvander/grappa/system/DelegateBase.hpp(105): error: expression must have a constant value
      static_assert(std::is_convertible< decltype(func()), T >(),
      ^
          detected during instantiation of "auto Grappa::impl::call(Grappa::impl::Core={int16_t={short}}, F)->decltype((
compilation aborted for /lustre/home/rfvander/grappa/system/Grappa.cpp (code 2)
make[2]: *** [system/CMakeFiles/Grappa.dir/Grappa.cpp.o] Error 2
compilation aborted for /lustre/home/rfvander/grappa/system/graph/TupleGraph.cpp (code 2)
make[2]: *** [system/CMakeFiles/Grappa.dir/graph/TupleGraph.cpp.o] Error 2
compilation aborted for /lustre/home/rfvander/grappa/system/GlobalHashMap.cpp (code 2)
make[2]: *** [system/CMakeFiles/Grappa.dir/GlobalHashMap.cpp.o] Error 2
make[1]: *** [system/CMakeFiles/Grappa.dir/all] Error 2
make: *** [all] Error 2
Great!
As a matter of curiosity, what does the output look like if you run that without the -ppn argument? i.e., mpiexec.hydra -n 16 -f hostfile.txt -envall applications/demos/hello_world.exe
On Tue, Jan 20, 2015 at 2:01 PM, rfvander notifications@github.com wrote:
Your suggestion to use mpiexec.hydra directly works, Jacob. I’ll use that from now on.
mpiexec.hydra -n 16 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
I0120 13:58:43.466963 92113 Grappa.cpp:587]
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.4287 GB
  free per core: 11.6072 GB
I0120 13:58:43.492902 92113 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:58:43.492965 92116 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:58:43.493103 92114 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 13:58:43.493202 92115 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:58:43.494909 86502 hello_world.cpp:34] Hello world from locale 1 core 4
I0120 13:58:43.494940 86503 hello_world.cpp:34] Hello world from locale 1 core 5
I0120 13:58:43.494925 86504 hello_world.cpp:34] Hello world from locale 1 core 6
I0120 13:58:43.494918 86505 hello_world.cpp:34] Hello world from locale 1 core 7
I0120 13:58:43.490975 52095 hello_world.cpp:34] Hello world from locale 2 core 8
I0120 13:58:43.491000 52096 hello_world.cpp:34] Hello world from locale 2 core 9
I0120 13:58:43.490989 52097 hello_world.cpp:34] Hello world from locale 2 core 10
I0120 13:58:43.490996 52098 hello_world.cpp:34] Hello world from locale 2 core 11
I0120 13:58:43.491773 47312 hello_world.cpp:34] Hello world from locale 3 core 12
I0120 13:58:43.491777 47313 hello_world.cpp:34] Hello world from locale 3 core 13
I0120 13:58:43.491685 47314 hello_world.cpp:34] Hello world from locale 3 core 14
I0120 13:58:43.491685 47315 hello_world.cpp:34] Hello world from locale 3 core 15
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 4 -f hostfile.txt -envall applications/demos/hello_world.exe
I0120 13:59:47.983866 92180 Grappa.cpp:587]
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.6164 GB
  free per core: 11.6541 GB
I0120 13:59:48.008868 92180 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 13:59:48.008970 92182 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 13:59:48.008921 92183 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 13:59:48.009037 92181 hello_world.cpp:34] Hello world from locale 0 core 1
Then we have a bit of a problem, but -ppn 1 fixes that.
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 16 -f hostfile.txt -envall applications/demos/hello_world.exe
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 3.93241 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 0.983101 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 0.983102 GB
  free per locale: 44.8993 GB
I0120 14:14:24.598052 92567 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 14:14:24.598104 92578 hello_world.cpp:34] Hello world from locale 0 core 11
I0120 14:14:24.598104 92581 hello_world.cpp:34] Hello world from locale 0 core 14
I0120 14:14:24.598232 92575 hello_world.cpp:34] Hello world from locale 0 core 8
I0120 14:14:24.598458 92580 hello_world.cpp:34] Hello world from locale 0 core 13
I0120 14:14:24.598603 92571 hello_world.cpp:34] Hello world from locale 0 core 4
I0120 14:14:24.598901 92577 hello_world.cpp:34] Hello world from locale 0 core 10
I0120 14:14:24.599145 92573 hello_world.cpp:34] Hello world from locale 0 core 6
I0120 14:14:24.599889 92579 hello_world.cpp:34] Hello world from locale 0 core 12
I0120 14:14:24.600147 92582 hello_world.cpp:34] Hello world from locale 0 core 15
I0120 14:14:24.600280 92570 hello_world.cpp:34] Hello world from locale 0 core 3
I0120 14:14:24.600425 92572 hello_world.cpp:34] Hello world from locale 0 core 5
I0120 14:14:24.600585 92576 hello_world.cpp:34] Hello world from locale 0 core 9
I0120 14:14:24.600649 92568 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 14:14:24.600754 92574 hello_world.cpp:34] Hello world from locale 0 core 7
I0120 14:14:24.600850 92569 hello_world.cpp:34] Hello world from locale 0 core 2
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -f hostfile.txt -envall applications/demos/hello_world.exe
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 15.7296 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 3.9324 GB
  aggregator per core: 0.00247955 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 3.93241 GB
  free per locale: 46.6164 GB
I0120 14:14:36.299289 92621 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 14:14:36.299342 92622 hello_world.cpp:34] Hello world from locale 0 core 1
I0120 14:14:36.299350 92623 hello_world.cpp:34] Hello world from locale 0 core 2
I0120 14:14:36.299350 92624 hello_world.cpp:34] Hello world from locale 0 core 3
[rfvander@bar1 Make+Release]$ mpiexec.hydra -n 4 -ppn 1 -f hostfile.txt -envall applications/demos/hello_world.exe
Shared memory breakdown:
  node total: 125.837 GB
  locale shared heap total: 62.9185 GB
  locale shared heap per core: 62.9185 GB
  communicator per core: 0.125 GB
  tasks per core: 0.0156631 GB
  global heap per core: 15.7296 GB
  aggregator per core: 0.190094 GB
  shared_pool current per core: 4.76837e-07 GB
  shared_pool max per core: 15.7296 GB
  free per locale: 46.8581 GB
I0120 14:16:05.254698 92676 hello_world.cpp:34] Hello world from locale 0 core 0
I0120 14:16:05.253454 47751 hello_world.cpp:34] Hello world from locale 3 core 3
I0120 14:16:05.258024 86928 hello_world.cpp:34] Hello world from locale 1 core 1
I0120 14:16:05.254014 52561 hello_world.cpp:34] Hello world from locale 2 core 2
Hi Jacob,
I am now trying to build your implementation of synch_p2p, using the uts example in the grappa repo as a model. However, uts as described in README-Grappa.md does not build:
There is no file called Makefile in the uts directory
When I ask to use Makefile.uts, which is present, the grappa target cannot be found
Next I tried another example, sort, which doesn't have a Makefile either. Then I looked at isopath, which has a grappa subdirectory with a Makefile. Typing make there produced the following:
[rfvander@bar1 grappa]$ make
Makefile:9: //include.mk: No such file or directory
Makefile:41: //system/Makefile: No such file or directory
Makefile:78: warning: overriding recipe for target `run'
Makefile:75: warning: ignoring old recipe for target `run'
make: *** No rule to make target `//system/Makefile'. Stop.
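Incidentally, the doubled slash in "//include.mk" is the signature of a make variable expanding to empty in front of a slash, which would suggest an unset root variable. The variable name below (ROOT) is a stand-in; whatever Grappa's Makefile actually uses may differ. A minimal sketch of the mechanism:

```shell
# Sketch: when ROOT is unset, "$(ROOT)/include.mk" degenerates to "/include.mk",
# producing the same "No such file or directory" shape as the Makefile above.
cat > /tmp/demo.mk <<'EOF'
include $(ROOT)/include.mk
EOF
make -f /tmp/demo.mk
```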
Perhaps it is time for a little primer on how to build a grappa application? Thanks.
Rob
Hello Jacob,
While the build problem is now resolved on my research cluster, I am having continued problems building on my production cluster. It does not have access to the Internet, so I downloaded the third-party packages and built using --no-downloads. I also specify all the compilers the same way as on my research cluster, but I keep getting error messages. As you can see (I added the environment variables that I set before building), CMake cannot find MPI_CXX or MPI_CXX_LIBRARIES, even though these variables are explicitly defined. Could you give me an idea how to work around this problem? Ultimately I want to compare timings, and I won't be able to do that on our research cluster. Thanks. BTW, I am a little puzzled that the build output says Boost could not be found and that it is downloading it. Probably it's innocuous, but you may want to change that warning.
Rob
[rfvander@eln4 grappa]$ \rm -rf build/
[rfvander@eln4 grappa]$ ./configure --no-downloads
cmake /panfs/panfs3/users3/rfvander/grappa -G"Unix Makefiles" -DSHMMAX=91239737344 -DNO_DOWNLOADS=true -DCMAKE_C_COMPILER=mpigcc -DCMAKE_CXX_COMPILER=mpigxx -DBASE_C_COMPILER=mpigcc -DBASE_CXX_COMPILER=mpigxx -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBOOST_ROOT=/sampa/share/gcc-4.7.2/src/boost_1_51_0
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigcc
-- Check for working C compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigxx
-- Check for working CXX compiler: /opt/intel/impi/5.0.1.035/intel64/bin/mpigxx -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Boost not found. !! Will download and build Boost, which may take a while.
-- Found MPI_C: /opt/intel/impi/5.0.1.035/intel64/bin/mpigcc
CMake Error at /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/FindPackageHandleStandardArgs.cmake:136 (message):
  Could NOT find MPI_CXX (missing: MPI_CXX_LIBRARIES)
Call Stack (most recent call first):
  /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/FindPackageHandleStandardArgs.cmake:343 (_FPHSA_FAILURE_MESSAGE)
  /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/FindMPI.cmake:611 (find_package_handle_standard_args)
  CMakeLists.txt:205 (find_package)
-- Configuring incomplete, errors occurred!
See also "/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeOutput.log".
[rfvander@eln4 grappa]$ history | grep export | tail -10
  879  export MPI_C_COMPILER=mpigcc
  952  export CC=mpigcc
  953  export CXX=mpigxx
  954  export MPI_C_COMPILER=mpigcc
  955  export MPI_CXX_COMPILER=mpigxx
  993  export CC=mpigcc; export CXX=mpigxx; export MPI_C_COMPILER=mpigcc; export MPI_CXX_COMPILER=mpigxx
 1002  h | grep export
 1020  export MPI_CXX_LIBRARIES=/opt/intel/impi/5.0.1.035/intel64/lib
 1052  export MPI_CXX=mpigxx
 1055  history | grep export | tail -10
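One note on the history output above: exporting variables like MPI_CXX_LIBRARIES in the shell generally has no effect, because CMake's find modules read CMake cache variables, not the environment. Such hints usually have to be passed as -D cache entries, which grappa's configure forwards after a double dash (the form used elsewhere in this thread); a hedged sketch:

```shell
# Assumption: grappa's ./configure forwards arguments after "--" to cmake,
# so FindMPI hints can be supplied as cache entries rather than env vars.
./configure --no-downloads -- \
    -DMPI_C_COMPILER=mpigcc \
    -DMPI_CXX_COMPILER=mpigxx
```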
From: Jacob Nelson [mailto:notifications@github.com]
Sent: Tuesday, January 13, 2015 1:39 PM
To: uwsampa/grappa
Cc: Van Der Wijngaart, Rob F
Subject: Re: [grappa] Build problem with MPI library (#198)
As for the subject of this ticket:
When we last talked we had two problems:
It appears that you've made progress on one or both of these? What happened?
I would like to figure out more of what was going wrong with MPI discovery so I can file a bug with the CMake folks.
— Reply to this email directly or view it on GitHub: https://github.com/uwsampa/grappa/issues/198#issuecomment-69825305
This issue is still open for me, unfortunately. The only grappa codes I have been able to build are integrated in your package, and as such are not a model for what an application developer would do. Could you send me a simple example: a tar with just an example makefile and a source code? Thanks.
Rob
Hi Rob,
We're taking a moment to remove some complexity from our build system before updating the docs with details on adding new code. I'll get back to you shortly.
Great, thanks, Jacob. I hope you’re not getting frustrated with all my questions, and hope that the result of all of this will be that Grappa will be easier to use for everybody.
Rob
Not at all! It's immensely helpful. I just hope I can make progress fast enough to keep you interested while not neglecting my other responsibilities. :-)
Can we schedule some screen-sharing time to debug the MPI problem?
Absolutely! I’ll send an invite if you give me an indication of your availability. Thanks, Jacob.
After further debugging, we've determined that this MPI detection error is due to a bug in the Intel mpicc wrapper script---in versions prior to 5.0.2 it doesn't propagate errors from the underlying compiler, which confuses CMake's MPI detection script.
I see three ways to solve this now:
1) Use a newer version of Intel MPI; version 5.0.2 should work fine. (I was using 5.0.2.044.)
2) The CMake folks have also recently added code to solve this problem, which should be available in CMake version 3.2; here's the bug report: https://public.kitware.com/Bug/view.php?id=15182 That version of CMake is still in development; you could potentially try downloading and building from their trunk, but that would be a pain.
3) Since we want to use GCC with Intel MPI, it ought to work to point CMake at the GCC wrapper scripts directly like this:
CC=gcc CXX=g++ ./configure -- -DMPI_C_COMPILER=mpigcc -DMPI_CXX_COMPILER=mpigxx
This works for me when I hack my mpicc script and works for one of the users in the CMake bug report, but it could behave differently on your system if something else is also going on. Note that gcc/g++ here must be at least version 4.7.2.
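The diagnosis above (a wrapper that swallows the underlying compiler's failure) can be probed directly. This sketch uses plain gcc to show the principle; substituting mpigcc for gcc would test the wrapper itself:

```shell
# A correct compiler driver must return a nonzero exit status when the
# compile fails; a wrapper with the bug described above would report 0 here.
echo 'int main(void) { return not valid C; }' > /tmp/badprog.c
gcc -c /tmp/badprog.c -o /tmp/badprog.o 2>/dev/null
echo "exit status: $?"
```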
Thanks, Jacob. I could confirm that the proper error propagation does work for MPI version 5.0.2, and not for the version I was using earlier. The difference is in the mpigcc scripts, not mpicc. So I am pointing to the newer MPI now. I’d like to note, though, that we ultimately want to link with the Intel compilers, not GNU.
Rob
Great! When you're able to build and run a binary we can close the ticket.
As for using the Intel compiler, I'll track that in #205.
Sadly, while configure now breezed through, the build failed. Here is the end of the build output.
Rob
common.copy /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/third-party/lib/libboost_prg_exec_monitor.a
gcc.compile.c++ bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/exception_safety.o
gcc.compile.c++ bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/interaction_based.o
gcc.compile.c++ bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/logged_expectations.o
gcc.archive bin.v2/libs/test/build/gcc-4.4.7/release/link-static/threading-multi/libboost_unit_test_framework.a
common.copy /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/third-party/lib/libboost_unit_test_framework.a
...updated 10706 targets...
[ 30%] No install step for 'third-party-boost'
[ 30%] Completed 'third-party-boost'
[ 30%] Built target third-party-boost
Scanning dependencies of target all-third-party
[ 30%] Built target all-third-party
Scanning dependencies of target graph500-generator
Scanning dependencies of target Communicator
[ 30%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/graph_generator.c.o
[ 35%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/make_graph.c.o
[ 35%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/splittable_mrg.c.o
[ 35%] Building C object third-party/graph500-generator/CMakeFiles/graph500-generator.dir/utils.c.o
[ 35%] Building CXX object system/CMakeFiles/Communicator.dir/Communicator.cpp.o
[ 38%] Building CXX object system/CMakeFiles/Communicator.dir/LocaleSharedMemory.cpp.o
cc1plus: error: unrecognized command line option "-std=c++11"
make[2]: *** [system/CMakeFiles/Communicator.dir/Communicator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1plus: error: unrecognized command line option "-std=c++11"
make[2]: *** [system/CMakeFiles/Communicator.dir/LocaleSharedMemory.cpp.o] Error 1
make[1]: *** [system/CMakeFiles/Communicator.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Linking C static library libgraph500-generator.a
[ 38%] Built target graph500-generator
make: *** [all] Error 2
[rfvander@eln4 Make+Release]$
Would you verify that your GCC version is >= 4.7.2 with gcc --version?
It isn’t, just checked, so I’ll move to 4.9, which is available in the corner of my system.
Rob
Sigh. This is what happens when I upgrade to gcc 4.9 and use the latest MPI compiler:
[rfvander@eln4 grappa]$ ./configure --no-downloads
cmake /panfs/panfs3/users3/rfvander/grappa -G"Unix Makefiles" -DSHMMAX=91239737344 -DNO_DOWNLOADS=true -DCMAKE_C_COMPILER=mpigcc -DCMAKE_CXX_COMPILER=mpigxx -DBASE_C_COMPILER=mpigcc -DBASE_CXX_COMPILER=mpigxx -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBOOST_ROOT=/sampa/share/gcc-4.7.2/src/boost_1_51_0
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
-- Check for working C compiler: /opt/intel/impi/5.0.2.044/intel64/bin/mpigcc
-- Check for working C compiler: /opt/intel/impi/5.0.2.044/intel64/bin/mpigcc -- broken
CMake Error at /opt/crtdc/cmake/3.0.2/share/cmake-3.0/Modules/CMakeTestCCompiler.cmake:61 (message):
  The C compiler "/opt/intel/impi/5.0.2.044/intel64/bin/mpigcc" is not able to compile a simple test program.
  It fails with the following output:
  Change Dir: /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp
  Run Build Command:"/usr/bin/gmake" "cmTryCompileExec2839858443/fast"
  /usr/bin/gmake -f CMakeFiles/cmTryCompileExec2839858443.dir/build.make CMakeFiles/cmTryCompileExec2839858443.dir/build
  gmake[1]: Entering directory `/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp'
  /opt/crtdc/cmake/3.0.2/bin/cmake -E cmake_progress_report /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp/CMakeFiles 1
  Building C object CMakeFiles/cmTryCompileExec2839858443.dir/testCCompiler.c.o
  /opt/intel/impi/5.0.2.044/intel64/bin/mpigcc -o CMakeFiles/cmTryCompileExec2839858443.dir/testCCompiler.c.o -c /panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp/testCCompiler.c
  /opt/crtdc/gcc/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/cc1: error while loading shared libraries: libmpc.so.3: cannot open shared object file: No such file or directory
  gmake[1]: *** [CMakeFiles/cmTryCompileExec2839858443.dir/testCCompiler.c.o] Error 1
  gmake[1]: Leaving directory `/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeTmp'
  gmake: *** [cmTryCompileExec2839858443/fast] Error 2
  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:19 (project)
-- Configuring incomplete, errors occurred!
See also "/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeOutput.log".
See also "/panfs/panfs3/users3/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeError.log".
Would you verify that you can build a simple plain C program with GCC 4.9, and an MPI program with mpigcc and GCC 4.9? The library it's complaining about is part of GCC, so if mpigcc can't find GCC's library include paths we would expect this sort of error.
Oh, and it looks like this git clone doesn't have the SHMMAX fix---you should do a pull to get the latest bits.
Right, that’s the problem. I’ve poked around, but nothing compiles with this version of gcc on our system. I’m asking the admins to install a new version, or patch up the one we have.
Rob
[rfvander@eln4 Transpose]$ more test.c
int main(int argc, char**argv){
  int i;
}
[rfvander@eln4 Transpose]$ gcc test.c
/opt/crtdc/gcc/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/cc1: error while loading shared libraries: libmpc.so.3: cannot open shared object file: No such file or directory
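When cc1 cannot load libmpc.so.3, the library often exists somewhere on the system but is not on the runtime loader path, so a workaround short of reinstalling GCC may be possible. A hedged sketch of an environment fix follows; the search root and library directory below are assumptions about this particular cluster, so substitute whatever `find` actually reports:

```shell
# Locate the missing library (the search root /opt is a guess for this system).
find /opt -name 'libmpc.so.3*' 2>/dev/null

# Prepend the directory that find reported to the loader path and retry.
# The directory below is hypothetical -- use the real one.
export LD_LIBRARY_PATH=/opt/crtdc/mpc/lib:$LD_LIBRARY_PATH
gcc test.c
```

This is an environment/config fragment, not a portable script; a proper fix is for the admins to install GCC's MPC dependency alongside the compiler, as Rob goes on to request.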
Will do, thanks.
Rob
OK, Jacob, progress on my production cluster. It turns out that not all compiler dependencies were set. I won't bore you with the details, but suffice it to say that after correcting that, and after pulling the new bits, I could configure and make Grappa (of course, I also needed the no-downloads hack). But I could not build hello_world. Error log attached.
[rfvander@eln4 Make+Release]$ make demo-hello_world 2> error.log
[ 10%] Built target third-party-gflags
[ 20%] Built target third-party-boost
[ 30%] Built target third-party-glog
[ 30%] Built target all-third-party
[ 35%] Built target graph500-generator
[ 97%] Built target Grappa
Linking CXX executable hello_world.exe
[rfvander@eln4 Make+Release]$ wc error.log
1656 10181 222660 error.log
[rfvander@eln4 Make+Release]$ grep -10 world.exe error.log
/panfs/panfs3/users3/rfvander/grappa/system/tasks/TaskingScheduler.hpp:471: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int)'
../../system/libGrappa.a(GlobalMemory.cpp.o): In function `Linear':
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:172: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int, google::CheckOpString const&)'
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:172: undefined reference to `google::LogMessageFatal::~LogMessageFatal()'
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:173: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int, google::CheckOpString const&)'
/panfs/panfs3/users3/rfvander/grappa/system/Addressing.hpp:173: undefined reference to `google::LogMessageFatal::~LogMessageFatal()'
../../system/libGrappa.a(GlobalMemory.cpp.o): In function `Allocator':
/panfs/panfs3/users3/rfvander/grappa/system/Allocator.hpp:172: undefined reference to `google::LogMessageFatal::LogMessageFatal(char const*, int)'
/panfs/panfs3/users3/rfvander/grappa/system/Allocator.hpp:172: undefined reference to `google::LogMessageFatal::~LogMessageFatal()'
collect2: error: ld returned 1 exit status
make[3]: *** [applications/demos/hello_world.exe] Error 1
make[2]: *** [applications/demos/CMakeFiles/demo-hello_world.dir/all] Error 2
make[1]: *** [applications/demos/CMakeFiles/demo-hello_world.dir/rule] Error 2
make: *** [demo-hello_world] Error 2
I downloaded Grappa and am now trying to build it, but the instructions are a bit sparse. If I define the symbols CC and CXX to resolve to the Intel compilers icc and icpc, respectively, I get the error message below. Obviously, my installed MPI cannot be found. I tried to fix that by setting "export MPICC=mpiicc", but that did not work, nor did "export MPI_C=mpiicc". There is no reference to MPI in "configure" or in "FindPackageHandleStandardArgs.cmake". Do you have any suggestions? By the way, I also have GASNet installed, so if that is the better communication layer, I'll use that, if I can get some instructions on how to do so. Thanks.
Rob
[rfvander@bar1 grappa]$ export CC=icc
[rfvander@bar1 grappa]$ export CXX=icpc
[rfvander@bar1 grappa]$ ./configure --gen=Make --mode=Release
cmake /lustre/home/rfvander/grappa -G"Unix Makefiles" -DSHMMAX=33554432 -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DBASE_C_COMPILER=icc -DBASE_CXX_COMPILER=icpc -DCMAKE_BUILD_TYPE=RelWithDebInfo
-- The C compiler identification is Intel 15.0.0.20140723
-- The CXX compiler identification is Intel 15.0.0.20140723
-- Check for working C compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icc
-- Check for working C compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icpc
-- Check for working CXX compiler: /opt/intel/tools/composer_xe_2015.0.090/bin/intel64/icpc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Boost found: 1.53.0 -- /usr
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
  Could NOT find MPI_C (missing: MPI_C_LIBRARIES)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindMPI.cmake:587 (find_package_handle_standard_args)
  CMakeLists.txt:205 (find_package)
-- Configuring incomplete, errors occurred!
See also "/lustre/home/rfvander/grappa/build/Make+Release/CMakeFiles/CMakeOutput.log".
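For what it's worth, CMake's FindMPI module does not consult an MPICC environment variable; it searches for standard wrapper names (mpicc, mpicxx, ...) and honors its own cache variables. A hedged sketch of pointing it at the Intel wrappers directly follows. MPI_C_COMPILER and MPI_CXX_COMPILER are standard FindMPI cache variables, but the wrapper names mpiicc/mpiicpc are an assumption about this particular Intel MPI install:

```shell
# Config fragment (not run here): tell FindMPI which wrappers to probe,
# instead of relying on MPICC/MPI_C environment variables it ignores.
cmake /lustre/home/rfvander/grappa -G"Unix Makefiles" \
  -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc \
  -DMPI_C_COMPILER=mpiicc -DMPI_CXX_COMPILER=mpiicpc
```

If Grappa's ./configure script does not forward extra -D flags, running cmake directly in the build directory with these variables set is the fallback.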