gutmann / coarray_icar

Testing implementation of CoArrays for the basic ICAR algorithms
MIT License

Runtime issues with ifort and gfortran/OpenCoarrays. #38

Open rouson opened 5 years ago

rouson commented 5 years ago

@gutmann @scrasmussen

Could you remind me whether coarray_icar works at all with gfortran versions > 6.x? When I build the develop branch of this fork as follows with gfortran 8.2.0 and a recent OpenCoarrays commit, I get the following:

$ cd src/tests
$ export COMPILER=gnu
$ make USE_ASSERTIONS=.true.
$ cafrun -n 4 ./test-ideal
 Number of images =            4
           1 domain%initialize_from_file('input-parameters.txt')
 ximgs=           2 yimgs=           2
 call master_initialize(this)
 call this%variable%initialize(this%get_grid_dimensions(),variable_test_val)
  Layer height       Pressure        Temperature      Water Vapor
      [m]              [hPa]             [K]            [kg/kg]
   9750.00000       271.047180       206.509430       9.17085254E-06
   7750.00000       364.236786       224.725372       7.91714992E-05
   5750.00000       481.825287       243.449936       5.01311326E-04
   3750.00000       628.424316       262.669800       2.46796501E-03
   1750.00000       809.217651       282.372711       9.08217765E-03
 ThompMP: read qr_acr_qg.dat instead of computing
 qr_acr_qg initialized:  0.229000002            
 ThompMP: read qr_acr_qs.dat instead of computing
 qr_acr_qs initialized:  0.170000002            
 ThompMP: read freezeH2O.dat instead of computing
 freezeH2O initialized:   1.02300000            
 qi_aut_qs initialized:   1.79999992E-02        

 Beginning simulation...
 Assertion "put_north: conformable halo_south_in and local " failed on image            1
ERROR STOP 
 Assertion "put_south: conformable halo_north_in and local " failed on image            4
ERROR STOP 
 Assertion "put_south: conformable halo_north_in and local " failed on image            3
ERROR STOP 
 Assertion "put_north: conformable halo_south_in and local " failed on image            2
ERROR STOP 

[proxy:0:0@Sourcery-Institute-VM] HYDU_sock_write (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/utils/sock/sock.c:294): write error (Broken pipe)
[proxy:0:0@Sourcery-Institute-VM] HYD_pmcd_pmip_control_cmd_cb (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:932): unable to write to downstream stdin
[proxy:0:0@Sourcery-Institute-VM] HYDT_dmxu_poll_wait_for_event (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@Sourcery-Institute-VM] main (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmip.c:202): demux engine error waiting for event
[mpiexec@Sourcery-Institute-VM] control_cb (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:208): assert (!closed) failed
[mpiexec@Sourcery-Institute-VM] HYDT_dmxu_poll_wait_for_event (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@Sourcery-Institute-VM] HYD_pmci_wait_for_completion (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@Sourcery-Institute-VM] main (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/ui/mpich/mpiexec.c:340): process manager error waiting for completion
Error: Command:
   `/opt/mpich/3.2.1/gnu/8.2.0/bin/mpiexec -n 4 --disable-auto-cleanup ./test-ideal`
failed to run.

I've also attempted to build with the Intel 18 and 19 compilers on pegasus.nic.uoregon.edu and got the following runtime messages, after which execution hangs:

$ mpiexec -np 1 ./test-ideal
[mpiexec@pegasus] HYDU_parse_hostfile (../../utils/args/args.c:553): unable to open host file: ./cafconfig.txt
[mpiexec@pegasus] config_tune_fn (../../ui/mpich/utils.c:2192): error parsing config file
[mpiexec@pegasus] match_arg (../../utils/args/args.c:243): match handler returned error
[mpiexec@pegasus] HYDU_parse_array_single (../../utils/args/args.c:294): argument matching returned error
[mpiexec@pegasus] HYD_uii_mpx_get_parameters (../../ui/mpich/utils.c:4999): error parsing input array

Usage: ./mpiexec [global opts] [exec1 local opts] : [exec2 local opts] : ...

Global options (passed to all executables):

  Global environment options:
    -genv {name} {value}             environment variable name and value
    -genvlist {env1,env2,...}        environment variable list to pass
    -genvnone                        do not pass any environment variables
    -genvall                         pass all environment variables not managed
                                          by the launcher (default)

  Other global options:
    -f {name} | -hostfile {name}     file containing the host names
    -hosts {host list}               comma separated host list
    -configfile {name}               config file containing MPMD launch options
    -machine {name} | -machinefile {name}
                                     file mapping procs to machines
    -pmi-connect {nocache|lazy-cache|cache}
                                     set the PMI connections mode to use
    -pmi-aggregate                   aggregate PMI messages
    -pmi-noaggregate                 do not  aggregate PMI messages
    -trace {<libraryname>}           trace the application using <libraryname>
                                     profiling library; default is libVT.so
    -trace-imbalance {<libraryname>} trace the application using <libraryname>
                                     imbalance profiling library; default is libVTim.so
    -check-mpi {<libraryname>}       check the application using <libraryname>
                                     checking library; default is libVTmc.so
    -ilp64                           Preload ilp64 wrapper library for support default size of
                                     integer 8 bytes
    -mps                             start statistics gathering for MPI Performance Snapshot (MPS)
    -aps                             start statistics gathering for Application Performance Snapshot (APS)
    -trace-pt2pt                     collect information about
                                     Point to Point operations
    -trace-collectives               collect information about
                                     Collective operations
    -tune [<confname>]               apply the tuned data produced by
                                     the MPI Tuner utility
    -use-app-topology <statfile>     perform optimized rank placement based statistics
                                     and cluster topology
    -noconf                          do not use any mpiexec's configuration files
    -branch-count {leaves_num}       set the number of children in tree
    -gwdir {dirname}                 working directory to use
    -gpath {dirname}                 path to executable to use
    -gumask {umask}                  mask to perform umask
    -tmpdir {tmpdir}                 temporary directory for cleanup input file
    -cleanup                         create input file for clean up
    -gtool {options}                 apply a tool over the mpi application
    -gtoolfile {file}                apply a tool over the mpi application. Parameters specified in the file

Local options (passed to individual executables):

  Local environment options:
    -env {name} {value}              environment variable name and value
    -envlist {env1,env2,...}         environment variable list to pass
    -envnone                         do not pass any environment variables
    -envall                          pass all environment variables (default)

  Other local options:
    -host {hostname}                 host on which processes are to be run
    -hostos {OS name}                operating system on particular host
    -wdir {dirname}                  working directory to use
    -path {dirname}                  path to executable to use
    -umask {umask}                   mask to perform umask
    -n/-np {value}                   number of processes
    {exec_name} {args}               executable name and arguments

Hydra specific options (treated as global):

  Bootstrap options:
    -bootstrap                       bootstrap server to use
     (ssh rsh pdsh fork slurm srun ll llspawn.stdio lsf blaunch sge qrsh persist service pbsdsh)
    -bootstrap-exec                  executable to use to bootstrap processes
    -bootstrap-exec-args             additional options to pass to bootstrap server
    -prefork                         use pre-fork processes startup method
    -enable-x/-disable-x             enable or disable X forwarding

  Resource management kernel options:
    -rmk                             resource management kernel to use (user slurm srun ll llspawn.stdio lsf blaunch sge qrsh pbs cobalt)

  Processor topology options:
    -binding                         process-to-core binding mode
  Extended fabric control options:
    -rdma                            select RDMA-capable network fabric (dapl). Fallback list is ofa,tcp,tmi,ofi
    -RDMA                            select RDMA-capable network fabric (dapl). Fallback is ofa
    -dapl                            select DAPL-capable network fabric. Fallback list is tcp,tmi,ofa,ofi
    -DAPL                            select DAPL-capable network fabric. No fallback fabric is used
    -ib                              select OFA-capable network fabric. Fallback list is dapl,tcp,tmi,ofi
    -IB                              select OFA-capable network fabric. No fallback fabric is used
    -tmi                             select TMI-capable network fabric. Fallback list is dapl,tcp,ofa,ofi
    -TMI                             select TMI-capable network fabric. No fallback fabric is used
    -mx                              select Myrinet MX* network fabric. Fallback list is dapl,tcp,ofa,ofi
    -MX                              select Myrinet MX* network fabric. No fallback fabric is used
    -psm                             select PSM-capable network fabric. Fallback list is dapl,tcp,ofa,ofi
    -PSM                             select PSM-capable network fabric. No fallback fabric is used
    -psm2                            select Intel* Omni-Path Fabric. Fallback list is dapl,tcp,ofa,ofi
    -PSM2                            select Intel* Omni-Path Fabric. No fallback fabric is used
    -ofi                             select OFI-capable network fabric. Fallback list is tmi,dapl,tcp,ofa
    -OFI                             select OFI-capable network fabric. No fallback fabric is used

  Checkpoint/Restart options:
    -ckpoint {on|off}                enable/disable checkpoints for this run
    -ckpoint-interval                checkpoint interval
    -ckpoint-prefix                  destination for checkpoint files (stable storage, typically a cluster-wide file system)
    -ckpoint-tmp-prefix              temporary/fast/local storage to speed up checkpoints
    -ckpoint-preserve                number of checkpoints to keep (default: 1, i.e. keep only last checkpoint)
    -ckpointlib                      checkpointing library (blcr)
    -ckpoint-logfile                 checkpoint activity/status log file (appended)
    -restart                         restart previously checkpointed application
    -ckpoint-num                     checkpoint number to restart

  Demux engine options:
    -demux                           demux engine (poll select)

  Debugger support options:
    -tv                              run processes under TotalView
    -tva {pid}                       attach existing mpiexec process to TotalView
    -gdb                             run processes under GDB
    -gdba {pid}                      attach existing mpiexec process to GDB
    -gdb-ia                          run processes under Intel IA specific GDB

  Other Hydra options:
    -v | -verbose                    verbose mode
    -V | -version                    show the version
    -info                            build information
    -print-rank-map                  print rank mapping
    -print-all-exitcodes             print exit codes of all processes
    -iface                           network interface to use
    -help                            show this message
    -perhost <n>                     place consecutive <n> processes on each host
    -ppn <n>                         stand for "process per node"; an alias to -perhost <n>
    -grr <n>                         stand for "group round robin"; an alias to -perhost <n>
    -rr                              involve "round robin" startup scheme
    -s <spec>                        redirect stdin to all or 1,2 or 2-4,6 MPI processes (0 by default)
    -ordered-output                  avoid data output intermingling
    -profile                         turn on internal profiling
    -l | -prepend-rank               prepend rank to output
    -prepend-pattern                 prepend pattern to output
    -outfile-pattern                 direct stdout to file
    -errfile-pattern                 direct stderr to file
    -localhost                       local hostname for the launching node
    -nolocal                         avoid running the application processes on the node where mpiexec.hydra started

Intel(R) MPI Library for Linux* OS, Version 2018 Update 3 Build 20180411 (id: 18329)
Copyright 2003-2018 Intel Corporation.
^C[mpiexec@pegasus] Sending Ctrl-C to processes as requested
[mpiexec@pegasus] Press Ctrl-C again to force abort
gutmann commented 5 years ago

That looks like the problem we were seeing with opencoarrays not knowing the shape of a multi-dimensional coarray correctly. I thought that was fixed, but maybe it was just identified. I don’t think I have ever gotten it to run with gfortran >6.x.
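
For reference, the failing assertions are essentially a shape check before each halo put. A stripped-down sketch of that kind of check (made-up array names and extents and a one-dimensional decomposition for simplicity, not our actual code) looks like this:

program halo_shape_check
  ! Sketch of a halo "put_north" guarded by a conformability check.
  implicit none
  integer, parameter :: nx = 10, ny = 8, nz = 20, halo = 2
  real, allocatable :: local(:,:,:)                ! this image's data
  real, allocatable :: halo_south_in(:,:,:)[:]     ! filled by my south neighbour
  integer :: me, np

  me = this_image()
  np = num_images()
  allocate(local(nx, ny, nz))
  allocate(halo_south_in(nx, halo, nz)[*])
  local = real(me)

  if (me < np) then
    ! The assertion amounts to: the slab being sent must be conformable
    ! with the neighbour's receive buffer.
    if (any(shape(local(:, ny-halo+1:ny, :)) /= shape(halo_south_in))) then
      error stop "put_north: conformable halo_south_in and local"
    end if
    ! One-sided put of my northern rows into the north neighbour's south halo.
    halo_south_in(:, :, :)[me + 1] = local(:, ny-halo+1:ny, :)
  end if
  sync all
end program halo_shape_check

If OpenCoarrays mis-reports the shape of the multi-dimensional coarray (or of the section being put), a check like this fails even though the underlying buffers are the right size.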

ifort is looking for a configuration file; you can find instructions online on what that file should look like. I think I got some instructions from Larry on an alternate way to compile using -coarray=single but then launch with mpiexec, which bypasses the config file. I'll get that info for you tomorrow.
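
As a rough starting point until then (this is from memory, so treat the details as approximate): the Intel configuration file is just the mpiexec options plus the executable, e.g. a cafconfig.txt containing a single line such as

-genvall -n 4 ./test-ideal

with the executable built along the lines of ifort -coarray=distributed -coarray-config-file=./cafconfig.txt plus the usual sources and flags.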

Ethan

scrasmussen commented 5 years ago

If I remember correctly I was never able to get it working with gfortran >6.x; that's why we didn't use a newer version when we were running the tests for the paper. I'm double-checking that now, but I think it's correct.


gutmann commented 5 years ago

Try changing the ifort compile line to something like `ifort -coarray=single` and launch with something like `mpiexec.hydra -n $NPROCESSORS -hostfile $HOSTFILE ./test-ideal`.
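
Concretely, something along these lines (the image count, hostfile path, and extra ifort arguments are placeholders for whatever you normally use):

$ ifort -coarray=single -o test-ideal <the usual sources and flags>
$ mpiexec.hydra -n 4 -hostfile ./hosts ./test-ideal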

Any hope of getting the bug in opencoarrays fixed?

gutmann commented 5 years ago

... also, try just turning off assertions (with gfortran >6.x). I wonder if the bug at this point is only that the shape is not identified correctly; if it just passes an arbitrary chunk of data, it should end up in the right location, since the shapes are guaranteed to be the same in our code.
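
If the Makefile accepts the flag both ways, that would presumably be a rebuild like the following (untested):

$ make clean
$ make USE_ASSERTIONS=.false.
$ cafrun -n 4 ./test-ideal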