yogevb / a-dda

Automatically exported from code.google.com/p/a-dda

Error from adda_mpi #127

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a machinefile for MPICH2 containing localhost
2. Run adda_mpi
3. Unplug the network cable

What is the expected output? What do you see instead?
Uninterrupted execution
At the end of execution there's this error:
Fatal error in MPI_Finalize: Other MPI error, error stack: 
MPI_Finalize(281).....: MPI_Finalize failed 
MPI_Finalize(209).....:  
MPID_Finalize(131)....:  
MPIDI_PG_Finalize(106): PMI_Finalize failed, error -1 

What version of the product are you using? On what operating system?
adda_mpi: 1.1a
Win32 SP3 + latest patches
2-core CPU

Please provide any additional information below.
This seems to happen whenever the network configuration changes, 
e.g., a network cable is plugged in or unplugged. The error does not appear 
immediately, but only at the end of adda_mpi execution.
However, I'm running adda_mpi on my local machine alone.
I noticed this behavior with nearfield too, but after I changed the host 
definition to localhost (127.0.0.1), the error stopped.
Nevertheless, I still see this error with adda_mpi (v.1.1a).

Original issue reported on code.google.com by qwer1...@gmail.com on 10 Feb 2011 at 9:20

GoogleCodeExporter commented 9 years ago
I guess the error occurs at the end of ADDA execution, and all the simulation 
results are produced normally. Is that correct? If so, the error happens when 
MPI_Finalize is called immediately before exit.

I have tried to reproduce your problem using MPICH2 1.3.1, but could not get 
any errors. I ran the following command several times:
mpiexec -machinefile mf -n 2 adda_mpi -grid 64 -m 1.2 0
I tried two versions of the file mf: "localhost:2" and "127.0.0.1:2".
I unplugged the network cable between the halfway point and the end of the simulation.
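
For anyone repeating this test, the two machinefile variants can be written as below. This is only a sketch: the file names mf_localhost and mf_loopback are hypothetical (the command above uses a single file called mf), and the contents are the two strings quoted above.

```shell
# Write the two machinefile variants quoted above.
# File names are hypothetical; copy either one to "mf" before running mpiexec.
printf 'localhost:2\n' > mf_localhost   # host given by name
printf '127.0.0.1:2\n' > mf_loopback    # host given by loopback address
```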

Overall, this looks like an error in the MPI implementation (MPICH2), which 
should ignore network problems when the network is not actually needed. I 
noticed before that MPICH2 does use the network even when running locally. For 
example, Windows firewall shows a warning that mpiexec is trying to access the 
network. I am not sure whether this is a bug or not, but it may at least 
explain the errors.

The problem may be sensitive to the particular version of MPICH2. So here I 
attach the latest adda_mpi, compiled and linked against MPICH2 1.3.1. I also 
think that what matters most is the version of MPICH2 installed on the machine 
where this program is run.

Actually, when I run mpiexec on my laptop locally, I do not use any machinefile 
at all. So you may try to run 
mpiexec -n 2 adda_mpi ...
to see if there is any difference. Another option is
mpiexec -localonly 2 adda_mpi ...
which should force MPICH2 to use only local resources.
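
The three launch variants discussed in this comment can be compared side by side with a small dry-run script. This is a sketch under assumptions: the ADDA arguments are reused from the test command earlier in this comment, and the launch helper and DRY_RUN switch are illustrative names, not part of ADDA or MPICH2. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: print the three mpiexec variants mentioned in this comment.
# ADDA_ARGS is an assumption (copied from the earlier test command);
# launch and DRY_RUN are hypothetical helpers for illustration only.
ADDA_ARGS="-grid 64 -m 1.2 0"
DRY_RUN="${DRY_RUN:-1}"

launch() {
    # Always print the full command line; run it only when DRY_RUN=0.
    echo "mpiexec $* adda_mpi $ADDA_ARGS"
    if [ "$DRY_RUN" = "0" ]; then
        mpiexec "$@" adda_mpi $ADDA_ARGS
    fi
}

launch -machinefile mf -n 2   # explicit machinefile
launch -n 2                   # no machinefile
launch -localonly 2           # local resources only
```

Setting DRY_RUN=0 in the environment would actually execute each variant in turn, which makes it easier to see whether the MPI_Finalize error depends on how the job is launched.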

Original comment by yurkin on 13 Feb 2011 at 5:22

Attachments:

GoogleCodeExporter commented 9 years ago
We could not reproduce this issue, so there seems to be nothing to fix.

Original comment by yurkin on 10 Jun 2011 at 2:03