ornladios / ADIOS

The old ADIOS 1.x code repository. Look for ADIOS2 for new repo
https://csmd.ornl.gov/adios

Parallel build issue in 1.7.0 #30

Closed: QuLogic closed this issue 10 years ago

QuLogic commented 10 years ago

I'm building 1.7.0 using configure (not CMake), and parallel builds (make -j8) will sporadically fail with the following error:

make[3]: Entering directory `/home/elliott/code/adios-1.7.0/tests/genarray'
rm -f gwrite_genarray.fh gread_genarray.fh
../../utils/gpp/gpp.py ./genarray3d.xml
mpif90 -DHAVE_CONFIG_H -I. -I../..  -I../../src    -g -O2 -c -o genarray2D-genarray2D.o `test -f 'genarray2D.F90' || echo './'`genarray2D.F90
rm -f gwrite_genarray.fh gread_genarray.fh
mpif90 -DHAVE_CONFIG_H -I. -I../..  -I../../src    -g -O2 -c -o copyarray2D-copyarray2D.o `test -f 'copyarray2D.F90' || echo './'`copyarray2D.F90
../../utils/gpp/gpp.py ./genarray3d.xml
test "." = "." || cp ./genarray3d.xml ./genarray.xml .
mpif90 -DHAVE_CONFIG_H -I. -I../..  -I../../src    -g -O2 -c -o genarray-genarray.o `test -f 'genarray.F90' || echo './'`genarray.F90
mpif90 -DHAVE_CONFIG_H -I. -I../..  -I../../src    -g -O2 -c -o copyarray-copyarray.o `test -f 'copyarray.F90' || echo './'`copyarray.F90
copyarray.F90:209.18:

    cache_end_time = MPI_WTIME()
                  1
Error: Symbol 'cache_end_time' at (1) has no IMPLICIT type
copyarray.F90:200.20:

    cache_start_time = MPI_WTIME()
                    1
Error: Symbol 'cache_start_time' at (1) has no IMPLICIT type
copyarray.F90:210.20:

    cache_total_time = cache_end_time - cache_start_time
                    1
Error: Symbol 'cache_total_time' at (1) has no IMPLICIT type
copyarray.F90:239.18:

    cache_end_time = MPI_WTIME()
                  1
Error: Symbol 'cache_end_time' at (1) has no IMPLICIT type
copyarray.F90:233.20:

    cache_start_time = MPI_WTIME()
                    1
Error: Symbol 'cache_start_time' at (1) has no IMPLICIT type
copyarray.F90:240.20:

    cache_total_time = cache_end_time - cache_start_time
                    1
Error: Symbol 'cache_total_time' at (1) has no IMPLICIT type
make[3]: *** [copyarray-copyarray.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory `/home/elliott/code/adios-1.7.0/tests/genarray'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/elliott/code/adios-1.7.0/tests'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/elliott/code/adios-1.7.0'
make: *** [all] Error 2

A subsequent build with serial make completes fine. This is usually a clear sign that some dependency is missing from the Makefile.

pnorbert commented 10 years ago

So far, no one has been able to figure out how to define dependencies for Fortran 90 source files that contain modules so that parallel make works. We would be glad to learn of a solution.

(Sent by email in reply to the original report by Elliott Sales de Andrade, dated Oct 22, 2014 11:14 PM.)

QuLogic commented 10 years ago

The general method is to make the object file of each source that uses a module depend on the object file of the source that defines that module. Depending on the module file itself doesn't work, because Make doesn't really know about module files, and compilers are somewhat inconsistent about when they create or update them.

We do this quite a bit in specfem3d_globe (though without automake) and it works quite fine.
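
As a rough illustration of the object-on-object approach described above, here is a minimal plain GNU Make sketch (not automake, and not taken from ADIOS or specfem3d_globe). The file names `mesh_mod.F90` and `solver.F90` are hypothetical, and `mpif90` with `-g -O2` is simply borrowed from the build log in the report; adjust them for your own setup.

```make
# Sketch only: mesh_mod.F90 and solver.F90 are hypothetical file names.
FC      = mpif90
FCFLAGS = -g -O2

OBJS = mesh_mod.o solver.o

# Link step (recipe written after ';' so no tab character is required).
solver: $(OBJS) ; $(FC) $(FCFLAGS) -o $@ $(OBJS)

# Generic compile rule for free-form, preprocessed Fortran sources.
%.o: %.F90 ; $(FC) $(FCFLAGS) -c -o $@ $<

# solver.F90 contains "use mesh_mod", so its object must be built after
# mesh_mod.o. The prerequisite is the object file, not mesh_mod.mod.
solver.o: mesh_mod.o
```

With the last line in place, `make -j8` should not start compiling `solver.F90` until `mesh_mod.o` (and therefore the module file written alongside it) exists. Without it, the two compiles can run at the same time.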

QuLogic commented 10 years ago

Actually, I looked a little closer at the files in the tests/genarray directory. I have to wonder if the problem has anything to do with dependencies now. It looks like copyarray.F90 and genarray.F90 both contain the definition for the same module, but with different contents. This is pretty fragile since many compilers save cached versions of the module in files named after the module itself. Since it doesn't appear that these files are even compiled into the same executable, I don't think they need to have the same module names there...

QuLogic commented 10 years ago

Looks like the name was fixed in master at ac9b5436d38d5c48302a4bfafe3cc1fdbfde3a41. For me, this seems to be sufficient, though you may be aware of other cases (where I just don't have a parallel-enough compile to trigger it).

However, I'll just point out that the program name is also the same (makes no difference for compile, I think; just a consistency thing) and the modules need to be added to CLEANFILES in Makefile.am
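
For the CLEANFILES suggestion, a possible Makefile.am fragment might look like the following. This is a sketch under assumptions: that the compiler writes module files with a `.mod` extension into the build directory (as gfortran does), and that `CLEANFILES` is already defined earlier in the same Makefile.am.

```make
# Sketch for tests/genarray/Makefile.am.
# Assumes CLEANFILES is already defined above; use "=" instead of "+=" if not.
# Assumes module files end in .mod and are written to the build directory.
CLEANFILES += *.mod
```

With this, `make clean` would also remove stale module files, so a leftover `.mod` from an earlier build cannot be picked up by a later compile.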

QuLogic commented 10 years ago

Sigh, of course, after posting that I ran into the adios_*_mod.mod issues...