auerl closed this issue 9 years ago.
No, I have not encountered it so far, even though the intermediate results are not the most stable part of the code. Would you mind posting your inparam_basic here?
Here is my inparam_basic. I am running the code with Intel 14.0.2, netcdf-4.3.3 (compiled with ifort), and openmpi-1.8.4, OR with gcc-4.8.2, netcdf-4.1.3, and the default gcc build of OpenMPI on the machine in Zurich. It worked (at least with gnu) before the latest changes to the intermediate-result parts of the code. The AxiSEM version is the one from the official repository, from around the middle of last week.
# directory of the forward and backward run
#FWD_DIR './wavefield_50s/fwd/'
#BWD_DIR './wavefield_50s/bwd/'
FWD_DIR './wavefield_20s/fwd/'
BWD_DIR './wavefield_20s/bwd/'
# Paths of parameter files
SOURCE_FILE 'CMTSOLUTION'
RECEIVER_FILE 'receiver.dat'
FILTER_FILE 'filters.dat'
# Select the mesh file type. Allowed values are
# abaqus : .inp file, can be generated with Qubit or other codes. Can
# contain various geometries and multiple sub-objects
# Supported geometries: tetrahedra, triangles, quadrilaterals
# Set file name in MESH_FILE_ABAQUS
#
# tetrahedral : tetrahedral mesh in two separate files with
# 1. coordinates of the vertices (MESH_FILE_VERTICES)
# 2. the connectivity of the facets of the tetrahedrons
# (MESH_FILE_FACETS)
MESH_FILE_TYPE 'abaqus'
MESH_FILE_ABAQUS 'unit_tests/flat_triangles.inp'
#MESH_FILE_ABAQUS 'unit_tests/vox_15l_5deg_test.dat'
#MESH_FILE_TYPE 'tetrahedral'
#MESH_FILE_VERTICES 'unit_tests/vertices.TEST'
#MESH_FILE_FACETS 'unit_tests/facets.TEST'
# Prefix of output file names.
# Kernel files are called $OUTPUT_FILE_kernel.xdmf
# Wavefield movies are called $OUTPUT_FILE_wavefield.xdmf
OUTPUT_FILE 'kerner'
# Output format when dumping kernels and wavefields.
# Choose between xdmf, Yale-style csr binary format (compressed sparse row) and
# ascii.
# Note that the allowed error below is also used as the truncation threshold
# for csr and ascii storage.
DUMP_TYPE 'xdmf'
# Write out seismograms? (default: true)
# Seismograms (raw full trace, filtered full trace and cut trace) can be
# written out. Produces three files per kernel. Disable to avoid congesting
# your rundir.
WRITE_SEISMOGRAMS true
# Monte Carlo integration
# Absolute and relative error limits can be defined separately. The convergence
# conditions are connected by OR
# Allowed absolute error per cell
ALLOWED_ERROR 1e-4
# Allowed relative error per cell
ALLOWED_RELATIVE_ERROR 2e-2
# Number of points on which the kernel should be evaluated per MC iteration
POINTS_PER_MC_STEP 20
# Maximum number of iterations after which to cancel Monte Carlo integration
# in one cell, regardless of error.
MAXIMUM_ITERATIONS 1 #100
# Write detailed convergence of elements (default: false)
# Every slave writes the values of all kernels and their respective
# estimated errors into its OUTPUT_??? file after each MC step. This can lead
# to huge ASCII files (>1 GB) with insane line lengths (approx. 20 x nkernel).
# However, it might be interesting to study the convergence behaviour.
# When set to false, only one summary per cell is written out.
WRITE_DETAILED_CONVERGENCE false
# Size of buffers for strain and displacement.
# - fullfields: only strain buffer is used for chunkwise IO
# - displ_only: displacement buffer is used for chunkwise IO and strain buffer contains
# the strain in the GLL basis for whole elements
STRAIN_BUFFER_SIZE 1000
DISPL_BUFFER_SIZE 100
# Number of elements in each MPI task.
ELEMENTS_PER_TASK 10 #100
# Use quasirandom numbers instead of pseudorandom ones
USE_QUASIRANDOM_NUMBERS true
# Integration scheme
# Options:
# parseval: FFT seismogram and convolved wavefield and use Parseval's Theorem
# then trapezoidal rule is used in frequency domain
# trapezoidal: Use the trapezoidal rule in time domain
INTEGRATION_SCHEME parseval
# FFTW Planning to use
# Options:
# ESTIMATE: Use heuristic to find best FFT plan
# MEASURE: Compute several test FFTs to find best plan (default)
# PATIENT: Compute a lot of test FFTs to find best plan
# EXHAUSTIVE: Compute a huge number of test FFTs to find best plan
# for a detailed explanation: http://www.fftw.org/doc/Planner-Flags.html
FFTW_PLAN MEASURE
# Do you want to calculate a kernel or just plot wavefields?
# integratekernel has to be run with MPI and at least two processors
WHAT_TO_DO 'integratekernel'
# plot_wavefield has to be run in serial
#WHAT_TO_DO 'plot_wavefield'
# Do you want your kernels to be given on the vertices ('onvertices') or
# inside ('volumetric') each element?
INT_TYPE 'volumetric'
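The OR-combined convergence criterion described under "Monte Carlo integration" above can be sketched as follows. This is only an illustration, not the actual kerner code; the function name is invented, and the defaults mirror the ALLOWED_ERROR and ALLOWED_RELATIVE_ERROR settings from this file:

```python
def mc_converged(abs_err, rel_err, allowed_abs=1e-4, allowed_rel=2e-2):
    """A cell counts as converged if EITHER the absolute OR the
    relative error estimate is below its limit (criteria joined by OR,
    as the comments in inparam_basic state)."""
    return abs_err <= allowed_abs or rel_err <= allowed_rel
```

With these settings, a cell whose absolute error is already tiny converges even if its relative error is still large, and vice versa.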
Can't reproduce this problem with gcc 4.8.2 and the NetCDF version shipped with the OS.
Hmm, strange. Can you point me to the folder on our machine with the kerner version that works for you, as well as the wavefields you use? That should help figure out what is going on. Cheers, L.
Wait, I can reproduce it now. Trying to fence it in
Okay, please check whether ebc5037d70cafffd97e930a66b6393e40a4caf74 fixes it.
If the rundir already contained an intermediate_results.nc, it was not deleted, and it might have contained variables with different sizes.
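The stale-file scenario can be modelled with a small pure-Python sketch. This is not the kerner code: the JSON file stands in for the recorded NetCDF dimension sizes, and the function name and return values are hypothetical; it only illustrates the fix of removing a leftover file whose sizes no longer match the current run:

```python
import json
import os

def prepare_intermediate_file(path, expected_sizes):
    """If an old intermediate-results file exists but its recorded
    variable sizes differ from the current run's, delete it so it is
    recreated with the right dimensions (stand-in for comparing
    NetCDF dimension lengths)."""
    if not os.path.exists(path):
        return "create"
    with open(path) as f:
        recorded = json.load(f)
    if recorded != expected_sizes:
        os.remove(path)  # stale leftover from a previous run
        return "recreate"
    return "reuse"
```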
Thanks! Now it works with gfortran! With ifort it cannot work, since that compiler doesn't support "execute_command_line". On a side note, there doesn't seem to be a significant performance difference between the ifort version I use and gcc 4.8.2. I remember you mentioned a large speedup when compiling with ifort on SuperMUC, and I think we interpreted this as the "large-value" issue going away when using Intel. So maybe this still needs to be taken care of via some intermittent manipulation of the numbers during the calculation.
It seems that something is still messed up in the output NetCDF file. Here are 20 s P-wave kernels computed with the latest and last week's versions of the code (input wavefields, inparam file, and compiler versions are the same):
Yes, that is an unrelated issue that I noticed with Kasra last week. I was hoping it was just a problem with his settings, but it is more general. You may also have noticed that only one kernel is shown in ParaView.
The cause seems to be the new output variable computation_time, which is scalar and somehow messes up 1D variables like the kernels. I am on it!
Unfortunately, not fixed yet by 8299e5bf4740d1b487a47f2548b492c3897d3116 and later
The latest version of the code crashes in master_queue.f90 after "Create file for intermediate results" at the location
call nc_putvar_by_name(ncid = ncid_intermediate, &
varname = 'K_x', &
values = real(K_x, kind=sp) )
With the error message:
ERROR: CPU 0 could not write 2D variable: 'K_x'( 1) in NCID 65536 start ( 1) + count( 7260) is larger than size ( 100) of dimension K_x_1 (1)
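The check behind that message is the standard NetCDF bounds test: writing `count` values starting at 1-based index `start` must fit inside the dimension. A minimal model of that condition (illustrative only, not the actual Fortran code in nc_putvar_by_name):

```python
def putvar_fits(start, count, dim_size):
    """True if writing `count` values at 1-based offset `start`
    stays within a dimension of length `dim_size` -- the condition
    that fails in the error message above (start 1, count 7260,
    dimension K_x_1 of size 100)."""
    return start - 1 + count <= dim_size
```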
This happens with both triangular meshes and voxel meshes (I haven't tried tetrahedral ones). Interestingly, the code manages to pass this point when compiled with the Intel Fortran compiler. With Intel, however, the code crashes with a
*** Error in `./kerner': double free or corruption (!prev): 0x BB5
forrtl: error (78): process killed (SIGTERM)
after "Write mesh partition and convergence to disk" in master_queue.f90.
Does anyone else encounter this issue?