SalahKouhen closed this issue 1 year ago
Hi Salah,
Okay! With parallel netcdf enabled, we should be able to get this working!
Could you attach the system.mk file that you're using too?
Thanks!
My suspicion is that we'll need to add -I/network/software/ubuntu_bionic/netcdf/netcdf-c-4.9.0-parallel/include -I/network/software/ubuntu_bionic/hdf5/1.12.2-intel-parallel/include to INC_DIRS and -L/network/software/ubuntu_bionic/netcdf/netcdf-c-4.9.0-parallel/lib -L/network/software/ubuntu_bionic/hdf5/1.12.2-intel-parallel/lib to LIB_DIRS, as well as listing all of -lhdf5_hl -lhdf5 -lm -lz -lsz -lbz2 -lxml2 -lcurl in LINKS.
Hi Ben,
Thanks for the advice! I feel it is almost there.
This is the system.mk after I tried your suggestions (I probably did something wrong as I get issues running the basic tutorial):
# The following modules were last used
# openmpi/2.0.1/b1
# hdf5/1.8.19/b1
# netcdf/4.3.3.1
# gcc/8.2.0/b1
# fftw3/3.3.6/b1
# Specify compilers
CXX ?= g++
MPICXX ?= mpicxx
# Linking flags for netcdf
LINKS:=-lnetcdf -lhdf5_hl -lhdf5 -lz -lcurl -fopenmp -lm -lsz -lbz2 -lxml2
# Default compiler flags
CFLAGS:=-Wall -std=c++14
# Debug flags
DEBUG_FLAGS:=-g
DEBUG_LDFLAGS:=-g
# Basic optimization flags
OPT_FLAGS:=-O3
# Extra optimization flags (intel inter-process optimizations)
EXTRA_OPT_FLAGS:=
# Specify optimization flags for ALGLIB
ALGLIB_OPT_FLAGS:=-O3
# Modules are automatically on lib dir
NETCDF_LIBS=`nc-config --cxx4libs` -L/network/software/ubuntu_bionic/netcdf/netcdf-c-4.9.0-parallel/lib -L/network/software/ubuntu_bionic/hdf5/1.12.2-intel-parallel/lib
NETCDF_INCS=`nc-config --cxx4flags` -I/network/software/ubuntu_bionic/netcdf/netcdf-c-4.9.0-parallel/include -I/network/software/ubuntu_bionic/hdf5/1.12.2-intel-parallel/include
LIB_DIRS:=${NETCDF_LIBS}
INC_DIRS:=${NETCDF_INCS}
When I compiled using this the output was: compileOut.txt
The issues were:
icpc: command line warning #10148: option '-Wdate-time' not supported
ld: warning: libmpi.so.40, needed by /network/software/ubuntu_bionic/hdf5/1.12.2-intel-parallel/lib/libhdf5_hl.so, may conflict with libmpi.so.20
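The second warning suggests that two different Open MPI runtimes are reachable at link time. A minimal sketch for spotting such clashes is below; it assumes you have captured text such as the output of `ldd ./coarse_grain.x`, and the sample lines and `/opt/...` paths are hypothetical, not taken from this build.

```python
import re
from collections import defaultdict

def conflicting_sonames(ldd_output):
    """Return {library: sorted major versions} for libraries seen with more than one version."""
    versions = defaultdict(set)
    # Matches names such as libmpi.so.40 and records the number after ".so."
    for match in re.finditer(r"\b(lib[\w+.-]+?)\.so\.(\d+)", ldd_output):
        versions[match.group(1)].add(int(match.group(2)))
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}

# Hypothetical ldd-style text; real paths and version numbers will differ.
sample = """\
libhdf5_hl.so.200 => /opt/hdf5/lib/libhdf5_hl.so.200 (0x0)
libmpi.so.40 => /opt/openmpi-4/lib/libmpi.so.40 (0x0)
libmpi.so.20 => /opt/openmpi-2/lib/libmpi.so.20 (0x0)
"""
print(conflicting_sonames(sample))  # {'libmpi': [20, 40]}
```

If a library shows up with two versions, it is worth checking that the MPI module loaded at compile time matches the one the HDF5 and netCDF builds were linked against.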
I then generated the data in the basic tutorial and compiled with the recommended constants.
When I ran ./coarse_grain.x --input_file velocity_sample.nc --filter_scales "1e3 15e3 50e3 100e3" I got:
Commandline flag "--input_file" got value "velocity_sample.nc"
Commandline flag "--time" received no value - will use default "time"
Commandline flag "--depth" received no value - will use default "depth"
Commandline flag "--latitude" received no value - will use default "latitude"
Commandline flag "--longitude" received no value - will use default "longitude"
Commandline flag "--is_degrees" received no value - will use default "true"
Commandline flag "--Nprocs_in_time" received no value - will use default "1"
Commandline flag "--Nprocs_in_depth" received no value - will use default "1"
Commandline flag "--zonal_vel" received no value - will use default "uo"
Commandline flag "--merid_vel" received no value - will use default "vo"
Commandline flag "--region_definitions_file" received no value - will use default "region_definitions.nc"
Commandline flag "--region_definitions_dim" received no value - will use default "region"
Commandline flag "--region_definitions_var" received no value - will use default "region_definition"
Commandline flag "--filter_scales" got value "1e3 15e3 50e3 100e3"
Filter scales (4) are: 1km, 15km, 50km, 100km,
Compiled at 09:25:20 on Jan 19 2023.
Version 3.1.1
Using Cartesian coordinates.
coarse_grain.x: NETCDF_IO/read_var_from_file.cpp:85: void read_var_from_file(std::vector<double, std::allocator<double>> &, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &, std::vector<bool, std::allocator<bool>> *, std::vector<int, std::allocator<int>> *, std::vector<int, std::allocator<int>> *, int, int, bool, int, double, ompi_communicator_t *): Assertion `input_nc_format == NC_FORMAT_NETCDF4' failed.
[atmlxint2:39184] *** Process received signal ***
[atmlxint2:39184] Signal: Aborted (6)
[atmlxint2:39184] Signal code: (-6)
[atmlxint2:39184] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f4638f9d980]
[atmlxint2:39184] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f4638bd8e87]
[atmlxint2:39184] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f4638bda7f1]
[atmlxint2:39184] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x303fa)[0x7f4638bca3fa]
[atmlxint2:39184] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30472)[0x7f4638bca472]
[atmlxint2:39184] [ 5] ./coarse_grain.x[0x421e9f]
[atmlxint2:39184] [ 6] ./coarse_grain.x[0x46266b]
[atmlxint2:39184] [ 7] ./coarse_grain.x[0x4b679a]
[atmlxint2:39184] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f4638bbbc87]
[atmlxint2:39184] [ 9] ./coarse_grain.x[0x4051aa]
[atmlxint2:39184] *** End of error message ***
Aborted (core dumped)
So something is still not right! Once again, thanks for your help so far.
All the best, Salah
It compiles! Progress!
This error (Assertion `input_nc_format == NC_FORMAT_NETCDF4' failed.) is surprising for the Tutorial. It sometimes comes up with model data that was written in an older netcdf format. In the tutorial, though, we make the input file ourselves and explicitly tell it to use netcdf4.
Can you call ncdump -k velocity_sample.nc? It should return netCDF-4.
Also, while we're at it, in python, could you call import netCDF4; netCDF4.__version__?
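For anyone without ncdump to hand, a rough equivalent of the format check can be sketched in plain Python by looking at the first bytes of the file. This is only an approximation: it cannot tell netCDF-4 from the netCDF-4 classic model, and any HDF5 file will match the HDF5 signature. The function name `netcdf_kind` is made up for this example.

```python
import os
import tempfile

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # signature at the start of HDF5 files

def netcdf_kind(path):
    """Guess the on-disk netCDF flavour from the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == HDF5_MAGIC:
        return "HDF5-based (netCDF-4 family)"
    if len(head) >= 4 and head[:3] == b"CDF":
        # The fourth byte distinguishes the classic-family variants.
        return {1: "classic", 2: "64-bit offset", 5: "cdf5"}.get(head[3], "unknown CDF variant")
    return "not recognised"

# Demo on a throwaway file that only imitates a classic-format header.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"CDF\x01" + b"\x00" * 4)
print(netcdf_kind(tmp.name))
os.remove(tmp.name)
```

Calling `netcdf_kind("velocity_sample.nc")` on the tutorial file should report the HDF5-based family if the file really is netCDF-4.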
ncdump -k velocity_sample.nc returned netCDF-4
The netcdf version in my python environment is '1.6.0'.
I conda updated netcdf4 but nothing changed.
Salah
Hmm, curious.
Since you're using conda, there's an environment.yml file in the main Tutorial directory (it's a recent addition, so you might need to git pull). Could you try running the basic tutorial using a conda environment built from that?
Hi Ben,
Still no luck:
conda env create -f environment.yml
python generate_data.py
./coarse_grain.x --input_file velocity_sample.nc --filter_scales "1e3 15e3 50e3 100e3"
Assertion input_nc_format == NC_FORMAT_NETCDF4 failed.
Hi Salah,
I just sent an email to your physics.ox.ac.uk address with the velocity_sample.nc data file that I get when I run the tutorial. Out of paranoia / to try and narrow down where things are going awry, can you try running the tutorial with that file?
We now have a working version of FlowSieve running on Jasmin! Closing the issue :-)
In case anyone digs through this in the future trying to solve a similar problem, the solution was to remove the nc-config --cxx4libs and nc-config --cxx4flags parts from the NETCDF_LIBS and NETCDF_INCS variables.
I suspect the issue was related to the -lnetcdf link flag that was getting passed in too early in the compile list as a result.
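To illustrate the ordering suspicion, here is a small sketch that flags any `-l` option appearing before the last object file on a link line. The actual FlowSieve link recipe is not shown in this thread, so the command string below is hypothetical, and whether early placement matters depends on the linker and its settings.

```python
def libs_before_objects(link_command):
    """Return the -l flags that appear before the last .o file on the link line."""
    tokens = link_command.split()
    object_positions = [i for i, tok in enumerate(tokens) if tok.endswith(".o")]
    if not object_positions:
        return []
    last_object = object_positions[-1]
    return [tok for tok in tokens[:last_object] if tok.startswith("-l")]

# Hypothetical link line: a library directory variable has pulled in -lnetcdf early.
command = (
    "mpicxx -L/opt/netcdf/lib -lnetcdf main.o helpers.o "
    "-lnetcdf -lhdf5_hl -lhdf5 -o coarse_grain.x"
)
print(libs_before_objects(command))  # ['-lnetcdf']
```

A linker that resolves symbols left to right may not use a library listed before the objects that need it, which is one reason to keep `-l` flags in LINKS rather than in the directory variables.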
Hi Ben,
I'm still trying 😄. I received some help from the university IT team and parallel netCDF is now enabled (https://github.com/husseinaluie/FlowSieve/issues/20). I have the output of nc-config --all at the bottom in case it is helpful. Unfortunately, I now run into this error:
...
icpc: command line warning #10148: option '-Wdate-time' not supported
ld: cannot find -lhdf5_hl
ld: cannot find -lhdf5
Makefile:208: recipe for target 'Case_Files/coarse_grain.x' failed
make: *** [Case_Files/coarse_grain.x] Error 1
I did:
make clean
module load intel-compilers/2022
module load openmpi/4.1.4-intel
module load hdf5/1.12.2-intel-parallel
module load netcdf/netcdf-c-4.9.0-parallel
make Case_Files/coarse_grain.x
Any help would be deeply appreciated!
All the best, Salah
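The "ld: cannot find -lhdf5_hl" message means that none of the directories the linker searched contained libhdf5_hl.so or libhdf5_hl.a. A minimal sketch for checking a set of `-L` flags is below; the helper names are made up for this example, and the paths in the usage note are the ones quoted elsewhere in this thread.

```python
import os

def dirs_from_flags(flags):
    """Extract directories from -L options in a flag string."""
    return [tok[2:] for tok in flags.split() if tok.startswith("-L") and len(tok) > 2]

def find_library(name, search_dirs):
    """Return the paths where lib<name>.so or lib<name>.a exists."""
    hits = []
    for directory in search_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, f"lib{name}{suffix}")
            if os.path.exists(candidate):
                hits.append(candidate)
    return hits
```

For example, `find_library("hdf5_hl", dirs_from_flags("-L/network/software/ubuntu_bionic/hdf5/1.12.2-intel-parallel/lib"))` returns an empty list if that directory does not hold the library, which would point to a missing or wrong `-L` entry in LIB_DIRS.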