nextsimhub / nextsimdg

neXtSIM_DG : next generation sea-ice model with DG
https://nextsim-dg.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

nextsim failing on Betzy in both sequential and parallel builds #497

Open monsieuralok opened 4 months ago

monsieuralok commented 4 months ago

I am compiling nextsim on Betzy with the Intel compiler, using the latest develop branch. First, I am not able to build the sequential code, which used to build in June/July/August 2023. Second, I am able to build the parallel code, but it aborts when I execute the binary using

srun -n 1 ../build/nextsim --config-file config_simple_example.cfg --model.partitionfile partition.nc

terminate called after throwing an instance of 'netCDF::exceptions::NcBadId'
what(): NetCDF: Not a valid ID
file: ncType.cpp line:138
srun: error: b3214: task 0: Aborted (core dumped)
srun: Terminating StepId=812297.3

Also, cmake is not working with the option -DCMAKE_BUILD_TYPE=Debug, so I am not able to debug.

I am using the following versions of netCDF and the Intel compiler: intel/2021b, netCDF-C++4/4.3.1-iimpi-2021b, netcdf4-python/1.5.7-intel-2021b, Boost/1.77.0-intel-compilers-2021.4.0, Eigen/3.4.0-GCCcore-11.2.0 and CMake/3.22.1-GCCcore-11.2.0.

Please provide your suggestions.

timspainNERSC commented 4 months ago

Since it is a netCDF error, the most obvious culprit would be the netcdf C++ library. I am using 4.1.1 from Homebrew. @einola, what version do you have?

MarionBWeinzierl commented 4 months ago

Hm, strange that the Debug build is not working, as it works for me, and also that the serial code does not build. I am wondering whether this has to do with the Intel compiler.

@TomMelt has pushed a spack.yaml file which contains a set of specifications for the libraries which definitely works. (He also added information on how to use spack, if you are interested. The same is true for the Dockerfiles; information is also available in the documentation.)

Did you use the partition.nc file that I had pushed earlier and then removed, or did you create it yourself from the cdl file using ncgen? And did you rerun python3 make_init.py to create init_rect30x30.nc?
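Before rerunning the model, it can help to confirm that both input files are at least recognisable as netCDF. The following is a minimal sketch, not part of nextsim: the helper name `sniff_netcdf` is hypothetical, and it only compares the leading bytes of each file against the standard netCDF classic and HDF5 signatures. A file that passes can still be invalid in other ways.

```python
from pathlib import Path

# Leading bytes of the on-disk formats that netCDF libraries read.
_SIGNATURES = {
    b"CDF\x01": "netCDF classic",
    b"CDF\x02": "netCDF 64-bit offset",
    b"CDF\x05": "netCDF CDF-5",
    b"\x89HDF\r\n\x1a\n": "netCDF-4 (HDF5)",
}


def sniff_netcdf(path):
    """Return a format label for `path`, or None if no signature matches.

    Only offset 0 is inspected, so an HDF5 file that starts with a
    user block is not detected by this sketch.
    """
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, label in _SIGNATURES.items():
        if head.startswith(magic):
            return label
    return None


if __name__ == "__main__":
    # File names taken from this thread; adjust the paths as needed.
    for name in ("partition.nc", "init_rect30x30.nc"):
        if Path(name).exists():
            print(name, "->", sniff_netcdf(name) or "not recognised as netCDF")
        else:
            print(name, "-> missing")
```

An empty, truncated, or plain-text (cdl) file returns `None` here, which would point to the input file as the source of the problem.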

monsieuralok commented 4 months ago

@MarionBWeinzierl @timspainNERSC I am able to execute it with GNU and the OpenMPI library after creating a new init_rect30x30.nc. Most probably, our system has a bad installation of the Intel-based netcdf4/netcdf4-python. I have raised the issue with the sysadmins here.

TomMelt commented 4 months ago

Hi @monsieuralok, could you please provide the command you ran and the error output from trying to build in Debug mode?

I am concerned there may be an issue with your environment if the code is capable of building in Release but not Debug.
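One way to see which build type CMake actually recorded is to read it back from CMakeCache.txt in the build directory. This is a minimal sketch under assumptions: the helper names `read_cmake_cache` and `build_type` are hypothetical, and the parser only handles the plain `NAME:TYPE=VALUE` entries that CMake writes to the cache.

```python
def read_cmake_cache(path):
    """Parse CMakeCache.txt into a dict of {name: (type, value)}."""
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines and the two comment styles used in the cache.
            if not line or line.startswith(("#", "//")):
                continue
            key, sep, value = line.partition("=")
            if not sep:
                continue
            name, _, kind = key.partition(":")
            entries[name] = (kind, value)
    return entries


def build_type(path):
    """Return the cached CMAKE_BUILD_TYPE value, or None if it is absent."""
    entry = read_cmake_cache(path).get("CMAKE_BUILD_TYPE")
    return entry[1] if entry else None
```

For example, `build_type("../build/CMakeCache.txt")` should return `"Debug"` after a Debug configure, assuming the build directory is `../build` as in the srun command earlier in the thread. An empty string or `None` would suggest the option never reached CMake.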