uDALES / u-dales

uDALES: large-eddy-simulation software for urban flow, dispersion and microclimate modelling
https://udales.github.io/u-dales
GNU General Public License v3.0

macOS runtime fails for case 502 #131

Closed: dmey closed this issue 3 weeks ago

dmey commented 3 years ago

@samoliverowens I have re-enabled tests for macOS but noticed that now we get failures for 502. When you have a moment, would you be able to investigate this?

https://github.com/uDALES/u-dales/runs/1684489670?check_suite_focus=true#step:6:8700

samoliverowens commented 3 years ago

I can run 502 OK on my Mac, so I'm not sure what is going on there. Is this test running on two cores?

dmey commented 3 years ago

Is this with the tests provided in https://github.com/uDALES/u-dales/tree/master/tests? I just want to make sure we are using the same files. If not, could you run:

# Assumes you set up the conda environment as per https://github.com/uDALES/u-dales/tree/master/tests
conda activate udales
git checkout dmey/ci-fixes
python tests/run_tests.py master dmey/ci-fixes Debug

samoliverowens commented 3 years ago

It doesn't build the executable. Could this be the error I had before with the Fortran compiler version?

dmey commented 3 years ago

The executable builds fine and all cases run fine except for 502. This is a runtime error; see https://github.com/uDALES/u-dales/runs/1684489670?check_suite_focus=true#step:6:8700

At line 99 of file /Users/runner/work/u-dales/u-dales/src/modfielddump.f90

samoliverowens commented 3 years ago

Yes, but on my Mac it doesn't build the executable.

samoliverowens commented 3 years ago

I don't know what's going on here. It's a segmentation fault, and the backtrace gives nothing useful:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x10a1828cd
#1  0x10a181cdd
#2  0x7fff70d9f5fc
#3  0x7fff70d62ece
#4  0x7fff70d6203b
#5  0x7fff70d61d8a
#6  0x1099ca5a2
#7  0x1099d84c0
#8  0x109c36f97
#9  0x109c37302

dmey commented 3 years ago

Yeah, the missing backtrace is the usual GNU-on-macOS nonsense! As for your Mac, I don't know: the setup assumes a standard configuration, so you may need to modify the build command depending on what the actual error is...

samoliverowens commented 3 years ago

How would I go about making sure it's using gcc-9? I've tried export FC=gfortran-9, but then running python3 tests/run_tests.py master dmey/ci-fixes Debug gives:

Switched to branch 'master'
Your branch is up to date with 'origin/master'.
-- The Fortran compiler identification is unknown
CMake Error at CMakeLists.txt:19 (project):
  The CMAKE_Fortran_COMPILER:

    /usr/local/bin/gfortran

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "FC" or the CMake cache entry CMAKE_Fortran_COMPILER to the full
  path to the compiler, or to the compiler name if it is in the PATH.

dmey commented 3 years ago

Do the following once.

brew unlink gcc
brew link gcc@9

Then run export FC=gfortran-9 every time you open a new shell, or add it to your shell profile file.
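
Since CMake's error above reports /usr/local/bin/gfortran as not an existing compiler, it may help to check what FC actually resolves to before rerunning the tests. The Python sketch below does that; find_fortran_compiler is a hypothetical helper, not part of the u-dales test scripts. It uses shutil.which, which searches PATH the way the shell does. By contrast, whereis on macOS may only search a fixed set of system directories, so an empty whereis result does not necessarily mean the compiler is missing.

```python
import os
import shutil
import subprocess


def find_fortran_compiler(env=None):
    """Return the full path of the compiler named by FC, or None if not found.

    Hypothetical helper. Falls back to 'gfortran' when FC is unset.
    """
    env = os.environ if env is None else env
    name = env.get("FC", "gfortran")
    # Search the PATH of the given environment; if env has no PATH entry,
    # shutil.which falls back to the current process's PATH.
    return shutil.which(name, path=env.get("PATH"))


if __name__ == "__main__":
    compiler = find_fortran_compiler()
    if compiler is None:
        print("FC does not resolve to an executable on PATH")
    else:
        # Print the version banner to confirm which GCC release this is
        out = subprocess.run([compiler, "--version"], capture_output=True, text=True)
        print(compiler)
        print((out.stdout or out.stderr).strip())
```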

samoliverowens commented 3 years ago

Have already tried that, no success.

dmey commented 3 years ago

What do you get when you run whereis gfortran-9? OK, can you try changing the following line in tests/scripts/build_model.py:

    subprocess.run(
        ['cmake', f'-DCMAKE_BUILD_TYPE={build_type}', path_to_proj_dir, '-LA'], cwd=path_to_build_dir)

to:

    subprocess.run(
        ['FC=gfortran-9', 'cmake', f'-DCMAKE_BUILD_TYPE={build_type}', path_to_proj_dir, '-LA'], cwd=path_to_build_dir)

samoliverowens commented 3 years ago

whereis gfortran-9 returns nothing, and I tried that change, but it gives this:

M   tests/scripts/build_model.py
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
Traceback (most recent call last):
  File "tests/run_tests.py", line 142, in <module>
    parser.add_argument('build_type', help='TODO')
  File "tests/run_tests.py", line 58, in main
    path_to_exe = build_model.build_from_branch(
  File "/Users/samowens/u-dales-testing/u-dales/tests/scripts/build_model.py", line 34, in build_from_branch
    build(path_to_proj_dir, path_to_build_dir, build_type, clean_build_dir=False)
  File "/Users/samowens/u-dales-testing/u-dales/tests/scripts/build_model.py", line 44, in build
    subprocess.run(
  File "/usr/local/anaconda3/envs/udales/lib/python3.8/subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/anaconda3/envs/udales/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/anaconda3/envs/udales/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'FC=gfortran-9'
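
With that change, subprocess.run receives an argument list and no shell, so it tries to execute a program literally named FC=gfortran-9, which is why it raises FileNotFoundError. The VAR=value prefix only sets an environment variable when a shell parses the command, as in the manual FC=gfortran-9 cmake -LA ../.. build. One way to get the same effect from Python is to pass a modified copy of the environment through subprocess.run's env argument. This is a minimal sketch; run_with_env and configure_cmake are hypothetical names, and configure_cmake reuses the cmake arguments from build_model.py rather than reproducing the actual build() function.

```python
import os
import subprocess
import sys


def run_with_env(cmd, cwd=None, **overrides):
    """Run cmd (an argument list, no shell) with extra environment variables.

    The overrides apply only to the child process; os.environ is not modified.
    """
    env = dict(os.environ, **overrides)
    return subprocess.run(cmd, cwd=cwd, env=env, check=True)


def configure_cmake(path_to_proj_dir, path_to_build_dir, build_type, fc="gfortran-9"):
    """Hypothetical stand-in for the cmake call in build_model.py."""
    cmd = ["cmake", f"-DCMAKE_BUILD_TYPE={build_type}", path_to_proj_dir, "-LA"]
    return run_with_env(cmd, cwd=path_to_build_dir, FC=fc)


if __name__ == "__main__":
    # Self-check without CMake: a child Python process reports the FC it sees.
    run_with_env(
        [sys.executable, "-c", "import os; print('child sees FC =', os.environ['FC'])"],
        FC="gfortran-9",
    )
```

Passing check=True is a deliberate deviation from the original call, so that a failed configure stops the script instead of letting it continue silently.
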

dmey commented 3 years ago

OK, one step at a time then. From the u-dales repo:

mkdir -p build/test
cd build/test
FC=gfortran-9 cmake -LA ../..

samoliverowens commented 3 years ago

Yeah, the normal way works fine: make then builds the executable successfully.

dmey commented 3 years ago

Can you please add the following two lines before the unmodified subprocess call and send me the variables, i.e.:

    import os
    print(os.environ)

    subprocess.run(
        ['cmake', f'-DCMAKE_BUILD_TYPE={build_type}', path_to_proj_dir, '-LA'], cwd=path_to_build_dir)

samoliverowens commented 3 years ago

Here's the output:

environ({'TERM_SESSION_ID': 'w0t0p0:61DE275F-5AB7-439D-B3AC-562ACCA2E1EB', 'SSH_AUTH_SOCK': '/private/tmp/com.apple.launchd.HUdepqYGuc/Listeners', 'LC_TERMINAL_VERSION': '3.3.9', 'COLORFGBG': '15;0', 'ITERM_PROFILE': 'Default', 'SQLITE_EXEMPT_PATH_FROM_VNODE_GUARDS': '/Users/samowens/Library/WebKit/Databases', 'XPC_FLAGS': '0x0', 'LANG': 'en_GB.UTF-8', 'PWD': '/Users/samowens/u-dales-testing/u-dales', 'SHELL': '/bin/zsh', 'SECURITYSESSIONID': '186a6', 'TERM_PROGRAM_VERSION': '3.3.9', 'TERM_PROGRAM': 'iTerm.app', 'PATH': '/usr/local/anaconda3/envs/udales/bin:/usr/local/anaconda3/bin:/Users/samowens/bin:/usr/local/bin:/usr/local/anaconda3/bin:/usr/local/anaconda3/condabin:/Users/samowens/bin:/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin', 'DISPLAY': '/private/tmp/com.apple.launchd.JtnuxMqJaG/org.macosforge.xquartz:0', 'LC_TERMINAL': 'iTerm2', 'COLORTERM': 'truecolor', 'COMMAND_MODE': 'unix2003', 'TERM': 'xterm-256color', 'HOME': '/Users/samowens', 'TMPDIR': '/var/folders/pm/1_m4gjw91wjdbxdxhs21v00c0000gn/T/', 'USER': 'samowens', 'XPC_SERVICE_NAME': '0', 'LOGNAME': 'samowens', 'LaunchInstanceID': '3DEDEAF1-B53E-4F3C-A433-3F6C37538CEA', '__CF_USER_TEXT_ENCODING': '0x0:0:0', 'ITERM_SESSION_ID': 'w0t0p0:61DE275F-5AB7-439D-B3AC-562ACCA2E1EB', 'SHLVL': '1', 'OLDPWD': '/Users/samowens/u-dales-testing/u-dales/build/test', 'ZSH': '/Users/samowens/.oh-my-zsh', 'PAGER': 'less', 'LESS': '-R', 'LSCOLORS': 'Gxfxcxdxbxegedabagacad', 'CONDA_EXE': '/usr/local/anaconda3/bin/conda', '_CE_M': '', '_CE_CONDA': '', 'CONDA_PYTHON_EXE': '/usr/local/anaconda3/bin/python', 'CONDA_SHLVL': '2', 'CONDA_PREFIX': '/usr/local/anaconda3/envs/udales', 'CONDA_DEFAULT_ENV': 'udales', 'CONDA_PROMPT_MODIFIER': '(udales) ', 'FC': 'gfortran-9', 'CONDA_PREFIX_1': '/usr/local/anaconda3', '_': '/usr/local/anaconda3/envs/udales/bin/python3'})

dmey commented 3 years ago

It seems fine. Can you please try removing the build folder?
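
Removing the build folder likely helps because CMake only reads the FC environment variable the first time a build directory is configured. After that, the chosen compiler is stored in CMakeCache.txt as CMAKE_Fortran_COMPILER and reused. This is consistent with the earlier error still pointing at /usr/local/bin/gfortran even though FC was gfortran-9 in the environment. The traceback shows build() already takes a clean_build_dir argument, but its implementation isn't shown here. The sketch below is therefore a standalone, hypothetical helper (reset_build_dir, has_stale_cache) that wipes and recreates a build directory before configuring.

```python
import shutil
import tempfile
from pathlib import Path


def reset_build_dir(path_to_build_dir):
    """Delete and recreate a CMake build directory (hypothetical helper).

    Removing the directory also removes CMakeCache.txt, so the next cmake
    run detects the Fortran compiler again from FC.
    """
    build_dir = Path(path_to_build_dir)
    if build_dir.exists():
        shutil.rmtree(build_dir)
    build_dir.mkdir(parents=True)
    return build_dir


def has_stale_cache(path_to_build_dir):
    """True if a previous configure left a CMakeCache.txt behind."""
    return (Path(path_to_build_dir) / "CMakeCache.txt").is_file()


if __name__ == "__main__":
    # Demo in a throwaway directory with a fake cache file
    with tempfile.TemporaryDirectory() as tmp:
        build = Path(tmp) / "build" / "master"
        build.mkdir(parents=True)
        (build / "CMakeCache.txt").write_text(
            "CMAKE_Fortran_COMPILER:FILEPATH=/usr/local/bin/gfortran\n"
        )
        print(has_stale_cache(build))  # True
        reset_build_dir(build)
        print(has_stale_cache(build))  # False
```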

samoliverowens commented 3 years ago

Ahh, the nuclear option. Should've just gone with that initially; it works. I can now reproduce the error, but it's no clearer. Is there any way I can see what the executable is printing? It used to be in, e.g., output.502.

dmey commented 3 years ago

Haha, great. Now that the script has created and copied the files over, it's probably easiest to go into tests/outputs/502 and run it from there.

samoliverowens commented 3 years ago

It actually runs fine! Not sure what is going on...

dmey commented 3 years ago

Not sure. Did you run mpiexec -np 2 path/to/udales namoptions.502? BTW, the error log with stdout is at https://github.com/uDALES/u-dales/runs/1690350494?check_suite_focus=true#step:6:20398

samoliverowens commented 3 years ago

I actually did (from tests/outputs/502/master): mpiexec -np 2 ../../../../build/master/u-dales namoptions.502.patch

samoliverowens commented 3 years ago

From the stdout, it looks like master runs OK and the comparison branch fails (before time-stepping begins). On my Mac the comparison branch runs OK too...

dmey commented 3 years ago

I see that it just fails on the first instance of running 502 (https://github.com/uDALES/u-dales/runs/1694070412?check_suite_focus=true#step:6:15735), even though there is no difference in the code between master and the dmey/ci-fixes branch. I'm still not sure what is going on, though: when run on a single core, the release now succeeds (https://github.com/uDALES/u-dales/actions/runs/482428802). Let's leave this issue open and I will look into it a bit more sometime in the future. It is nothing major, and it's not worth spending time on at the moment given our other commitments.

EDIT: this seems like a harder bug to pin down, given that the CI tests sometimes do pass, e.g. https://github.com/uDALES/u-dales/actions/runs/490568709 (ignore the commit message: 501 and 502 ran for all; see the logs).
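
Because the failure only shows up in some CI runs, one way to try to reproduce it locally is to run the same two-process command many times and count the failures. The sketch below is a minimal version of that. run_repeatedly is a hypothetical helper, and a handful of successful local runs does not prove the bug is absent.

```python
import subprocess
import sys


def run_repeatedly(cmd, n_runs, cwd=None, timeout=None):
    """Run cmd n_runs times and return (run index, return code) for each failure.

    Hypothetical helper. timeout (seconds) is passed to subprocess.run, which
    raises subprocess.TimeoutExpired if a single run hangs for longer.
    """
    failures = []
    for i in range(n_runs):
        result = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
        if result.returncode != 0:
            failures.append((i, result.returncode))
            # Keep the tail of stderr so any runtime error message is not lost
            print(f"run {i} exited with {result.returncode}", file=sys.stderr)
            print(result.stderr[-2000:], file=sys.stderr)
    return failures
```

For example, from tests/outputs/502/master: run_repeatedly(["mpiexec", "-np", "2", "../../../../build/master/u-dales", "namoptions.502.patch"], n_runs=20). This assumes mpiexec exits with a non-zero code when a rank crashes, which is typical but implementation-dependent.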