wrf-model / WRF

The official repository for the Weather Research and Forecasting (WRF) model

GCC 14 leads to compile errors in WRF/WPS compilation #2047

Open SettRaziel opened 1 month ago

SettRaziel commented 1 month ago

Hej there. I am maintaining a project to provide wrf/wps binaries for deployment in an ArchLinux environment.

Describe the bug
The update to GCC 14 changed how several diagnostics are handled. Error handling is now stricter: several diagnostics that previously only produced warnings now produce compile errors.

Looking at the changes in GCC 14, it seems they have changed how this flag is handled. From the GCC 14 notes:

Implicit function declarations (-Werror=implicit-function-declaration) It is no longer possible to call a function that has not been declared. In general, the solution is to include a header file with an appropriate function prototype. Note that GCC will perform further type checks based on the function prototype, which can reveal further type errors that require additional changes.

For well-known functions declared in standard headers, GCC provides fix-it hints with the appropriate #include directives:

error: implicit declaration of function ‘strlen’ [-Wimplicit-function-declaration]
    5 |   return strlen (s);
      |          ^~~~~~
note: include ‘<string.h>’ or provide a declaration of ‘strlen’
  +++ |+#include <string.h>
    1 |

When compiling the WRF model, these new restrictions now lead to compile errors such as:

gcc  -I. -w -O3 -c   -DDM_PARALLEL -DLANDREAD_STUB=1 -DMAX_HISTORY=25 -DNMM_CORE=0   -c get_region_center.c
get_region_center.c: In function ‘get_region_center_’:
get_region_center.c:40:3: error: implicit declaration of function ‘memcpy’ [-Wimplicit-function-declaration]
   40 |   memcpy(MemoryOrder,MemoryOrderIn,strlen1);
      |   ^~~~~~
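
The fix described in the GCC notes amounts to including the header that declares the function. Here is a minimal sketch, loosely shaped like the failing call above. The helper name `copy_memory_order` and its parameters are hypothetical and not taken from the WRF sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>  /* declares memcpy; without it GCC 14 rejects the call below */

/* Hypothetical helper, not WRF code: copies `len` bytes of a Fortran-style
 * (unterminated) string into `dst` and NUL-terminates the result.
 * `dst` must have room for len + 1 bytes. */
void copy_memory_order(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);  /* prototype is visible, so the arguments are type-checked */
    dst[len] = '\0';
}
```

Once the prototype is visible, GCC also checks the argument types, so a missing include can hide further type errors that only show up after it is added.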

To Reproduce
Steps to reproduce the behavior:
Testing environment: ArchLinux VM with Linux kernel 6.8.9.
Compiler: gcc/gfortran 14.1.1.
Current compile flags: Env. Variables
Additional parameter for compilation: 35 gfortran dm+sm
Precondition: successful compilation of netcdf, netcdf-fortran, mpich, hdf5.
Running my compile routines leads to a reproducible number of around 140 of these implicit-declaration compile errors.

Workaround
Adding -Wimplicit-function-declaration to the compile flags (issue tracked in: wrf_archlinux).

Expected behavior
No implicit function declarations anywhere in the code, so that the workaround is no longer needed. The workaround treats these issues as warnings instead of errors, which might lead to other side effects during compilation.

Additional context
Up to gcc/gfortran 13.2.2 these implicit declarations only led to warnings, so up to that point the compilation ran successfully. Are there plans to refactor the code base?

weiwangncar commented 1 month ago

@SettRaziel Which version of the WRF code have you tried?

SettRaziel commented 1 month ago

I recompiled 4.5.0. I just saw this morning that 4.6 has been released, but I have not tried it yet. Edit: I started a 4.6 build in my testing environment and will update this when the job is finished. I cannot yet confirm whether 4.6.0 compiles. The new and inconsistent versioning scheme (WRF 4.6.0, noahmp 4.6, ...) breaks most of my script logic, so I need more time to adapt my scripts to it.

SettRaziel commented 1 month ago

I stand partially corrected. The errors regarding -Wimplicit-function-declaration are gone with the changes provided in #1823, so I am sorry about that part. However, the compile process for 4.6 runs into other errors. In my opinion these are also related to GCC 14 changes:

Type checking on pointer types (-Werror=incompatible-pointer-types) GCC no longer allows implicitly casting all pointer types to all other pointer types. This behavior is now restricted to the void * type and its qualified variations.

To fix compilation errors resulting from that, you can add the appropriate casts, and maybe consider using void in more places (particularly for old programs that predate the introduction of void into the C language).

Programs that do not carefully track pointer types are likely to contain aliasing violations, so consider building with -fno-strict-aliasing. (Whether casts are written manually or performed by GCC automatically does not make a difference in terms of strict aliasing violations.)

A frequent source of incompatible function pointer types involves callback functions that have more specific argument types (or less specific return types) than the function pointer they are assigned to. For example, old code which attempts to sort an array of strings might look like this:


#include <stddef.h>
#include <stdlib.h>
#include <string.h>

int compare (char **a, char **b) { return strcmp (*a, *b); }

void sort (char **array, size_t length) { qsort (array, length, sizeof (*array), compare); }
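
A comparator whose parameter types are more specific than `const void *` no longer converts implicitly to the function pointer type that `qsort` expects. One common way to resolve this is sketched below: the comparator takes `const void *` and converts inside its body. The names `compare_strings` and `sort_strings` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Matches the type qsort expects: int (*)(const void *, const void *).
 * Each argument points at one array element, i.e. at a char *. */
static int compare_strings(const void *a, const void *b)
{
    char *const *sa = a;
    char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sorts an array of `length` C strings in ascending strcmp order. */
void sort_strings(char **array, size_t length)
{
    assert(array != NULL || length == 0);
    qsort(array, length, sizeof(*array), compare_strings);
}
```

Converting from `const void *` inside the comparator keeps the call to `qsort` free of function pointer casts, which would otherwise silence the diagnostic without fixing the mismatch.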

I now get around 80 errors like this:

c_code.c: In function ‘rsl_litepack’:
c_code.c:487:27: error: passing argument 1 of ‘f_packlint’ from incompatible pointer type [-Wincompatible-pointer-types]
  487 |   F_PACK_LINT ( buf, p+yp_curs, imemord, &js, &je, &ks, &ke, &is, &ie,
      |                 ^~~
      |                 char *
In file included from c_code.c:31:
rsl_lite.h:200:25: note: expected ‘long int *’ but argument is of type ‘char *’
  200 | void F_PACK_LINT (long *inbuf, long *outbuf, int *memorder, int *js, int *je, int *ks, int *ke, int *is, int *ie, int *jms, int *jme, int *kms, int *kme, int *ims, int *ime, int *curs);
      |                   ~~^~~~~

The error is an incompatible pointer type: the function expects long int * but receives char *. Sorry, my compiler language was set to German; the messages above are translated to English. If you need a complete log, I can switch the language, rerun the build, and attach the log in English.
I will also temporarily add the flag to check whether this change is the only source of these errors.

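
For illustration, one of the directions suggested in the GCC notes quoted earlier (using `void *` in more places) could look like the sketch below. The function `pack_longs` is a hypothetical stand-in for a packing routine; it is not the WRF prototype, and it is not meant to suggest how the maintainers will address the issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packing routine, not the WRF prototype.
 * Because the buffers are declared as void *, callers may pass char *,
 * long *, or any other object pointer without a cast. */
void pack_longs(const void *inbuf, void *outbuf, size_t count)
{
    assert(inbuf != NULL && outbuf != NULL);
    /* memcpy works on bytes, so no long * is formed from a char buffer */
    memcpy(outbuf, inbuf, count * sizeof(long));
}
```

The alternative is an explicit cast at each call site, such as `(long *) buf`. That satisfies the type check but leaves alignment and aliasing concerns to the caller, which is why the GCC notes mention -fno-strict-aliasing in this context.
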
weiwangncar commented 1 month ago

@SettRaziel Thanks for testing the latest version of the model. We will take a look at this issue very soon.

weiwangncar commented 1 month ago

@islas When you get a chance, can you review this report?