ufs-community / ufs-weather-model

UFS Weather Model
Other
142 stars 248 forks source link

WW3 cannot build pre-and-post processing jobs #2502

Open JessicaMeixner-NOAA opened 3 days ago

JessicaMeixner-NOAA commented 3 days ago

Description

In PR https://github.com/ufs-community/ufs-weather-model/pull/2445 WW3 was updated with PR https://github.com/NOAA-EMC/WW3/pull/1303 which makes references to modules that are only in the cap code. This means that we cannot run the end-to-end system (for example see: https://github.com/NOAA-EMC/global-workflow/pull/3106#issuecomment-2484169044).

We will need ot make an update to WW3 so that we can compile the pre-and post-scripts within the dev/ufs-weather-model.

To Reproduce:

The easiest way to reproduce this is to clone the g-w, update the model to the hash in develop and then build all or just the ww3 pre/post:

git clone https://github.com/noaA-EMC/global-workflow
cd global-workflow/
git submodule update --init --recursive 
cd sorc/ufs_model.fd/
git checkout develop
git submodule update --init --recursive 
cd ../
./build_ww3prepost.sh 

Additional context

@sbanihash had noticed this while reviewing a PR FYI @jswhit who ran into this in g-w.

@sbanihash and myself will be meeting with @MatthewMasarik-NOAA first thing tomorrow morning to find a solution.

Output

JessicaMeixner-NOAA commented 2 days ago

I'm working on some updates here: https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs which have allowed me to successfully build the pre & post scripts in the global-workflow. More testing is necessary but sharing current progress.

DeniseWorthen commented 2 days ago

I'm really confused by this. The WW3 build in UFS has always referenced files which were only in the dev/ufs-weather-model branch. The previous (prior to PIO commit) src file list for example contained

.....
  w3tidemd.F90
  wav_grdout.F90
  w3iogoncdmd.F90
  wav_shr_flags.F90
  )

set(nuopc_mesh_cap_src
  wav_kind_mod.F90
  wav_shr_mod.F90
  wav_shel_inp.F90
  wav_comp_nuopc.F90
  wav_import_export.F90
  wav_wrapper_mod.F90
  )
....

Why was this working prior to PIO and now is not?

JessicaMeixner-NOAA commented 2 days ago

The files in nuopc_mesh_cap_src were previously not referenced in the ftn_src files. There are now files in ftn_src trying to call files that are in the nuopc_mesh_cap_src. Moreover, the standalone WW3 builds do not seem to be finding PIO in the build either.

With the additions I added in https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs we can build the pre/post WW3 scripts, however we cannot build with MPI, which is where we start to get into additional issues including not being able to find the PIO variables, etc.

DeniseWorthen commented 2 days ago

These mesh-cap branch files

  wav_grdout.F90
  w3iogoncdmd.F90
  wav_shr_flags.F90

have always been in the ftn_src file list. See src_list.cmake at c7004b6.

EDIT: I suspect what you might need to do is just move some of the new files to the ftn_src list.

JessicaMeixner-NOAA commented 2 days ago

@DeniseWorthen - Moving things into the ftn_src list would solve some of the issues, yes. We will still need W3_MPI if defs around things that are parallel/use MPI so that when we build without MPI (for pre and post processing steps) that can still be built. I haven't quite figured out how to deal with the PIO parts - and the fact that we aren't finding PIO in WW3. I'm not sure if that's unrelated and just showing up because other errors have not yet been fixed, cmake updates are needed, or if we also need something in WW3 to say yes/no for PIO.

DeniseWorthen commented 2 days ago

I'm not the best source for CMake build know-how. But when I first started working on the PIO feature, I played around w/ adding PIO to the model/src/CMakeLists.txt, thinking I might need it. It turned out I didn't

  target_sources(ww3_lib PRIVATE ${cap_src})
  target_link_libraries(ww3_lib PUBLIC esmf
                                       PIO::PIO_Fortran)
  # Don't build executables when building WW3 ESMF library
JessicaMeixner-NOAA commented 2 days ago

Thank you @DeniseWorthen ! I didn't find any CMake additions in ufs-weather-model for PIO either, so maybe we just don't need it? Eventually when this goes back into develop, we'll have to figure out how to get around potentially putting more barriers aroudn PIO, but we don't have to worry about that for today.

I added a lot more files to the ftn_src and got the MPI build going, but now the standalone is breaking because those files have places we need the ifdefs for MPI. I'll want to run some additional standalone regression testing and a quick g-w run, but we should have a fix that shouldn't change any answers or anything like that tomorrow or Thursday. I keep adding my updates to: https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs if anyone is curious.

JessicaMeixner-NOAA commented 1 day ago

Okay - I ended up going a slightly different direction today (https://github.com/JessicaMeixner-NOAA/WW3/tree/bug/addPIOswitch) and found a minor bug that we needed to move and end statement inside of a W3_MPI ifdef, but I think the new way, which adds a W3_PIO if defs, will help us in the future and is a lot cleaner. It's not perfect in the sense that I cannot build standalone ww3 w/PIO - yet - because of EMSF dependencies in routines, but it opens up the door for that in the future. I can build the pre-and post-processed standalone routines with and without MPI.

That being said, I think we either need to change some of the src lists or maybe add a check that the nuopc cap has to use the PIO? (which I think is desirable, and I don't see anyone using NUOPC that would object).

Still to do: more standalone ww3 runs, ufs-rt tests, and running in g-w. I'm hoping I can do all of these things by Friday and have a PR ready by no later than Monday is my goal, if not before.

@DeniseWorthen - let me know if you would like to see other tests, have objections to this approach, or would like to see other additions.

DeniseWorthen commented 1 day ago

This change will breaks the capability of turning the netcdf feature on/off at run time vs compile time. That's a big step backward, imo.

I understand the intent is simply to solve the immediate problem of compiling the codes that are needed for pre/post in G-W. I think there are better ways of solving this, but those would take time to sort out.

My concern is that it will persist (as things in WW3 have a tendency to do). That leads to all sorts of messy problems, eg broken use-cases, like trying to set init-from-binary true and not have either restartnc or PIO set. There's basically no utility to having any code inside a use_restartnc or use_historync flag if you're not using PIO.

The PIO feature in the develop branch will require creating new nml namelists so that PIO can be configured that way (CICE has this implemented). I don't suspect that is happening any time soon.

JessicaMeixner-NOAA commented 18 hours ago

Only having this based on a compile time flag means that even if you are not using PIO you have to have the library built correctly, yes? My concern with this is that requires the many users around the world outside of NOAA to at minimum build PIO, which is possibly non-trivial and not possible for them. WW3 is used by many small meteorology organizations where even building netcdf in the past was difficult and we need to discuss adding PIO with the llarger WW3 community which has not been done. I will happily bring this to that larger community and see if this is okay, but generally this is the way WW3 handles adding libraries that are not required for all use-cases. No it's not perfect, and I understand how you see this as a major step backwards. I just don't know another solution at the moment.

I agree it might be a while until a feature is ready to be used from PIO in the develop branch, but my hope is this is a small step towards enabling dev/ufs-weather-model to get merged into the develop branch without this feature fully ready. And solving the immediate issue of being able to run this in the end-to-end system.

We do need to get some sort of solution in sooner than later, I've mostly stopped all of my other work to address this, and will continue to so if you have a better solution - please let me know and I'll start implementing that immediately. At this point I am going to start testing this to make sure this is truly a viable solution.

DeniseWorthen commented 18 hours ago

In terms of dev/ufs-weather-model going into develop in the short term, I think that is putting the cart before the horse. Unless there is a way (ie, outside of ESMF config) to use the feature, then what is the point? Get that done, then I can see the utility. And of course dev/uwm is a year behind develop---so how is that going to work?

There is also utility in UWM pointing to it's own branch. If UWM uses develop, then the merging of PRs to WW3 develop means that users 'around the world' are stuck in the UWM commit Q. WW3 cannot merge PRs to develop unless UWM also points to develop. Given how slow the UWM Q works.....

In any case, my essential point still holds. Anything w/in logical flags use_restartnc and use_historync is broken if PIO is not available, so carving out just sections around the specific calls to write_history or write_restart isn't sufficient imho. And I think hiding PIO for non-users could be done in other ways, short just introducing another set if ifdefs into the main WW3. But that solution would take a little more time to implement.

JessicaMeixner-NOAA commented 17 hours ago

@DeniseWorthen if you can explain the other way to hide PIO please let me know and I'll start to implement it.

DeniseWorthen commented 17 hours ago

I don't have a solution at this instant. I'm saying I think it could be done.

I'm also getting confused because it seems you're doing some things in some branches by adding an MPI switch, and other things in other branches adding a PIO switch.

I guess I'm not doing a good job of explaining....you have inside of a use_restartnc logical flag, just a carve out under PIO ifdef for calling read_restart for example.

#ifdef W3_PIO
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
#endif

But that doesn't make sense---nothing inside of the use_restartnc is usable w/o PIO. So why not just

ifdef PIO

if (use_restartnc) then

.......

endif

JessicaMeixner-NOAA commented 17 hours ago

@DeniseWorthen Apologies for the confusion. My first effort was to see if just putting things behind W3_MPI siwtches in this branch: https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs however, I realized in the end it wasn't sufficient and I thought a cleaner way (and possibly more future proof), would be to add a new switch W3_MPI instead. So I abandoned that branch, and started a new approach here: https://github.com/JessicaMeixner-NOAA/WW3/tree/bug/addPIOswitch)

I was trying to put it in as small of place of possible while I was debugging an unrelated issue and forgot to circle back. I agree, it makes more sense to put it in the larger area - I'll make that change now.

If you think of a different solution - please let me know and I'm happy to dedicate time to implement it. I'll try to brainstorm other ways too.

JessicaMeixner-NOAA commented 17 hours ago

@DeniseWorthen change in the if-defs is committed here: https://github.com/JessicaMeixner-NOAA/WW3/commit/f5280035d1ece97b1a0318de15210f6d85e339e0

DeniseWorthen commented 16 hours ago

In terms of thinking about 'a better way', exactly which codes needed for pre/post in G-W are in conflict w/ how things are currently implemented?

JessicaMeixner-NOAA commented 15 hours ago

w3init and w3wave are the two codes that caused build errors.

DeniseWorthen commented 14 hours ago

Understood, but which pre/post programs? I've been assuming these are 'ww3_X.F90' standalone programs which are built on top of w3 modules that are causing the issues.

JessicaMeixner-NOAA commented 14 hours ago

https://github.com/NOAA-EMC/global-workflow/blob/develop/sorc/build_ww3prepost.sh#L61-L63

DeniseWorthen commented 14 hours ago

Ah, thanks. Looks very much like the createmoddefs script, which I have used before.