payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

Driver for configurations using CMEPS with the CESM driver #333

Closed dougiesquire closed 1 year ago

dougiesquire commented 1 year ago

I'm not exactly sure what to call this driver since in theory it could be used/extended to run any model configurations that use CMEPS with the CESM driver, though the primary use case at the moment is for an ACCESS-OM3 configuration (currently it's called Cesm, which is probably not ideal).

In trying to make this general to all CMEPS configs, I've added another optional field to the input config.yaml called components, where users specify which model components are included in the configuration being run (e.g. see https://github.com/dougiesquire/gmom_jra_wd/blob/main/config.yaml). This is maybe not a desired change, in which case one option would be to make the driver specific to OM3. This would also solve the naming ambiguity, but it would make the driver less general (e.g. it could no longer be used to run https://github.com/dougiesquire/d_jra_wd).

Interested to hear people's thoughts

micaeljtoliveira commented 1 year ago

What about later on adding an ACCESS-OM3 driver that would be a child class of the Cesm model?

coveralls commented 1 year ago

Coverage Status

Coverage: 41.267% (-1.0%) from 42.245% when pulling e1bbe27195a3a1818748acb1b820c42ae6af4c9c on dougiesquire:cesm_cmeps into 0109f2f4137fb34321504dd30950e86dd4876aee on payu-org:master.

aidanheerdegen commented 1 year ago

I'm not exactly sure what to call this driver since in theory it could be used/extended to run any model configurations that use CMEPS with the CESM driver, though the primary use case at the moment is for an ACCESS-OM3 configuration (currently it's called Cesm, which is probably not ideal)

Would it be more accurate to call it Cmeps? Or would it also work with CESM?

I've added another optional field to the input config.yaml called components where users specify which model components are included in the configuration being run

This is interesting. Previously coupled models used a sub-model approach, e.g. ACCESS-OM2

https://github.com/payu-org/payu/blob/master/payu/models/accessom2.py

but I guess the CMEPS architecture effectively removes the idea of separate models? Certainly makes writing a driver a fair bit simpler.

AFAICT none of the components utilise the existing drivers of the individual models, is that correct? I can't see how collate is going to work for the FMS based models if this is the case. Or have I missed something?

dougiesquire commented 1 year ago

Would it be more accurate to call it Cmeps? Or would it also work with CESM?

Cmeps_cesm is maybe the most accurate, since it expects the CESM driver. Though, tbh, I'm not actually sure how different things would be if a different driver were used. Maybe I should look into this.

This is interesting. Previously coupled models used a sub-model approach, e.g. ACCESS-OM2

https://github.com/payu-org/payu/blob/master/payu/models/accessom2.py

but I guess the CMEPS architecture effectively removes the idea of separate models? Certainly makes writing a driver a fair bit simpler.

Yeah, the components aren't really sub-models since they don't have their own executables. I'm sure there's a way around having components in the config.yaml (which is confusing). I'll try to get back to this this week.

AFAICT none of the components utilise the existing drivers of the individual models, is that correct? I can't see how collate is going to work for the FMS based models if this is the case. Or have I missed something?

That's correct. Everything is handled by the CMEPS driver and the output is quite specific to the driver (e.g. restarts from each component all get named consistently and output to the run directory). Re collation... I'm not sure... For the test runs I've done, MOM6 output does not need collation.

aidanheerdegen commented 1 year ago

Cmeps_cesm is maybe the most accurate, since it expects the CESM driver. Though, tbh, I'm not actually sure how different things would be if a different driver were used. Maybe I should look into this.

I'm confused. Isn't this the CESM driver? Or does driver in this context mean the model code itself?

Yeah, the components aren't really sub-models since they don't have their own executables. I'm sure there's a way around having components in the config.yaml (which is confusing). I'll try to get back to this this week.

I don't dislike the design, just trying to get a better understanding of the design process/limitations.

Everything is handled by the CMEPS driver and the output is quite specific to the driver (e.g. restarts from each component all get named consistently and output to the run directory).

Into their own "namespace"?

Re collation... I'm not sure... For the test runs I've done, MOM6 output is collated.

In the TWG meeting it was noted that the io_layout = 1,1 was potentially a limiting factor to scalability, so you're definitely going to want to be able to collate outputs, which is already something that is handled by the fms driver. I don't think you want to reimplement that. A couple of options spring to mind:

  1. Split the collating code out to some sort of 'tools' module and import it in both drivers
  2. Do some sort of fancy-dancy python multiple inheritance which I have zero knowledge of to grab the collate functionality from the fms driver (not even sure that is possible, pure speculation)

Maybe you or @marshallward or @angus-g have an opinion or some knowledge about the best way to implement that.

dougiesquire commented 1 year ago

I'm confused. Isn't this the CESM driver? Or does driver in this context mean the model code itself?

Yeah sorry, I was a bit fast and loose with my language in my previous comment. There're two "drivers" in these discussions:

The Payu driver in this PR is intended to run model configurations that use NUOPC/CMEPS along with the CESM (NUOPC/CMEPS) driver. What I meant to say was that I'm not sure whether it would work (or could be easily extended to work) with other NUOPC/CMEPS drivers.

Hmmm... reading that back, I'm not sure it's any clearer

Re your other comments/questions, I'll try to get my head back into this later this week.

micaeljtoliveira commented 1 year ago

Cmeps_cesm is maybe the most accurate, since it expects the CESM driver. Though, tbh, I'm not actually sure how different things would be if a different driver were used. Maybe I should look into this.

From what I could see, different drivers have different ways to specify which components to use at runtime. So some things would definitely need to be different for a different driver.

aekiss commented 1 year ago

There are some rough edges that should be fixed - e.g. payu sweep doesn't work reliably (sometimes removes link but not dir that was linked to)

dougiesquire commented 1 year ago

Thanks for reporting this @aekiss. I need to find some time to implement this properly

dougiesquire commented 1 year ago

Finally coming back to this. Before spending too much time on it, I want to check in with @aekiss and @micaeljtoliveira on ACCESS-OM3 development.

Is it still looking like the configuration set-up (config/input files structure etc) and output from ACCESS-OM3 will be the same/similar as CESM-CMEPS? I.e., is it looking like the Payu driver in this PR could end up being what is used by everyone to run ACCESS-OM3, or is it likely just to be used during development of ACCESS-OM3 (for comparing ACCESS-OM3 executables to CIME-built executables)?

If the latter, we probably want to keep this Payu driver out of main and we should start writing a dedicated ACCESS-OM3 driver.

aekiss commented 1 year ago

Thanks for looking at this. Good question - I think it will definitely be needed for development of ACCESS-OM3 for comparing to CIME/CESM, but it's too early to say what config the final production version will use. The inputs will all be different, but maybe the directory and file structure can be retained.

But if we call it the cesm driver then we should be able to put it into main and then adapt or duplicate it for an accessom3 driver as needed, right?

dougiesquire commented 1 year ago

But if we call it the cesm driver then we should be able to put it into main and then adapt or duplicate it for an accessom3 driver as needed, right?

Yes... I think. But I think the cesm driver possibly needs a bit of rethinking.

dougiesquire commented 1 year ago

In the TWG meeting it was noted that the io_layout = 1,1 was potentially a limiting factor to scalability, so you're definitely going to want to be able to collate outputs, which is already something that is handled by the fms driver. I don't think you want to reimplement that. A couple of options spring to mind:

  1. Split the collating code out to some sort of 'tools' module and import it in both drivers
  2. Do some sort of fancy-dancy python multiple inheritance which I have zero knowledge of to grab the collate functionality from the fms driver (not even sure that is possible, pure speculation)

Possibly an FmsCollate mixin?

aidanheerdegen commented 1 year ago

But I think the cesm driver possibly needs a bit of rethinking.

If it works I'd merge and work on improvements with follow up PRs, unless we're talking a radical rethink.

Possibly an FmsCollate mixin?

Yeah, that is what I was thinking of with "fancy-dancy python multiple inheritance".
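A minimal sketch of the mixin idea, for illustration only. The class names (Model, FmsCollateMixin, CesmCmeps), the tiled_files method and the file names are all hypothetical stand-ins and are not taken from the payu code base; the point is only to show how Python's method resolution order lets a driver pick up collate from a mixin without inheriting the rest of another driver.

```python
class Model:
    """Simplified stand-in for a driver base class (hypothetical, not payu's)."""

    def __init__(self, name):
        self.name = name

    def collate(self):
        # Default behaviour: nothing to collate.
        return []


class FmsCollateMixin:
    """Hypothetical mixin holding collation logic shared by FMS-based drivers."""

    def collate(self):
        # Group tiled files such as 'ocean.nc.0000' by their base name.
        # A real implementation would invoke a collation tool on each group.
        groups = {}
        for fname in sorted(self.tiled_files()):
            base, _, tile = fname.rpartition(".")
            if tile.isdigit():
                groups.setdefault(base, []).append(fname)
        return sorted(groups)


class CesmCmeps(FmsCollateMixin, Model):
    """Driver that takes collate() from the mixin and the rest from Model."""

    def __init__(self, name, files):
        super().__init__(name)
        self._files = files

    def tiled_files(self):
        return self._files


driver = CesmCmeps(
    "example",
    ["ocean.nc.0000", "ocean.nc.0001", "ice.nc", "ocean_daily.nc.0000"],
)
collated = driver.collate()
print(collated)
```

Listing the mixin first in the class bases is what makes its collate take precedence over the one defined on Model.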

dougiesquire commented 1 year ago

If it works I'd merge and work on improvements with follow up PRs, unless we're talking a radical rethink.

What I'm thinking is that we might want to have a different approach than specifying the CESM model components in the config.yaml. This only makes sense for CESM-CMEPS models and it's maybe a bit confusing to users how components is different than submodels.

As suggested by @micaeljtoliveira, I'm thinking a CesmCmepsBase class that specific configurations inherit from (e.g. AccessOm3).
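A rough sketch of what that inheritance could look like, assuming each configuration-specific driver simply declares its components as class attributes. The attribute names (components, collate_components), the needs_collation method and the component labels are hypothetical and chosen for illustration; they are not the implementation in this PR.

```python
class CesmCmepsBase:
    """Hypothetical base class for configurations using the CESM CMEPS driver."""

    # Subclasses are expected to override these (illustrative attributes).
    components = ()
    collate_components = ()

    def __init__(self):
        if not self.components:
            raise NotImplementedError(
                "CesmCmepsBase is abstract: subclasses must define components"
            )

    def needs_collation(self, component):
        # True if output from this component should be collated after a run.
        return component in self.collate_components


class AccessOm3(CesmCmepsBase):
    """Specific configuration: components are fixed in code, not in config.yaml."""

    components = ("mom6", "cice6", "ww3")
    collate_components = ("mom6",)


om3 = AccessOm3()
print(om3.components, om3.needs_collation("mom6"))
```

With this layout the user-facing config only needs the model name, and the component list lives with the driver class.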

dougiesquire commented 1 year ago

There are some rough edges that should be fixed - e.g. payu sweep doesn't work reliably (sometimes removes link but not dir that was linked to)

@aekiss, I've been unable to reproduce this. Can you provide any more details about what happened?

aekiss commented 1 year ago

Just ignore my comment - it was very sporadic. If it happens enough to be a problem I'll make an issue for it.

dougiesquire commented 1 year ago

@aidanheerdegen are you the right person to ping for a review?

There is now an AccessOm3 driver that inherits from a new CesmCmepsBase. The AccessOm3 driver will collate mom output. This can be used to run the config at:

(after a couple of small tweaks to the config that I haven't pushed yet). 

It would also be trivial to create drivers from CesmCmepsBase to run the following configs:

but I personally don’t think this is worthwhile. It will just clutter the payu.models module with drivers that probably no one will ever use. Instead, I’d suggest we archive those config repos and make it clear in their READMEs that those CESM-CMEPS configs are not supported in the main branch of payu. We can always add the drivers later if we decide we want them.

Once this PR is merged, I’ll update the configs above.

aekiss commented 1 year ago

Awesome, thanks @dougiesquire, I like this inheritance approach.

We should retain these configs, as these combinations will actually be used a lot, and so we will need payu support for them as part of the staged development plan https://github.com/COSIMA/MOM6-CICE6 https://github.com/COSIMA/CICE6-WW3

dougiesquire commented 1 year ago

We should retain these configs, as these combinations will actually be used a lot, and so we will need payu support for them as part of the staged development plan https://github.com/COSIMA/MOM6-CICE6 https://github.com/COSIMA/CICE6-WW3

Sure. Any thoughts on what to call the payu models for these (ie what's entered in the config.yaml)? Perhaps something like "cesm-mom6-cice6" and "cesm-cice6-ww3"? If we go this route then perhaps we should rename "access-om3" to "cesm-mom6-cice6-ww3" for consistency (at least until things are more bedded down), although that's a bit of a mouthful...

aekiss commented 1 year ago

How about just "mom6-cice6", "cice6-ww3" and "mom6-cice6-ww3"? Or do we expect we will need different drivers for the access-om2 configs, in which case the cesm prefix would help differentiate them?

dougiesquire commented 1 year ago

Or do we expect we will need different drivers for the access-om2 configs, in which case the cesm prefix would help differentiate them?

I don't understand sorry. There's already an AccessOm2 driver (model: access-om2) right? Are we expecting to need more access-om2 drivers?

micaeljtoliveira commented 1 year ago

About the naming scheme, I would actually propose something different. I would call all of those models ACCESS-OM3, even though that's incorrect if one considers that ACCESS-OM3 is always MOM6+CICE6+WW3. The point here is that we would like to have a compact and flexible way to swap some of the components for a data model in the payu config. So I would still allow one to specify the submodels in the payu config, but only for the ocean, waves and sea-ice components. All the rest (atm, etc) should be fixed. This last point is what would distinguish the access-om3 model in payu from the cesm model: some of the components would be hard-coded.

Does this make sense?

aekiss commented 1 year ago

Apologies @dougiesquire - I meant ACCESS-OM3, not 2

dougiesquire commented 1 year ago

Does this make sense?

It does, but now we're back to the original problem of how to list the ACCESS-OM3 submodels in the payu config. We can't list them under submodels as this is for when each submodel has their own executable. I originally added a components key, but I went off this approach since it means that the config set-up is different than for all the other payu models. I'm quite possibly overthinking this...

A components key would mean the payu configs look like this, for example:

...

model: access-om3
components:
  - mom6
  - cice6
  - ww3

...

micaeljtoliveira commented 1 year ago

Couldn't we get the components list from the nuopc.runconfig file instead of getting it from the config file?
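One way this could work, as a sketch under assumptions: it assumes nuopc.runconfig contains a section opened by "ALLCOMP_attributes::" and closed by "::", holding entries of the form "XXX_model = name". The read_components function is hypothetical, and the sample text below is made up for illustration rather than copied from a real configuration.

```python
import re


def read_components(runconfig_text):
    """Return {component type: model name} from the ALLCOMP_attributes section.

    Assumes the section starts with 'ALLCOMP_attributes::' and ends with '::'.
    """
    components = {}
    in_section = False
    for raw in runconfig_text.splitlines():
        line = raw.strip()
        if line == "ALLCOMP_attributes::":
            in_section = True
        elif line == "::":
            in_section = False
        elif in_section:
            # Match entries such as 'OCN_model = mom'.
            match = re.match(r"(\w+)_model\s*=\s*(\S+)", line)
            if match:
                components[match.group(1).lower()] = match.group(2)
    return components


# Made-up sample; the second section checks that other sections are ignored.
sample = """
ALLCOMP_attributes::
  ATM_model = datm
  ICE_model = cice
  OCN_model = mom
  WAV_model = ww3
  some_other_setting = true
::

OTHER_attributes::
  FOO_model = ignored
::
"""

found = read_components(sample)
print(found)
```

Reading the list this way would keep config.yaml free of a components key, at the cost of the driver depending on the runconfig layout.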

aekiss commented 1 year ago

All the rest (atm, etc) should be fixed.

Actually I expect it will be common for users to want to modify these components, e.g. perturbing the forcing or runoff, or replacing it with a different product (eg ERA5) so some flexibility should be retained here too. These would always be data models but the data sources should be easy to modify.

aidanheerdegen commented 1 year ago

Yes, but have been flat out! Sorry. Will look tomorrow.

aidanheerdegen commented 1 year ago

We can't list them under submodels as this is for when each submodel has their own executable.

So the mapping of CPUs between different sub-domains is done internally?

aekiss commented 1 year ago

I think so...?

dougiesquire commented 1 year ago

So the mapping of CPUs between different sub-domains is done internally?

Yup, this is done in the nuopc.runconfig file. At the moment, for all the configs I set up, each component just runs sequentially and is allocated 48 cores.
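To illustrate what that layout implies for the core count, here is a small sketch. It assumes that "sequentially" means every component starts at the same root PE and so shares the same 48 cores; the total_cores function and the (root PE, number of tasks) dictionary layout are hypothetical and not how nuopc.runconfig is actually read.

```python
def total_cores(layout):
    """Cores needed for a layout of {component: (root_pe, ntasks)}.

    Components with overlapping PE ranges share cores and run one after
    another; components with disjoint ranges can run concurrently.
    """
    return max(root + ntasks for root, ntasks in layout.values())


# All components share PEs 0-47, so they take turns on the same 48 cores.
sequential = {"atm": (0, 48), "ice": (0, 48), "ocn": (0, 48), "wav": (0, 48)}

# For contrast: ocean placed on its own block of PEs after the others.
concurrent = {"atm": (0, 48), "ice": (0, 48), "ocn": (48, 48)}

print(total_cores(sequential), total_cores(concurrent))
```

Under this assumption the sequential layout requests 48 cores in total regardless of how many components there are.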