payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
21 stars 27 forks source link

Experiment class restructure #508

Open jo-basevi opened 2 months ago

jo-basevi commented 2 months ago

This came up in a discussion with @truth-quark about the Experiment class in Payu. It is quite large (approx > 1100 lines of code) with some very long functions. It also holds a lot of state which can be confusing to know where fields are used in the project.

Move functions out of the Experiment class

Some low-hanging fruit to reduce the size of the Experiment class is to refactor the code I added last year for restart pruning. That can be moved to it's own file and broken up into smaller more easily testable functions.

A larger change could be to separate Experiment into several files, for example:

Circular(ish) relationship between Experiment and Model classes?

An Experiment has a model/models fields to store the instances of the model driver/s. To create a model driver instance, the instance of the Experiment is passed along to the initialise method of the model driver class. If I add type hints for Experiment to the model drivers classes and add an from payu.experiment import Experiment, I would get circular import errors.

From having a quick look at what the model driver classes need from Experiment are paths (work, control, output, restart, prior restart/output), the configuration file (config.yaml) and manifest objects. Some drivers, such as access, need to iterate through the other submodel model instances (Experiment.models) to copy restart files/calculate end dates, or modify class fields. Some drivers also need Experiment.counter, Experiment.repeat_run, Experiment.runtime (repeat and runtime could just be accessed via config.yaml file).

I had in my head that it would be nice to not build an instance of Experiment always for testing model drivers, but in writing this all out I am realising that might not be straightforward or simple. Maybe I just need to get better at writing mock test objects for Experiment..

An option could be to have another data structure object that is created by Experiment and passed to model classes. This could be a useful reference if logged out to a file when there are errors to show what was the current experiment state.

Using more I/0 functions could be an option rather than storing state for paths, but payu might still need to a way store some information such as the experiment name to avoid recalculating too many times when determining paths.

I'm not really sure the best way to go about this, and just created this issue mainly as a place to put down some general ideas. So feel free to add any thoughts!

truth-quark commented 2 months ago

This summary of "Working Effectively with Legacy Code" might be useful: https://gist.github.com/jonnyjava/42883d4e464167f81e2ee60a488a5ded

The original book is over 500 pages, so potentially too much to digest!

aidanheerdegen commented 2 months ago

Some of this mess is because it's easy to pass the Experiment object around and just pull out what you want. But some of the examples you mention above could be avoided by passing in just what is required.

This is similar to what you suggested:

An option could be to have another data structure object that is created by Experiment and passed to model classes. This could be a useful reference if logged out to a file when there are errors to show what was the current experiment state.

We could pass in the config object, the manifests, the counter and some path related stuff.

With respect to the path related stuff, we do a hell of a lot of "os.path.join()" that I wonder if we couldn't just create methods to return these things, and just alter the value that the methods use, rather than recreate the whole path every time.