Open jo-basevi opened 1 month ago
This summary of "Working Effectively with Legacy Code" might be useful: https://gist.github.com/jonnyjava/42883d4e464167f81e2ee60a488a5ded
The original book is over 500 pages, so potentially too much to digest!
Some of this mess is because it's easy to pass the Experiment
object around and just pull out what you want. But some of the examples you mention above could be avoided by passing in just what is required.
This is similar to what you suggested:
An option could be to have another data structure object that is created by
Experiment
and passed to model classes. This could be a useful reference if logged out to a file when there are errors to show what was the current experiment state.
We could pass in the config
object, the manifests, the counter and some path related stuff.
With respect to the path related stuff, we do a hell of a lot of "os.path.join()" that I wonder if we couldn't just create methods to return these things, and just alter the value that the methods use, rather than recreate the whole path every time.
This came up in a discussion with @truth-quark about the
Experiment
class in Payu. It is quite large (approx > 1100 lines of code) with some very long functions. It also holds a lot of state which can be confusing to know where fields are used in the project.Move functions out of the
Experiment
classSome low-hanging fruit to reduce the size of the
Experiment
class is to refactor the code I added last year for restart pruning. That can be moved to it's own file and broken up into smaller more easily testable functions.A larger change could be to separate
Experiment
into several files, for example:Circular(ish) relationship between Experiment and Model classes?
An
Experiment
has amodel
/models
fields to store the instances of the model driver/s. To create a model driver instance, the instance of theExperiment
is passed along to the initialise method of the model driver class. If I add type hints forExperiment
to the model drivers classes and add anfrom payu.experiment import Experiment
, I would get circular import errors.From having a quick look at what the model driver classes need from Experiment are paths (work, control, output, restart, prior restart/output), the configuration file (config.yaml) and manifest objects. Some drivers, such as
access
, need to iterate through the other submodel model instances (Experiment.models
) to copy restart files/calculate end dates, or modify class fields. Some drivers also needExperiment.counter
,Experiment.repeat_run
,Experiment.runtime
(repeat and runtime could just be accessed via config.yaml file).I had in my head that it would be nice to not build an instance of
Experiment
always for testing model drivers, but in writing this all out I am realising that might not be straightforward or simple. Maybe I just need to get better at writing mock test objects forExperiment
..An option could be to have another data structure object that is created by
Experiment
and passed to model classes. This could be a useful reference if logged out to a file when there are errors to show what was the current experiment state.Using more I/0 functions could be an option rather than storing state for paths, but payu might still need to a way store some information such as the experiment name to avoid recalculating too many times when determining paths.
I'm not really sure the best way to go about this, and just created this issue mainly as a place to put down some general ideas. So feel free to add any thoughts!