Resolve "Convert costs_step.f90 to Python" - [merged] #2384

Closed jonmaddock closed 1 year ago

In GitLab by @pb9430 on Aug 23, 2021, 11:26

Merges 1385-convert-costs_step-f90-to-python -> develop

Closes #1385

In GitLab by @pb9430 on Aug 25, 2021, 12:11

added 5 commits

22ee65ad...198a550a - 2 commits from branch develop
e9b0dfa4 - Create costs_step.py file
6951b2e9 - Merge branch 'develop' into 1385-convert-costs_step-f90-to-python
f04d12bc - Initialise values in costs_step.f90 from Python

@pb9430 this looks great so far! I'm working on letting you access the costs_step object in caller.py, so that it can replace the existing call directly into Fortran.

added 1 commit

192ec375 - Create Models class to contain physics/engineering Python class instances

@timothy-nunn This latest commit (192ec375) isn't the best, but it allows the caller to begin calling methods on Python class instances, such as costs_step, rather than straight into the Fortran subroutines. This should mean that the CostsStep class can now be used to begin storing the Fortran variables and "driving" the isolated Fortran subroutines in costs_step.f90. The f2pybike repo might help with this!

I am happy to tidy up/completely change the way I've done this so far.

added 164 commits

192ec375...85359343 - 163 commits from branch develop
fa737e9c - Merge branch 'develop' into 1385-convert-costs_step-f90-to-python

added 1 commit

fa6a8991 - Add CostsStep.run() method

added 1 commit

352847dc - Convert output_module.f90 to Python

added 1 commit

fa70b0f2 - Exclude double var from default dicts checking

added 1 commit

2ca55512 - CostsStep instance now used to write output

added 1 commit

91a0fb5e - Ensure gamcdfix is initialised

added 3 commits

1ebcf6f3 - Wrapping output.f90 requires real(8)s
6fbc97b1 - Convert costs_step() subroutine to Python method
e3ecc7fb - Remove pointless test

added 1 commit

2488bd9e - Minimise argument passing of output file vars

added 2 commits

71e401a8 - Add CostsStep method for step_a20() call
71e3425b - Move step_a20() output to Python

added 1 commit

22e1aa0b - Instantiate CostsStep in fixture before mocking

added 1 commit

81cff935 - Add step_a21() method

added 1 commit

c2f18b6a - Add step_a22() method

added 1 commit

f7dad004 - Correct step_a22xx tests

added 2 commits

9f91c818 - Add step_a23() method
605ec5c1 - Add step_a24() method

added 1 commit

eb360001 - Add step_a25() method

added 2 commits

8af43304 - Add step_a27() method
1f716754 - Add step_indirect_costs() method

added 1 commit

5ab96383 - Add coelc_step() method

added 1 commit

2de03ea3 - Convert step_a22() to Python

@timothy-nunn this is my WIP (but best so far!) conversion workflow, which I've updated in light of actually starting the conversion on costs_step.f90. It's based on the workflow in my toy f2pybike repo. I've dumped it here for now, but perhaps it should go somewhere (the docs, on a branch for now) where we can both edit and comment more easily. I'd be really grateful for your input on this one. Let me know what you think!

Python conversion workflow

Aim and Motivation

The aim of this Python conversion work is to create a Python package that interfaces with isolated Fortran subroutines. This enables a host of benefits offered by Python over Fortran (easier debugging, data analysis tools, testing, file I/O, interfacing with other codes etc.), and is vital for the maintainability and extensibility of the code. It will also offer gains in modeller productivity and greater confidence in results.

In order to achieve this, the existing data structure (Fortran module variables) will need to be moved into Python (object attributes). Due to the highly interdependent nature of Process (many subroutines rely on variables outside their argument lists, such as internal or external module variables), care needs to be taken to ensure success and avoid dependency "unravelling".

1. Create a Python class for the Fortran module

This new class interfaces with the corresponding Fortran module, with methods calling into the Fortran to run the module's subroutines. It is instantiated in the Models class in main.py, and run from the Caller class in caller.py.

(The instance of Models (containing the physics and engineering model instances) needs to be accessed from more than just the Caller class, depending on how Process is run. The model instances also need to be initialised before the iteration variables are loaded (loadxc()), which is why they are initialised early but aren't run until later.)

In the module to be converted, there is often a main "caller" subroutine, which is responsible for calling the other subroutines in the module. If it exists, convert it into Python. This subroutine serves just to call other subroutines, and we want each subroutine to be called directly from the Python. This removes the need for use dependencies and allows better exception handling, logging, testing etc.

Result

At the end of this stage, all calling of the Fortran module's subroutines should be through Python class methods. This commonly consists of an __init__ method to initialise the Fortran module variables and a single "caller" method that calls the various Fortran subroutines.
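
As a rough illustration of the end state of this stage, the sketch below uses a `SimpleNamespace` as a stand-in for the wrapped Fortran module so that it runs without a compiled extension. The stand-in object `fortran_costs_step` and the `init_costs_step` name are assumptions made for the example, not the actual Process interface.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the wrapped Fortran module. Each "subroutine"
# just records that it was called, so the call order can be inspected.
calls = []
fortran_costs_step = SimpleNamespace(
    init_costs_step=lambda: calls.append("init_costs_step"),
    step_a20=lambda: calls.append("step_a20"),
    step_a21=lambda: calls.append("step_a21"),
)


class CostsStep:
    """Python interface to the (stand-in) costs_step Fortran module."""

    def __init__(self, fortran=fortran_costs_step):
        self.fortran = fortran
        # Initialise the Fortran module variables from Python
        self.fortran.init_costs_step()

    def run(self):
        """Replaces the Fortran "caller" subroutine: call each subroutine in turn."""
        self.fortran.step_a20()
        self.fortran.step_a21()


costs_step = CostsStep()
costs_step.run()
```

Here all calls into the (stand-in) Fortran go through `CostsStep`, which is the property this stage is aiming for.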

2. Remove `use` statements from the Fortran module's subroutines

Convert the use statements in individual subroutines to subroutine arguments, explicitly setting the intent for each. This makes the subroutine isolated from other modules; its result only depends on its arguments from Python, not use dependencies. If the subroutine uses a variable from its own module, this should also be converted to an argument for complete isolation.

The Python class method that handles the subroutine call should get the required input arguments from Fortran modules and pass them to the subroutine. The output variables are then returned from the subroutine to the Python, which then sets the required Fortran module variables.

It is important that the results are still stored in the Fortran modules at this point, so that other Fortran modules that still use the variables can access them and that they are updated as subroutines are run. This prevents an "unravelling dependency" effect (chains of uses all needing converting simultaneously) and allows the conversion to be done iteratively, module by module.
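
The get, call, set pattern described above might look something like the following sketch. The two `SimpleNamespace` objects stand in for Fortran modules holding state, and `fortran_step_example` stands in for an isolated Fortran subroutine; all of these names and values are hypothetical and chosen only for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Fortran modules that hold state (module variables)
cost_variables = SimpleNamespace(unit_cost=2.0, total_cost=0.0)
build_variables = SimpleNamespace(volume=10.0)


def fortran_step_example(unit_cost, volume):
    """Stand-in for an isolated Fortran subroutine.

    The result depends only on the arguments (intent(in)); the return value
    plays the role of an intent(out) argument.
    """
    return unit_cost * volume


class CostsStep:
    def step_example(self):
        # 1. Get the required inputs from the Fortran module variables
        unit_cost = cost_variables.unit_cost
        volume = build_variables.volume
        # 2. Pass them to the isolated subroutine
        total_cost = fortran_step_example(unit_cost, volume)
        # 3. Store the result back in the Fortran module, so that other
        #    modules that still use this variable see the updated value
        cost_variables.total_cost = total_cost


CostsStep().step_example()
```

The state still lives in the (stand-in) Fortran module after the call; only the getting and setting has moved into Python.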

Nested subroutine calls

If a subroutine calls another subroutine, then its use statement and call can be left in (subroutine arguments can't be passed in from Python anyway). If required, the subroutine can usually be refactored so that each subroutine is called directly from Python, but this isn't usually necessary.

The arguments to the second-level subroutine (both intent(in) and intent(out)) will need to be supplied by the top-level subroutine, and hence will also need to be passed from (or assigned in) the Python. This is required in order to remove module variable dependencies in the Fortran module's subroutines by converting them to arguments, which will eventually permit the switch to a Python data structure. Unfortunately this can result in some very long argument lists in the top-level subroutine, but this is unavoidable at this point. Further refactoring to split up such subroutines or call them directly from Python can reduce this in time.

If the subroutine calls are simply writing output, these can be moved into Python at this stage if it's convenient, as this will help with getting all file IO into Python in future.
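
A small sketch of both points, using plain Python functions as stand-ins for the Fortran subroutines (all names and numbers are hypothetical): the top-level routine has to accept arguments it only forwards to the second-level routine, and the output writing is done from Python using the returned values.

```python
import io


def second_level(area, cost_per_area):
    """Stand-in for a second-level Fortran subroutine."""
    return area * cost_per_area


def top_level(base_cost, area, cost_per_area):
    """Stand-in for a top-level Fortran subroutine.

    area and cost_per_area are not needed by top_level itself; they are
    passed in only so it can supply them to second_level. This is why
    argument lists grow at this stage.
    """
    extra = second_level(area, cost_per_area)
    return base_cost + extra, extra


def write_costs(outfile, total, extra):
    """Output that would previously have been written from Fortran."""
    outfile.write(f"extra cost: {extra:.2f}\n")
    outfile.write(f"total cost: {total:.2f}\n")


total, extra = top_level(base_cost=100.0, area=4.0, cost_per_area=2.5)
buffer = io.StringIO()  # any file-like object would do here
write_costs(buffer, total, extra)
```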

Deciding on the depth of interface

The depth of the interface (the extent to which Fortran subroutines are called directly from Python) should be decided based on the usefulness of having Python at a given level. For instance, just having a Python call to the top-level "caller" subroutine in a module is not particularly useful for testing, debugging, exception handling etc., but provides a starting point. On the other hand, calling every Fortran subroutine directly from Python will require substantial Python-Fortran refactoring and will offer little benefit.

The compromise is to convert the "caller" subroutine into a Python method which calls individual Python methods to call the top-level subroutines. Some modules may require a greater or lesser depth of interface, but this is probably the most frequent case and requires a minimal amount of refactoring.
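
One way to see the benefit of this depth of interface is in testing. In the sketch below the method bodies are placeholders for calls to top-level Fortran subroutines, and the returned values and the `total` attribute are invented for illustration. Because each subroutine has its own method, a test can replace one of them and still exercise the "caller" method.

```python
from unittest import mock


class CostsStep:
    """Illustrative class: method bodies stand in for Fortran subroutine calls."""

    def __init__(self):
        self.total = 0.0

    def step_a20(self):
        return 5.0  # placeholder for a call into Fortran

    def step_a21(self):
        return 7.0  # placeholder for a call into Fortran

    def run(self):
        # The former Fortran "caller" subroutine, now a Python method
        self.total = self.step_a20() + self.step_a21()


# Replace one method on the instance and check the caller's behaviour
costs_step = CostsStep()
with mock.patch.object(costs_step, "step_a21", return_value=0.0) as mocked:
    costs_step.run()
```

With only a single Python call into a top-level Fortran "caller" subroutine, this kind of isolation would not be available.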

Result

At this point, Python class methods get Fortran module variables, pass them to isolated subroutines in the Fortran module and then set other Fortran module variables with the result. All Fortran module variable getting and setting for the subroutine in question is done via the Python class for the Fortran module. All "state" is still stored in the Fortran module variables, but the Python is responsible for getting and setting it. All use statements in the Fortran are for subroutine names only.

3. Remove all `use` variable statements from Fortran

Repeat steps 1 and 2 for each module until there are no use statements left in the Fortran for variables (using `use` for subroutines is acceptable). Each module can be converted independently and can be merged once steps 1 and 2 are complete for it, allowing the work to be iterative.

4. Remove all `<module_name>_variables.f90` modules

Once all of the use statements are removed for variables, then Python is getting and setting all Fortran module variables. The Fortran module variables can therefore be moved to Python object attributes, and the <module_name>_variables.f90 modules can be removed.
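
As a sketch of what this final state could look like, the example below holds the former module variables on a Python dataclass. The `CostVariables` class, its fields and `fortran_step_example` are hypothetical names for illustration; how the Python data structure is actually organised is not decided here.

```python
from dataclasses import dataclass


@dataclass
class CostVariables:
    """Hypothetical replacement for a <module_name>_variables.f90 module."""

    unit_cost: float = 2.0
    total_cost: float = 0.0


def fortran_step_example(unit_cost, volume):
    """Stand-in for an isolated Fortran subroutine (depends on arguments only)."""
    return unit_cost * volume


class CostsStep:
    def __init__(self, cost_variables):
        # State now lives on a Python object rather than in a Fortran module
        self.cost_variables = cost_variables

    def step_example(self, volume):
        self.cost_variables.total_cost = fortran_step_example(
            self.cost_variables.unit_cost, volume
        )


cv = CostVariables()
CostsStep(cv).step_example(volume=10.0)
```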

added 1 commit

76915ac1 - Convert use dependencies to args for step_a2201()

added 1 commit

450e6ba2 - Remove step_a220101() use dependencies

added 2 commits

835ae874 - Remove step_a220101() use dependencies
030a5479 - Remove step_a22010302() use dependencies

added 2 commits

729b7dc0 - Remove step_a2202() use dependencies
845907d5 - Remove step_a2203() use dependencies

added 1 commit

18e4b258 - Remove step_a2204() use dependencies

added 3 commits

b6ed64c6 - Remove step_a2205() use dependencies
c05e6805 - Isolate subroutines from module var dependencies
45f10dfd - Remove step_a2206() use dependencies

added 1 commit

ef00d046 - Remove step_a2207() use dependencies

added 43 commits

ef00d046...845981f6 - 39 commits from branch develop
b803dd32 - Merge branch 'develop' into 1385-convert-costs_step-f90-to-python
46d42c8a - Fix mistakes from Fortran conversion
e5a99ad2 - Add missing argument
58dd2ad3 - Update step_a2201() unit test result

added 2 commits

9b5c4904 - Remove step_a220102() use dependencies
59272092 - Rename costs_step imported modules

@timothy-nunn I believe steps 1 and 2 are now complete for the costs_step.f90 module, and this is ready for review. I'd appreciate your thoughts on the conversion "workflow" and how I've implemented it here; is there anything I haven't thought about?

I'm aware that in some cases it would be easier to convert certain subroutines completely into Python, but I'm keen to avoid calculations being split across the two languages, particularly in the first module to be converted. Hence I've tried to do this one by following the workflow quite rigidly.

In GitLab by @timothy-nunn on Oct 27, 2021, 08:44

Commented on process/costs_step.py line 89

I'm guessing this is how it was in the original Fortran, but why do we calculate the initial cv.cdirt and then immediately add to it again in another step? Couldn't we move the addition of cs.step27 into this line? (Of course, moving step_a27 under a25.)

In GitLab by @timothy-nunn on Oct 27, 2021, 09:06

Commented on process/costs_step.py line 430

Some of these variables are never used.

We could replace such variables with _, although I do not know whether that would negatively impact readability; if so, it is better to leave it the "non-pythonic" way.
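
For reference, the two options look roughly like this (a generic sketch with made-up names, not the code at the line commented on):

```python
def step_example():
    """Stand-in for a subroutine call returning several output values."""
    return 1.0, 2.0, 3.0


# Option 1: name every returned value, even those never used afterwards
cost_a, cost_b, cost_c = step_example()

# Option 2: discard the unused values with the conventional `_` name
cost_a, _, _ = step_example()
```

Option 2 makes it obvious which outputs are ignored, but loses the names that document what the subroutine returns.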

In GitLab by @timothy-nunn on Oct 27, 2021, 09:07

Commented on process/costs_step.py line 477

This is not really consistent with other IO above. I think it might be more consistent and readable to put all this output under an if statement

In GitLab by @timothy-nunn on Oct 27, 2021, 09:11

Commented on process/costs_step.py line 995

Again another unused variable that might be more pythonic as _ if we choose to go that route.

In GitLab by @timothy-nunn on Oct 27, 2021, 09:17

Commented on process/evaluators.py line 94

Is this a relic of Fortran? Can we remove this now and just use ifail_in, or at least set ifail_out = ifail_in?

In GitLab by @timothy-nunn on Oct 27, 2021, 09:17

Commented on process/evaluators.py line 190

Same as above

In GitLab by @timothy-nunn on Oct 27, 2021, 09:18

Commented on process/final.py line 4

Consistency: ... import output as ot or whatever it is imported as in other files

In GitLab by @timothy-nunn on Oct 27, 2021, 09:19

Commented on process/main.py line 51

Consistency: again, we probably want to import as ... to keep consistent with other files

In GitLab by @timothy-nunn on Oct 27, 2021, 09:24

Commented on source/fortran/costs_step.f90 line 30

This strikes me as something that could be Pythonised, since it's only called from costs_step.py and test_costs_step.py.

In GitLab by @timothy-nunn on Oct 27, 2021, 09:26

Commented on source/fortran/costs_step.f90 line 53

Again, I would be inclined to say that this routine only divides one number. I think this could be appropriately converted to pure Python. I guess we need to decide how strict everyone wants to be and what constitutes "maths".

In GitLab by @timothy-nunn on Oct 27, 2021, 09:28

Commented on source/fortran/costs_step.f90 line 1059

Again, is this really justifiable? More so than the former certainly, but I still do not know if it's enough.

In GitLab by @timothy-nunn on Oct 27, 2021, 09:29

Commented on source/fortran/costs_step.f90 line 1659

Strategy decided above will influence this routine too.

In GitLab by @timothy-nunn on Oct 27, 2021, 09:30

Commented on source/fortran/costs_step.f90 line 1080

Strategy above may affect this one too

In GitLab by @timothy-nunn on Oct 27, 2021, 14:53

I'm going to make some notes on this here for us to discuss at a later date:

I agree with your points. I think we should maybe iron out the depth of the interface, as in step 2.

We want to keep the maths inside of Fortran; however, we want IO and exception handling to be managed inside of Python. This should become our explicit goal: we want the Fortran to be exclusively maths (and other legacy code we daren't touch) and to convert the entire IO and error system into Python (as you said). The issue is then, however, that we end up with long argument lists for our function/subroutine calls into the Fortran.

We might want to consider some sort of wrapper system to abstract this horribleness away from the frontend Python (this is just thinking out loud; I do not know how easy or useful this will be in practice).

In GitLab by @timothy-nunn on Oct 27, 2021, 14:54

I think we would also be wise to get other PROCESS users' input on this... do they understand the changes made to costs_step? Do they understand what goes into Fortran and what into Python?

I agree with all in your first comment, and I think we've already gone some way to discussing this regarding converting modules completely to Python vs. creating Python-Fortran interfaces, deciding on the approach for each module in turn.

I would be wary of creating another wrapper however; I think a long subroutine argument list (contained in its own Python method) is preferable and much simpler. I would argue that being contained in a method is not dissimilar to being wrapped (simple method call with no/few arguments containing a subroutine call with many arguments). It also serves as a good reminder that the subroutine in question needs splitting up too! Kristian spent some time investigating wrapper options, but they generally ended up being pretty complex and, in my opinion, added to the overall complexity.

You're absolutely right about keeping the modellers on board; I would be inclined to get this merged in and once we have a fully-Pythonised module and a Python-Fortran hybrid module, we can present them to the group, explain it and answer any questions. I think that dealing with concrete examples that are already working, rather than hypotheticals, is the way to go.

Because cv.cdirt at that point is used as an input to cs.step_a27() in the step_a27() method. This could be refactored, but I was trying to just stick to the minimum required for the conversion as much as possible.

ukaea / PROCESS