ukaea / PROCESS

PROCESS is a systems code at UKAEA that calculates in a self-consistent manner the parameters of a fusion power plant with a specified performance, ensuring that its operating limits are not violated, and with the option to optimise to a given function of these parameters.
https://ukaea.github.io/PROCESS/
MIT License
36 stars 11 forks source link

Resolve "Convert costs_step.f90 to Python" - [merged] #2384

Closed jonmaddock closed 1 year ago

jonmaddock commented 3 years ago

In GitLab by @pb9430 on Aug 23, 2021, 11:26

_Merges 1385-convert-costsstep-f90-to-python -> develop

Closes #1385

jonmaddock commented 3 years ago

In GitLab by @pb9430 on Aug 25, 2021, 12:11

added 5 commits

Compare with previous version

jonmaddock commented 3 years ago

@pb9430 this looks great so far! I'm working to let you access the costs_step object in caller.py, so replace the existing call directly into Fortran.

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

@timothy-nunn This latest commit (192ec375) isn't the best, but it allows the caller to begin calling methods on Python class instances, such as costs_step, rather than straight into the Fortran subroutines. This should mean that the CostsStep class can now be used to begin storing the Fortran variables and "driving" the isolated Fortran subroutines in costs_step.f90. The f2pybike repo might help with this!

I am happy to tidy up/completely change the way I've done this so far.

jonmaddock commented 3 years ago

added 164 commits

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 3 commits

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 2 commits

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 2 commits

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 2 commits

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

@timothy-nunn this is my WIP (but best so far!) conversion workflow, which I've updated in light of actually starting the conversion on costs_step.f90. It's based on the workflow in my toy f2pybike repo. I've dumped it here for now, but perhaps it should go somewhere (the docs, on a branch for now) where we can both edit and comment more easily. I'd be really grateful for your input on this one. Let me know what you think!

Python conversion workflow

Aim and Motivation

The aim of this Python conversion work is to create a Python package that interfaces with isolated Fortran subroutines. This enables a host of benefits offered by Python over Fortran (easier debugging, data analysis tools, testing, file I/O, interfacing with other codes etc.), and is vital for the maintainability and extensibility of the code. It will also offer gains in modeller productivity and greater confidence in results.

In order to achieve this, the existing data structure (Fortran module variables) will need to be moved into Python (object attributes). Due to the highly interdependent nature of Process (many subroutines being reliant on variables ouside their arguments list, such as internal or external module variables), care needs to be taken to ensure success and avoid dependency "unravelling".

1. Create a Python class for the Fortran module

This new class interfaces with the corresponding Fortran module, with methods calling into the Fortran to run the module's subroutines. It is instantiated in the Models class in main.py, and run from the Caller class in caller.py.

(The instance of Models (containing the physics and engineering model instances) needs to accessed not just from the Caller class, depending on how Process is run. They also need to be initialised before the iteration variables are loaded (loadxc()), hence why they are initialised early but don't immediately appear to be run until later.)

In the module to be converted, there is ofter a main "caller" subroutine, which is responsible for calling the other subroutines in the module. If it exists, convert the main "caller" subroutine into Python. This subroutine serves just to call other subroutines, and we want each subroutine to be called directly from the Python. This allows for no use dependencies, better exception handling, logging, testing etc.

Result

At the end of this stage, all calling of the Fortran module's subroutines should be through Python class methods. This commonly consists of an __init__ method to initialise the Fortran module variables and a single "caller" method that calls the various Fortran subroutines.

2. Remove use statements from the Fortran module's subroutines

Convert the use statements in individual subroutines to subroutine arguments, explicitly setting the intent for each. This makes the subroutine isolated from other modules; its result only depends on its arguments from Python, not use dependencies. If the subroutine uses a variable from its own module, this should also be converted to an argument for complete isolation.

The Python class method that handles the subroutine call should get the required input arguments from Fortran modules and pass them to the subroutine. The output variables are then returned from the subroutine to the Python, which then sets the required Fortran module variables.

It is important that the results are still stored in the Fortran modules at this point, so that other Fortran modules that still use the variables can access them and that they are updated as subroutines are run. This prevents an "unravelling dependency" effect (chains of uses all needing converting simultaneously) and allows the conversion to be done iteratively, module by module.

Nested subroutine calls

If a subroutine calls another subroutine, then its use statement and call can be left in (subroutine arguments can't be passed in from Python anyway). If required, the subroutine can usually be refactored so that each subroutine is called directly from Python, but this isn't usually necessary.

The arguments to the second-level subroutine (both intent(in) and intent(out)) will need to be supplied by the top-level subroutine, and hence will also need to be passed from (or assigned in) the Python. This is required in order to remove module variable dependencies in the Fortran module's subroutines by converting them to arguments, which will eventually permit the switch to a Python data structure. Unfortunately this can result in some very long argument lists in the top-level subroutine, but this is unavoidable at this point. Further refactoring to split up such subroutines or call them directly from Python can reduce this in time.

If the subroutine calls are simply writing output, these can be moved into Python at this stage if it's convenient, as this will help with getting all file IO into Python in future.

Deciding on the depth of interface

The depth of the interface (the extent to which Fortran subroutines are called directly from Python) should be decided based on the usefulness of having Python at a given level. For instance, just having a Python call to the top-level "caller" subroutine in a module is not particularly useful for testing, debugging, exception handling etc., but provides a starting point. On the other hand, calling every Fortran subroutine directly from Python will require substantial Python-Fortran refactoring and will offer little benefit.

The compromise is to convert the "caller" subroutine into a Python method which calls individual Python methods to call the top-level subroutines. Some modules may require a greater or lesser depth of interface, but this is probably the most frequent case and requires a minimal amount of refactoring.

Result

At this point, Python class methods get Fortran module variables, pass them to isolated subroutines in the Fortran module and then set other Fortran module variables with the result. All Fortran module variable getting and setting for the subroutine in question is done via the Python class for the Fortran module. All "state" is still stored in the Fortran module variables, but the Python is responsible for getting and setting it. All use statements in the Fortran are for subroutine names only.

3. Remove all use variable statements from Fortran

Repeat steps 1 and 2 for each module until there are no use statements left in the Fortran for variables (useing subroutines is acceptable). Each module can be converted independently and can be merged once steps 1 and 2 are complete for it, allowing the work to be iterative.

4. Remove all <module_name>_variables.f90 modules

Once all of the use statements are removed for variables, then Python is getting and setting all Fortran module variables. The Fortran module variables can therefore be moved to Python object attributes, and the <module_name>_variables.f90 modules can be removed.

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 2 commits

Compare with previous version

jonmaddock commented 3 years ago

added 2 commits

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 3 commits

Compare with previous version

jonmaddock commented 3 years ago

added 1 commit

Compare with previous version

jonmaddock commented 3 years ago

added 43 commits

Compare with previous version

jonmaddock commented 3 years ago

added 2 commits

Compare with previous version

jonmaddock commented 3 years ago

@timothy-nunn I believe steps 1 and 2 are now complete for the costs_step.f90 module, and this is ready for review. I'd appreciate your thoughts on the conversion "workflow" and how I've implemented it here; is there anything I haven't thought about?

I'm aware that in some cases it would be easier to convert certain subroutines completely into Python, but I'm keen to avoid calculations being split across the two languages, particularly in the first module to be converted. Hence I've tried to do this one by following the workflow quite rigidly.

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 08:44

Commented on process/costs_step.py line 89

I'm guessing this is how it was in the original Fortran, but why do we calculate the initial cv.cdirt then add again to it immediately in another step couldn't we move the addition of cs.step27 into this line? (Of course moving step_a27 under a25.)

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:06

Commented on process/costs_step.py line 430

Some of these variables are never used.

We could replace such variables with _ although i do not know if that would negatively impact readability so is better to leave it the "non-pythonic" way

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:07

Commented on process/costs_step.py line 477

This is not really consistent with other IO above. I think it might be more consistent and readable to put all this output under an if statement

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:11

Commented on process/costs_step.py line 995

Again another unused variable that might be more pythonic as _ if we choose to go that route.

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:17

Commented on process/evaluators.py line 94

Is this a relic of Fortran, can we remove this now and just use ifail_in? or at least ifail_out = ifail_in

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:17

Commented on process/evaluators.py line 190

Same as above

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:18

Commented on process/final.py line 4

Consistency: ... import output as ot or whatever it is imported as in other files

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:19

Commented on process/main.py line 51

Consistency: again, we probably want to import as ... to keep consistent with other files

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:24

Commented on source/fortran/costs_step.f90 line 30

This strikes me as something that could be Pythonised since its only called from costs_step.py and test_costs_step.py

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:26

Commented on source/fortran/costs_step.f90 line 53

Again, I would be inclined to say that this routine only divides one number. I think this could be appropriately converted to pure Python. I guess we need to decide how strict everyone wants to be and what constitutes "maths".

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:28

Commented on source/fortran/costs_step.f90 line 1059

Again, is this really justifiable? More so than the former certainly, but I still do not know if its enough.

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:29

Commented on source/fortran/costs_step.f90 line 1659

Strategy decided above will influence this routine too.

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 09:30

Commented on source/fortran/costs_step.f90 line 1080

Strategy above may affect this one too

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 14:53

I'm going to make some notes on this here for us to discuss at a later date:

I agree with your points. I think we should maybe iron out the depth of the interface, as in 2..

We want to keep the maths inside of Fortran, however we want IO and exception handling to be managed inside of Python. This should become our explicit goal: we want to Fortran to be exclusively maths (and other legacy code we darent touch) and to convert the entire IO and error system into Python (as you said). The issue is then, however, that we end up with long argument lists for our function/subroutine calls into the Fortran.

We might want to consider some sort of wrapper system to abstract this horribleness away from the frontend Python (this is just thinking out loud, I do not know how easy or useful this will be in practice),

jonmaddock commented 3 years ago

In GitLab by @timothy-nunn on Oct 27, 2021, 14:54

I think we would also be wise to get other PROCESS users input on this... do they understand the changes made to costs_step? Do they understand what goes into Fortran and what into Python?

jonmaddock commented 3 years ago

I agree with all in your first comment, and I think we've already gone some way to discussing this regarding converting modules completely to Python vs. creating Python-Fortran interfaces, deciding on the approach for each module in turn.

I would be wary of creating another wrapper however; I think a long subroutine argument list (contained in its own Python method) is preferable and much simpler. I would argue that being contained in a method is not dissimilar to being wrapped (simple method call with no/few arguments containing a subroutine call with many arguments). It also serves as a good reminder that the subroutine in question needs splitting up too! Kristian spent some time investigating wrapper options, but they generally ended up being pretty complex and in my opinion adding to the overall complexity.

You're absolutely right about keeping the modellers on board; I would be inclined to get this merged in and once we have a fully-Pythonised module and a Python-Fortran hybrid module, we can present them to the group, explain it and answer any questions. I think that dealing with concrete examples that are already working, rather than hypotheticals, is the way to go.

jonmaddock commented 3 years ago

Because cv.cdirt at that point is used as an input to cs.step_a27() in the step_a27() method. This could be refactored, but I was trying to just stick to the minimum required for the conversion as much as possible.