Closed jonmaddock closed 1 year ago
In GitLab by @pb9430 on Aug 25, 2021, 12:11
added 5 commits
develop
@pb9430 this looks great so far! I'm working to let you access the costs_step
object in caller.py
, so replace the existing call directly into Fortran.
added 1 commit
@timothy-nunn This latest commit (192ec375) isn't the best, but it allows the caller to begin calling methods on Python class instances, such as costs_step
, rather than straight into the Fortran subroutines. This should mean that the CostsStep
class can now be used to begin storing the Fortran variables and "driving" the isolated Fortran subroutines in costs_step.f90
. The f2pybike repo might help with this!
I am happy to tidy up/completely change the way I've done this so far.
added 164 commits
develop
added 1 commit
added 3 commits
added 1 commit
added 2 commits
added 1 commit
added 2 commits
added 2 commits
@timothy-nunn this is my WIP (but best so far!) conversion workflow, which I've updated in light of actually starting the conversion on costs_step.f90
. It's based on the workflow in my toy f2pybike
repo. I've dumped it here for now, but perhaps it should go somewhere (the docs, on a branch for now) where we can both edit and comment more easily. I'd be really grateful for your input on this one. Let me know what you think!
The aim of this Python conversion work is to create a Python package that interfaces with isolated Fortran subroutines. This enables a host of benefits offered by Python over Fortran (easier debugging, data analysis tools, testing, file I/O, interfacing with other codes etc.), and is vital for the maintainability and extensibility of the code. It will also offer gains in modeller productivity and greater confidence in results.
In order to achieve this, the existing data structure (Fortran module variables) will need to be moved into Python (object attributes). Due to the highly interdependent nature of Process (many subroutines being reliant on variables ouside their arguments list, such as internal or external module variables), care needs to be taken to ensure success and avoid dependency "unravelling".
This new class interfaces with the corresponding Fortran module, with methods calling into the Fortran to run the module's subroutines. It is instantiated in the Models
class in main.py
, and run from the Caller
class in caller.py
.
(The instance of Models
(containing the physics and engineering model instances) needs to accessed not just from the Caller
class, depending on how Process is run. They also need to be initialised before the iteration variables are loaded (loadxc()
), hence why they are initialised early but don't immediately appear to be run until later.)
In the module to be converted, there is ofter a main "caller" subroutine, which is responsible for calling the other subroutines in the module. If it exists, convert the main "caller" subroutine into Python. This subroutine serves just to call other subroutines, and we want each subroutine to be called directly from the Python. This allows for no use
dependencies, better exception handling, logging, testing etc.
At the end of this stage, all calling of the Fortran module's subroutines should be through Python class methods. This commonly consists of an __init__
method to initialise the Fortran module variables and a single "caller" method that calls the various Fortran subroutines.
use
statements from the Fortran module's subroutinesConvert the use
statements in individual subroutines to subroutine arguments, explicitly setting the intent
for each. This makes the subroutine isolated from other modules; its result only depends on its arguments from Python, not use
dependencies. If the subroutine uses a variable from its own module, this should also be converted to an argument for complete isolation.
The Python class method that handles the subroutine call should get the required input arguments from Fortran modules and pass them to the subroutine. The output variables are then returned from the subroutine to the Python, which then sets the required Fortran module variables.
It is important that the results are still stored in the Fortran modules at this point, so that other Fortran modules that still use
the variables can access them and that they are updated as subroutines are run. This prevents an "unravelling dependency" effect (chains of use
s all needing converting simultaneously) and allows the conversion to be done iteratively, module by module.
If a subroutine calls another subroutine, then its use
statement and call can be left in (subroutine arguments can't be passed in from Python anyway). If required, the subroutine can usually be refactored so that each subroutine is called directly from Python, but this isn't usually necessary.
The arguments to the second-level subroutine (both intent(in)
and intent(out)
) will need to be supplied by the top-level subroutine, and hence will also need to be passed from (or assigned in) the Python. This is required in order to remove module variable dependencies in the Fortran module's subroutines by converting them to arguments, which will eventually permit the switch to a Python data structure. Unfortunately this can result in some very long argument lists in the top-level subroutine, but this is unavoidable at this point. Further refactoring to split up such subroutines or call them directly from Python can reduce this in time.
If the subroutine calls are simply writing output, these can be moved into Python at this stage if it's convenient, as this will help with getting all file IO into Python in future.
The depth of the interface (the extent to which Fortran subroutines are called directly from Python) should be decided based on the usefulness of having Python at a given level. For instance, just having a Python call to the top-level "caller" subroutine in a module is not particularly useful for testing, debugging, exception handling etc., but provides a starting point. On the other hand, calling every Fortran subroutine directly from Python will require substantial Python-Fortran refactoring and will offer little benefit.
The compromise is to convert the "caller" subroutine into a Python method which calls individual Python methods to call the top-level subroutines. Some modules may require a greater or lesser depth of interface, but this is probably the most frequent case and requires a minimal amount of refactoring.
At this point, Python class methods get Fortran module variables, pass them to isolated subroutines in the Fortran module and then set other Fortran module variables with the result. All Fortran module variable getting and setting for the subroutine in question is done via the Python class for the Fortran module. All "state" is still stored in the Fortran module variables, but the Python is responsible for getting and setting it. All use
statements in the Fortran are for subroutine names only.
use
variable statements from FortranRepeat steps 1 and 2 for each module until there are no use
statements left in the Fortran for variables (use
ing subroutines is acceptable). Each module can be converted independently and can be merged once steps 1 and 2 are complete for it, allowing the work to be iterative.
<module_name>_variables.f90
modulesOnce all of the use
statements are removed for variables, then Python is getting and setting all Fortran module variables. The Fortran module variables can therefore be moved to Python object attributes, and the <module_name>_variables.f90
modules can be removed.
added 1 commit
added 2 commits
added 2 commits
added 3 commits
added 43 commits
develop
added 2 commits
@timothy-nunn I believe steps 1 and 2 are now complete for the costs_step.f90
module, and this is ready for review. I'd appreciate your thoughts on the conversion "workflow" and how I've implemented it here; is there anything I haven't thought about?
I'm aware that in some cases it would be easier to convert certain subroutines completely into Python, but I'm keen to avoid calculations being split across the two languages, particularly in the first module to be converted. Hence I've tried to do this one by following the workflow quite rigidly.
In GitLab by @timothy-nunn on Oct 27, 2021, 08:44
Commented on process/costs_step.py line 89
I'm guessing this is how it was in the original Fortran, but why do we calculate the initial cv.cdirt then add again to it immediately in another step couldn't we move the addition of cs.step27 into this line? (Of course moving step_a27 under a25.)
In GitLab by @timothy-nunn on Oct 27, 2021, 09:06
Commented on process/costs_step.py line 430
Some of these variables are never used.
We could replace such variables with _
although i do not know if that would negatively impact readability so is better to leave it the "non-pythonic" way
In GitLab by @timothy-nunn on Oct 27, 2021, 09:07
Commented on process/costs_step.py line 477
This is not really consistent with other IO above. I think it might be more consistent and readable to put all this output under an if statement
In GitLab by @timothy-nunn on Oct 27, 2021, 09:11
Commented on process/costs_step.py line 995
Again another unused variable that might be more pythonic as _
if we choose to go that route.
In GitLab by @timothy-nunn on Oct 27, 2021, 09:17
Commented on process/evaluators.py line 94
Is this a relic of Fortran, can we remove this now and just use ifail_in? or at least ifail_out = ifail_in
In GitLab by @timothy-nunn on Oct 27, 2021, 09:17
Commented on process/evaluators.py line 190
Same as above
In GitLab by @timothy-nunn on Oct 27, 2021, 09:18
Commented on process/final.py line 4
Consistency: ... import output as ot
or whatever it is imported as in other files
In GitLab by @timothy-nunn on Oct 27, 2021, 09:19
Commented on process/main.py line 51
Consistency: again, we probably want to import as ...
to keep consistent with other files
In GitLab by @timothy-nunn on Oct 27, 2021, 09:24
Commented on source/fortran/costs_step.f90 line 30
This strikes me as something that could be Pythonised since its only called from costs_step.py and test_costs_step.py
In GitLab by @timothy-nunn on Oct 27, 2021, 09:26
Commented on source/fortran/costs_step.f90 line 53
Again, I would be inclined to say that this routine only divides one number. I think this could be appropriately converted to pure Python. I guess we need to decide how strict everyone wants to be and what constitutes "maths".
In GitLab by @timothy-nunn on Oct 27, 2021, 09:28
Commented on source/fortran/costs_step.f90 line 1059
Again, is this really justifiable? More so than the former certainly, but I still do not know if its enough.
In GitLab by @timothy-nunn on Oct 27, 2021, 09:29
Commented on source/fortran/costs_step.f90 line 1659
Strategy decided above will influence this routine too.
In GitLab by @timothy-nunn on Oct 27, 2021, 09:30
Commented on source/fortran/costs_step.f90 line 1080
Strategy above may affect this one too
In GitLab by @timothy-nunn on Oct 27, 2021, 14:53
I'm going to make some notes on this here for us to discuss at a later date:
I agree with your points. I think we should maybe iron out the depth of the interface, as in 2..
We want to keep the maths inside of Fortran, however we want IO and exception handling to be managed inside of Python. This should become our explicit goal: we want to Fortran to be exclusively maths (and other legacy code we darent touch) and to convert the entire IO and error system into Python (as you said). The issue is then, however, that we end up with long argument lists for our function/subroutine calls into the Fortran.
We might want to consider some sort of wrapper system to abstract this horribleness away from the frontend Python (this is just thinking out loud, I do not know how easy or useful this will be in practice),
In GitLab by @timothy-nunn on Oct 27, 2021, 14:54
I think we would also be wise to get other PROCESS users input on this... do they understand the changes made to costs_step? Do they understand what goes into Fortran and what into Python?
I agree with all in your first comment, and I think we've already gone some way to discussing this regarding converting modules completely to Python vs. creating Python-Fortran interfaces, deciding on the approach for each module in turn.
I would be wary of creating another wrapper however; I think a long subroutine argument list (contained in its own Python method) is preferable and much simpler. I would argue that being contained in a method is not dissimilar to being wrapped (simple method call with no/few arguments containing a subroutine call with many arguments). It also serves as a good reminder that the subroutine in question needs splitting up too! Kristian spent some time investigating wrapper options, but they generally ended up being pretty complex and in my opinion adding to the overall complexity.
You're absolutely right about keeping the modellers on board; I would be inclined to get this merged in and once we have a fully-Pythonised module and a Python-Fortran hybrid module, we can present them to the group, explain it and answer any questions. I think that dealing with concrete examples that are already working, rather than hypotheticals, is the way to go.
Because cv.cdirt
at that point is used as an input to cs.step_a27()
in the step_a27()
method. This could be refactored, but I was trying to just stick to the minimum required for the conversion as much as possible.
In GitLab by @pb9430 on Aug 23, 2021, 11:26
_Merges 1385-convert-costsstep-f90-to-python -> develop
Closes #1385