xgcm / aerobulk-python

A python wrapper for aerobulk (https://github.com/brodeau/aerobulk)
GNU General Public License v3.0
14 stars 4 forks source link

Better error handling #4

Closed jbusecke closed 2 years ago

jbusecke commented 2 years ago

During development on #3 I noticed that errors in the fortran code seem to lead in an immediate crash of the kernel. We should investigate how to catch and display these (I was not able to see the relevant message in a notebook, but did in an ipython session).

To reproduce I entered this (wildly wrong units) into ipython

import mod_aerobulk_wrap as aero
import numpy as np

ni=3
nj=6
calgo='ncar'
zt=2
zu=5
sst = np.full([ni, nj], 30.0)
t_zt = np.full([ni, nj], 20.0)
hum_zt = np.full([ni, nj], 20.0) # watch units here...
u_zu=np.full([ni, nj], 2.0)
v_zu=np.full([ni, nj], 0.0)
slp =np.full([ni, nj], 1000.0)
niter=1

ql,qh,tau_x,tau_y,evap = aero.mod_aerobulk_wrapper.aerobulk_model_noskin(calgo,zt,zu,sst,t_zt,hum_zt,u_zu,v_zu,slp,niter)

Which gives:

===================================================================
                    ----- AeroBulk_init -----

     *** Bulk parameterization to be used => "ncar"
     *** Cool-skin & Warm-layer schemes will NOT be used!
     *** Computational domain shape: Ni x Nj = 00003 x 00006
     *** Number of time records that will be treated:           1
     *** Number of iterations in bulk algos: nb_iter  =    1
     *** Filling the `mask` array...
 *** E R R O R :
 the whole domain is masked!
 check unit consistency of input fields

Might be relevant to this : https://numpy.org/doc/stable/f2py/signature-file.html#f2py-statements

jbusecke commented 2 years ago

Also a strong argument to have a pint unit check in this 😹 @TomNicholas

rabernat commented 2 years ago

I just read this from https://numpy.org/doc/stable/f2py/signature-file.html#f2py-statements

callstatement <C-expr|multi-line block> Replaces the F2PY generated call statement to Fortran/C function with <C-expr|multi-line block>. The wrapped Fortran/C function is available as (*f2py_func).

To raise an exception, set f2py_success = 0 in <C-expr|multi-line block>.

It's pretty cryptic, but basically I think we need to add this to the signature file.

The way aerobulk handles errors is to call the ctl_stop function, defined here: https://github.com/brodeau/aerobulk/blob/03b40b2d48cc5ad4dc41b256b00c54a4d40d124e/src/mod_const.f90#L238-L278

This code calls the fortran STOP command. So the general problem we need to solve is how to pass a fortran STOP command up to python as an exception.

This SO issue is relevant: https://stackoverflow.com/questions/18021081/stop-python-code-in-fortran-module-error-using-f2py

I would start by writing a test for the expected exception.

rabernat commented 2 years ago

This page is also very helpful https://github.com/pearu/f2py/wiki/FAQ2ed. They suggest we will need to modify the fortran source code 😒

jbusecke commented 2 years ago

This page is also very helpful https://github.com/pearu/f2py/wiki/FAQ2ed. They suggest we will need to modify the fortran source code

But what is solution 2, haha.

jbusecke commented 2 years ago

I am going to try my best to summarize the short hack session with @rabernat and @TomNicholas today:

The ultimate goal was to somehow enable us to call a python function from fortran (callback function) which then raises an error without a crashing python kernel.

We mostly followed these instructions.

We added this codeblock to mod_aerobulk_wrap.f90 (see also #11):

subroutine foo2(a)
    integer a
    if (a.gt.10) then
      print*, "Fortran: a>10 is bad, stopping here."
      call f2pystop(10) ! f2pystop must raise an exception that will trigger a long jump to the corresponding wrapper function
      stop (10)
    end if
    print*,"Fortran foo: a<=10 is good, continue."
  end subroutine foo2

and the following code to mod_aerobulk_wrap.pyf:

            subroutine foo2(a) ! in :m2:foo.f90
                integer :: a
                intent (callback) f2pystop
                external f2pystop
                integer status
                call f2pystop(status)
            end subroutine foo2

recompiled the code with pip install -e . and then ran:

from aerobulk.aerobulk.mod_aerobulk_wrap  import mod_aerobulk_wrapper as w

class F2PYSTOP(Exception):
    def __call__(self, status):
        raise self.__class__(status)

w.foo2(11, F2PYSTOP())

Contrary to the example this resulted in a rather gnarly error:

 Fortran: a>10 is bad, stopping here.
capi_return is NULL
Call-back cb_f2pystop_in_foo2__user__routines failed.
Fatal Python error: F2PySwapThreadLocalCallbackPtr: F2PySwapThreadLocalCallbackPtr: PyLong_AsVoidPtr failed
Python runtime state: initialized
Traceback (most recent call last):
  File "<ipython-input-1-8ef22c6715ac>", line 3, in __call__
__main__.F2PYSTOP: 10

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, aerobulk.aerobulk.mod_aerobulk_wrap (total: 14)
[1]    15315 abort      ipython

We experimented a bit further and changing

stop (10)

to

return (10)

in the first code block enabled us to run this without crashing the python kernel. @rabernat was hopeful that we might be able to do something with the return values, but we need to continue working on this.

jbusecke commented 2 years ago

I found this, which I believe is the prequel to our source used above.

1) I don't think that you need to deal with longjmp in Fortran. You can replace Fortran STOP statement with a call to C function that will deal with the longjmp stuff.

It seems like some more brute force approaches (e.g. replacing STOP statements in all fortran files) as part of the build could be considered if we find a way to use return?

jbusecke commented 2 years ago

Notes from our hack on 2022/5/4:

python module RANGE_CHECK ! in interface ! in :RANGE_CHECK module range_mod ! in :RANGE_CHECK:range_check.f90 subroutine range_check(a) ! in :RANGE_CHECK:range_check.f90:range_mod integer :: a !intent (callback) f2pystop !external f2pystop !call f2pystop(a) !threadsafe callstatement {{ & Py_BEGIN_ALLOW_THREADS & (*f2py_func)(&a); & Py_END_ALLOW_THREADS & PyErr_SetString(PyExc_ValueError, "Oooops"); & }} end subroutine range_check end module range_mod end interface end python module RANGE_CHECK

! This file was auto-generated with f2py (version:1.22.3). ! See: ! https://web.archive.org/web/20140822061353/http://cens.ioc.ee/projects/f2py2e

the corresponding fortran file is : 

MODULE RANGE_MOD IMPLICIT NONE PUBLIC :: RANGE_CHECK

CONTAINS subroutine RANGE_CHECK(a) INTEGER :: a if (a.gt.10) then print, "Fortran: a>10 is bad, stopping here NEW2." !call f2pystop(a) ! f2pystop must raise an exception that will trigger a long jump to the corresponding wrapper function error_raised = 1 RETURN end if print,"Fortran foo: a<=10 is good, continue." end subroutine RANGE_CHECK

subroutine ctl_stop( cd1 ) CHARACTER(len=*), INTENT(in) :: cd1

END MODULE RANGE_MOD


This will raise an error like such:

pytest -s test.py ==================================================================== test session starts ===================================================================== platform darwin -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0 rootdir: /Users/juliusbusecke/Code/fortran_error_test collected 2 items

test.py Fortran: a>10 is bad, stopping here NEW2. F Fortran foo: a<=10 is good, continue. F

========================================================================== FAILURES ========================================================================== __ test_range_error __

def test_range_error():
    # range_mod(12, raise_error)
  range_mod(12)

E ValueError: Oooops

test.py:15: ValueError __ test_range_good ___

def test_range_good():
    # range_mod(12, raise_error)
  range_mod(2)

E ValueError: Oooops

test.py:21: ValueError ================================================================== short test summary info =================================================================== FAILED test.py::test_range_error - ValueError: Oooops FAILED test.py::test_range_good - ValueError: Oooops ===================================================================== 2 failed in 0.36s ====================================================================== (fortran_error_test)


for a test script: 
```python
import pytest
from RANGE_CHECK import range_mod

errormsg = 0

def raise_error(msg):
    global errormsg
    print(f"raising error message {msg}")
    errormsg = msg

def test_range_error():
    # range_mod(12, raise_error)
    range_mod(12)
    assert errormsg == 12

def test_range_good():
    # range_mod(12, raise_error)
    range_mod(2)
    assert 1 == 1

What we need to do next:

Steps to reproduce results with above files:

  1. Build initial signature file (only do once, because we modify this): python -m numpy.f2py range_check.f90 -m RANGE_CHECK -h range_check.pyf --overwrite-signature
  2. Build extensions using f2py: python -m numpy.f2py --verbose -c --f90flags="-fdefault-real-8 -ffree-line-length-200 --std=gnu" range_check.f90 range_check.pyf
  3. Optional: Build c wrappers for inspection: python -m numpy.f2py --verbose -m range_check.f90 range_check.pyf
rabernat commented 2 years ago

Epic session today.

jbusecke commented 2 years ago

😎🤓

rabernat commented 2 years ago

We just discovered that @brian-rose is an expert on f2py! (See https://github.com/brian-rose/minimalf2py).

Brian, do you have any experience on error handling from Fortran. The above thread is very long and probably confusing, but basically we are trying to redirect Fortran STOP commands to raise Python exceptions (rather than crashing the entire session). Have you ever dealt with that situation before?

brian-rose commented 2 years ago

We just discovered that @brian-rose is an expert on f2py! (See https://github.com/brian-rose/minimalf2py).

... yikes. I am now about to be unmasked as a non-expert.

Unfortunately no, I've never dealt with this. To my knowledge, none of the fortran code I've ever tried to wrapped used STOP commands. I'll follow your progress here though!

If there's a good general solution (perhaps involving modifying the signature file, as you discussed above), then that might make a nice addition to minimalf2py

jbusecke commented 2 years ago

So I just realized that omitting all the error checking in AEROBULK_INIT will lead to a segfault (see https://github.com/xgcm/aerobulk-python/pull/42/commits/2536b44a142fb94ebdec28677cc2de5a512b2306). This is not helpful for the user, but it at least shows up as failed in the CI!

I think we should just bite the bullet and to the range_check in the numpy code (where we can raise useful errors). I have touched upon this in a bit more detail in https://github.com/xgcm/aerobulk-python/issues/36#issue-1282676834

jbusecke commented 2 years ago

I now mark this as abandoned and close it due to the changes introduced in #43