jdber1 / opendrop

OpenDrop pendant drop tensiometry software
GNU General Public License v3.0

Interfacial Tension fails with sundials 6.4.1: ARKODE::ERKStep ERROR #46

Closed drew-parsons closed 1 year ago

drew-parsons commented 1 year ago

opendrop 3.3.1 fails to build with sundials 6. The build is "fixed" by commit cf9d5aa in the development branch. I'm building from the Debian source for opendrop 3.3.1 after adding the cf9d5aa patch, and the build succeeds against sundials 6.4.1 (which provides arkode 5.4.1).

But after applying that patch, sundials (arkode) still fails at runtime. It is used when analysing a droplet for Interfacial Tension, to build a Young-Laplace model of the shape (opendrop/fit/younglaplace/shape.pyx, using include/opendrop/younglaplace_detail.hpp).

After setting the Drop Region and Needle Region in the image of the drop and pressing the "Analyse" button, it gives this error:

[ARKODE::ERKStep ERROR]  erkStep_FullRHS
  At t = inf, the right-hand side routine failed in an unrecoverable manner.

[ARKODE ERROR]  ARKODE
  At t = 0, the right-hand side routine failed in an unrecoverable manner.

Exception in callback PendantAnalysisJob._ylfit_done(<Future finis...e() failed.')>)
handle: <GLibSourceHandle PendantAnalysisJob._ylfit_done(<Future finis...ylfit_done()]>)>
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/opendrop/fit/younglaplace/__init__.py", line 51, in young_laplace_fit
    model.set_params(initial_params)
  File "/usr/lib/python3/dist-packages/opendrop/fit/younglaplace/model.py", line 86, in set_params
    dr_dBo, dz_dBo = radius * shape.DBo(s)
                              ^^^^^^^^^^^^
  File "opendrop/fit/younglaplace/shape.pyx", line 71, in opendrop.fit.younglaplace.shape.YoungLaplaceShape.DBo
  File "opendrop/fit/younglaplace/shape.pyx", line 88, in opendrop.fit.younglaplace.shape.YoungLaplaceShape.DBo_array
RuntimeError: ERKStepEvolve() failed.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/opendrop/vendor/aioglib/_loop.py", line 463, in __call__
    self._context.run(self._callback, *self._args)
  File "/usr/lib/python3/dist-packages/opendrop/app/ift/services/analysis.py", line 219, in _ylfit_done
    raise e
  File "/usr/lib/python3/dist-packages/opendrop/app/ift/services/analysis.py", line 217, in _ylfit_done
    result = fut.result()
             ^^^^^^^^^^^^
RuntimeError: ERKStepEvolve() failed.

eugenhu commented 1 year ago

Strange. Can you share the drop image? I'll try running the analysis on my computer.

Also, can you try cloning the repo (development branch), installing cython and SCons, and then running scons in the repo's root directory to build the C tests for the Young-Laplace solver? Then try running test_interpolate and test_younglaplace in the 'tests/c/' directory to make sure they work.

Thanks.

drew-parsons commented 1 year ago

The error seems to occur every time, with any image. This one reproduces the problem: https://github.com/jdber1/opendrop/blob/bf5ad09e792063806a9894a48e41c21c928f3a52/tests/samples/images/image0.png

I'll try the C tests. Python tests passed already.

drew-parsons commented 1 year ago

The C tests pass when run via scons (the full package build was run via a Python wheel build):

$ scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
g++ -o opendrop/features/.checkpoints/colorize.os -c -O3 -std=c++14 -fPIC -Iinclude -I/usr/include/python3.11 opendrop/features/.checkpoints/colorize.cpp
g++ -o opendrop/fit/needle/.checkpoints/hough.os -c -O3 -std=c++14 -fPIC -Iinclude -I/usr/include/python3.11 opendrop/fit/needle/.checkpoints/hough.cpp
g++ -o opendrop/fit/needle/hough.cpython-311-x86_64-linux-gnu.so -shared opendrop/fit/needle/.checkpoints/hough.os -L/usr/lib/x86_64-linux-gnu -lpython3.11 -lm
g++ -o opendrop/fit/younglaplace/.checkpoints/shape.os -c -O3 -std=c++14 -fPIC -Iinclude -I/usr/include/python3.11 opendrop/fit/younglaplace/.checkpoints/shape.cpp
g++ -o opendrop/fit/younglaplace/shape.cpython-311-x86_64-linux-gnu.so -shared opendrop/fit/younglaplace/.checkpoints/shape.os -L/usr/lib/x86_64-linux-gnu -lpython3.11 -lsundials_arkode -lsundials_nvecserial
Packaging build/opendrop-0.0.0-cp311-abi3-linux-x86_64.whl
g++ -o tests/c/test_interpolate.o -c -O3 -std=c++14 -DBOOST_TEST_DYN_LINK -Iinclude tests/c/test_interpolate.cpp
g++ -o tests/c/test_interpolate tests/c/test_interpolate.o -lboost_unit_test_framework
g++ -o tests/c/test_younglaplace.o -c -O3 -std=c++14 -DBOOST_TEST_DYN_LINK -Iinclude tests/c/test_younglaplace.cpp
g++ -o tests/c/test_younglaplace tests/c/test_younglaplace.o -lboost_unit_test_framework -lsundials_arkode -lsundials_nvecserial
scons: done building targets.
$ cd tests/c
$ ./test_interpolate 
Running 11 test cases...

*** No errors detected
$ ./test_younglaplace 
Running 8 test cases...

*** No errors detected

eugenhu commented 1 year ago

Thanks for testing. I'll need to do some experimenting myself later to reproduce this.

drew-parsons commented 1 year ago

For comparison, the Debian build logs can be found at https://buildd.debian.org/status/package.php?p=opendrop, e.g. https://buildd.debian.org/status/fetch.php?pkg=opendrop&arch=amd64&ver=3.3.1-4%2Bb1&stamp=1675071543&raw=0. They add some other build flags (e.g. -D_FORTIFY_SOURCE=2 -fstack-protector-strong), but the tests in tests/c still pass if I add these flags to a manual C test build.

eugenhu commented 1 year ago

I'm thinking this might be the culprit:

https://github.com/jdber1/opendrop/blob/e18fbf3d7d6f94d02f4846400be518a56b86e73c/include/opendrop/younglaplace_detail.hpp#L637

https://github.com/jdber1/opendrop/blob/e18fbf3d7d6f94d02f4846400be518a56b86e73c/include/opendrop/younglaplace_detail.hpp#L650-L651

A bit of a hack to regularize singularities at the initial value. Could you try changing this to:

INFINITESIMAL = std::numeric_limits<realtype>::min(); 

or INFINITESIMAL = 1e-10. I think I also overlooked testing "DBo()" in the C tests, which is where it's crashing.
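
For a sense of the magnitudes involved in the two suggested values, here is a minimal Python sketch. It assumes realtype is a C double, in which case std::numeric_limits<realtype>::min() corresponds to DBL_MIN, exposed in Python as sys.float_info.min. The helper name reciprocal_is_finite and the constant names are hypothetical, and the subnormal value is included only as a contrast; the thread does not state what the original value of INFINITESIMAL was. The sketch does not diagnose the failure, it only shows which tiny values can be divided into 1 without overflowing.

```python
import math
import sys

# Assumption: realtype is a C double, so std::numeric_limits<realtype>::min()
# is DBL_MIN, which Python exposes as sys.float_info.min.
SMALLEST_NORMAL = sys.float_info.min   # about 2.2e-308
SMALLEST_SUBNORMAL = math.ulp(0.0)     # about 4.9e-324, shown only for contrast
FIXED_EPS = 1e-10                      # the fixed alternative suggested above


def reciprocal_is_finite(eps):
    """Return True if 1/eps is still representable as a finite double."""
    return math.isfinite(1.0 / eps)


if __name__ == "__main__":
    for name, eps in [
        ("smallest normal", SMALLEST_NORMAL),
        ("smallest subnormal", SMALLEST_SUBNORMAL),
        ("fixed 1e-10", FIXED_EPS),
    ]:
        print(f"{name}: eps={eps!r}, 1/eps finite: {reciprocal_is_finite(eps)}")
```

Dividing by the smallest normal value or by 1e-10 stays finite, while dividing by a subnormal value overflows to infinity, which is the kind of non-finite intermediate a regularization constant is meant to avoid.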

drew-parsons commented 1 year ago

It still needs the type, right? [edit: yes, this part of the code is C++]

static const realtype INFINITESIMAL = std::numeric_limits<realtype>::min();

That doesn't seem to fix it, though. I still get the erkStep_FullRHS error.

Likewise, the error still occurs with:

static const realtype INFINITESIMAL = 1e-10;

eugenhu commented 1 year ago

Ok, thanks for testing. It turns out the problem was a silly mistake introduced many commits ago; see https://github.com/jdber1/opendrop/commit/2f7dc0eda952eb4a1024fba480f1c9f547cc5829 for details of the bug.

Could you try reinstalling the development branch to see if it's resolved on your end too?

drew-parsons commented 1 year ago

2f7dc0e fixes it. The test images are processed fine now, and the calculated interfacial tensions look reasonable.

I applied the 2f7dc0e patch against the 3.3.1 source. I didn't test the full development branch for logistical reasons: Debian is in freeze for the next stable release, so only minimal patches are permitted (Debian has a mechanism for easily applying individual patches).

For me, 2f7dc0e resolves the issue. This bug can be considered closed.