precice / tutorials

Various tutorial cases for the coupling library preCICE with real solvers. These files are meant to be rendered on precice.org, so don't look at the README files here.
https://www.precice.org/
GNU Lesser General Public License v3.0

Results regression in system tests #594

Open · MakisH opened this issue 1 week ago

MakisH commented 1 week ago

As discussed in the coding days (Nov 2024), we have a strange regression in some results compared to the reference results. See original discussion in https://github.com/precice/precice/issues/2131.

I have ruled out changes in the preCICE repository itself, as I get the same issue with preCICE v3.1.2 (the reference results were actually produced with v3.1.1, but this should not matter).

Based on where we have regressions, I suspect something related either to the Python bindings or to the Python dependencies.

Overview

Results regression :x:

Works :heavy_check_mark:

MakisH commented 1 week ago

Without changing anything on our side (I think), the Nutils and SU2-FEniCS cases have been fixed. The elastic-tube-1d python-python case remains broken:

https://github.com/precice/precice/actions/runs/11869063712/job/33078975390?pr=2052

Maybe a Python dependency?

uekerman commented 1 week ago

Could we compare which NumPy version was used for the reference data and which one now? Is this in our data?

MakisH commented 4 days ago

> Could we compare which NumPy version was used for the reference data and which one now? Is this in our data?

No, unfortunately this is not yet in our data, but we could record a `pip freeze` in the output (#596).
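A rough sketch of what that could look like, assuming we add a step to the test runner that dumps the environment next to the other run artifacts (the helper name and the file name `pip-freeze.txt` are made up, not the actual #596 implementation):

```python
# Hypothetical helper: record the installed Python packages alongside the run output,
# so that future regressions can be traced back to dependency updates.
import subprocess
import sys
from pathlib import Path


def record_python_environment(output_dir: Path) -> None:
    """Write the output of `pip freeze` into the run's output directory."""
    freeze = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=True,
    )
    (output_dir / "pip-freeze.txt").write_text(freeze.stdout)


# Example usage: record_python_environment(Path("runs/elastic-tube-1d"))
```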

MakisH commented 4 days ago

I am now investigating the results themselves. fieldcompare also writes diff files, but those contain the absolute differences. We currently use a relative tolerance of 3e-7 (see #393).
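To make explicit what this means (this is not fieldcompare's actual API, just a numpy sketch of a purely relative criterion, with made-up numbers of a magnitude similar to CrossSectionLength):

```python
import numpy as np

# Hypothetical values standing in for a reference field and a newly computed field.
reference = np.array([1.0e-4, 2.0e-4, 3.0e-4])
current = np.array([1.0e-4 + 5.0e-11, 2.0e-4, 3.0e-4])

abs_diff = np.abs(current - reference)           # what the diff files show
passes = abs_diff <= 3e-7 * np.abs(reference)    # what a relative tolerance of 3e-7 tests

print(abs_diff)  # absolute differences: all look tiny
print(passes)    # first entry fails anyway: 5e-11 > 3e-7 * 1e-4 = 3e-11
```

So the absolute diff files alone do not tell us whether the relative check passes, especially for fields with small values.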

Comparing runs on the system tests VM and on my laptop

On both systems, I first ran a `docker system prune -a` to ensure that no caching-related issues occur. To reproduce this job, I ran on my laptop:

python3 systemtests.py --build_args=PLATFORM:ubuntu_2204,PRECICE_REF:f6e48e45167d9312ac14cec9efa8222a915ef201,PYTHON_BINDINGS_REF:b6b9ee5,CALCULIX_VERSION:2.20,CALCULIX_ADAPTER_REF:8eb1d43,FENICS_ADAPTER_REF:3de561d,OPENFOAM_EXECUTABLE:openfoam2312,OPENFOAM_ADAPTER_REF:20b4617,SU2_VERSION:7.5.1,SU2_ADAPTER_REF:64d4aff,TUTORIALS_REF:c5bc59f --suites=release_test

This already hints that our thresholds may be too tight (even though I cannot definitively explain why).

Elastic tube 1d

Opening the file groups related to the Fluid-Nodes-Mesh-Fluid (from the case that fails on the VM) in ParaView, and applying a TemporalStatistics filter, I get:

At least for CrossSectionLength, one could argue that the diff is small, but the values themselves are also small, which makes the comparison more prone to floating-point noise.

[Image: tutorials-elastic-tube-1d-all]
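For anyone who wants to repeat this without clicking through the GUI, here is a rough pvpython sketch of the same workflow (the file path is a placeholder, and the exact array names depend on the export and the filter settings):

```python
# Rough pvpython sketch of the ParaView workflow described above.
from paraview.simple import OpenDataFile, TemporalStatistics
from paraview import servermanager

# Placeholder path: point this at the exported Fluid-Nodes-Mesh-Fluid file group.
reader = OpenDataFile("Fluid-Nodes-Mesh-Fluid.pvd")

stats = TemporalStatistics(Input=reader)  # statistics over all time steps
stats.UpdatePipeline()

# Bring the result to the client and list the statistics arrays
# (for single-block data; multi-block output would need iterating over the blocks).
data = servermanager.Fetch(stats)
point_data = data.GetPointData()
for i in range(point_data.GetNumberOfArrays()):
    print(point_data.GetArrayName(i))  # e.g. a CrossSectionLength average array
```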

Trying to reproduce the reference results

Running the following test locally (again after a `docker system prune -a`):

python3 systemtests.py --suites release_test --build_args PLATFORM:ubuntu_2204,PRECICE_REF:v3.1.1,PYTHON_BINDINGS_REF:b6b9ee5,CALCULIX_VERSION:2.20,CALCULIX_ADAPTER_REF:v2.20.1,FENICS_ADAPTER_REF:v2.1.0,OPENFOAM_EXECUTABLE:openfoam2312,OPENFOAM_ADAPTER_REF:v1.3.0,SU2_VERSION:7.5.1,SU2_ADAPTER_REF:64d4aff,TUTORIALS_REF:c5bc59f --log-level DEBUG

(using the versions from reference_results.metadata, except for the latest python-bindings and tutorials refs, because we did not have the requirements.txt back then)

The same tests are failing with regressions, so it is either something in the python-bindings/tutorials (which I think I had previously ruled out) or something outside our control.

How to move on

Do we now accept this as the new baseline, or do we keep digging?

Should we relax the tolerances now?

Should we maybe introduce a different tolerance per test case?
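If we go for per-case tolerances, a minimal sketch of how that could look (the names, values, and layout are made up, not a concrete proposal for the systemtests code):

```python
# Hypothetical per-case relative tolerances; cases not listed keep the global default.
DEFAULT_RTOL = 3e-7
CASE_RTOL = {
    # Example override for a case with small, noise-prone values.
    "elastic-tube-1d.python-python": 1e-5,
}


def rtol_for(case: str) -> float:
    """Return the relative tolerance to use when comparing the results of a case."""
    return CASE_RTOL.get(case, DEFAULT_RTOL)
```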