C stress test failed - Githubissues

zaikunzhang commented 5 months ago

See the branch

https://github.com/libprima/prima/tree/c_stress_test_failure

and the workflow run

https://github.com/libprima/prima/actions/runs/7934551296/job/21665774763

zaikunzhang commented 5 months ago

Hi @nbelakovski ,

Could you take a look at this? There is a bug here (I have created a branch to reproduce the bug easily: https://github.com/libprima/prima/tree/c_stress_test_failure).

In the new version (which has other issues as you mentioned; I will fix them), the bug is not triggered anymore but we do not know why. There is nothing more scary than a magically disappeared bug.

Many thanks.

Best regards, Zaikun

nbelakovski commented 3 months ago

I think I see the issue. result is not initialized, and in C variables that aren't initialized can theoretically have garbage data in them (basically the contents of whatever memory address they were given). In the branch linked, all of the algorithms except cobyla fail prima_check_problem with the return code indicating a problem mismatch (because problem.calcfc is set for all of them), and so prima_init_result is never called. Then stress.c called prima_free_result on uninitialized memory and hence tried to free memory that was not allocated.

I've been unable to reproduce this locally, even when using the same intel compiler, so the above is theoretical but I think it's a compelling case, particularly since the issue does not arise for cobyla (also note the failed windows test is due to the issue with the setup-fortran action linking the wrong libgfortran which we fixed).

Note that calcfc is no longer specified for all algorithms, it appears this was corrected in c85e631a435a2b66a2bf30805f1bfd565da62eab.

However the potential issue of the user calling prima_free_result on uninitialized memory still exists. I see two options for fixing this:

We can call prima_init_result before we call prima_check_problem. However if prima_check_problem fails then we should free the memory and set result.x/result.nlconstr to 0 before returning.
We call memset(result, 0, sizeof(prima_result_t)); at the very beginning of prima_minimize. In this option we keep the order of prima_init_result and prima_check_problem as they are (checking the problem before initializing).

Either option allows the user to call prima_free_result afterwards with no ill effects, since it guarantees that result.x and result.nlconstr will be 0 or they will be allocated. I think option 2 is better since it helps to avoid unnecessary calls to malloc and free.

zaikunzhang commented 3 months ago

set result.x/result.nlconstr to 0

We should not set them to an arbitrary normal value if they are not defined yet. (why 0? Why not 1? 100?)

If they are not defined yet, then they should have a velue that indicates "uninitialised" (in the mathematical sense, which is different from "uninitialised" in the programming sense). The ideal value would be NaN.

This is w very important point. Otherwise, the logic is wrong and it may lead to mysterious bugs.

However, here the situation is different. result.x should be x0, and the other should be the constraint value at x0.

BTW, I am against passing the value of x0 to minimize using result.x. The logic is wrong. Before the optimization starts, result.x should be uninitialised (in the mathematical sense).

Thank you.

zaikunzhang commented 3 months ago

We should not set them to an arbitrary normal value if they are not defined yet. (why 0? Why not 1? 100?)

BTW, I am against passing the value of x0 to minimize using result.x. The logic is wrong. Before the optimization starts, result.x should be uninitialised (in the mathematical sense).

Let me put is in this way: at any moment, the content of "result" should be consistent. It cannot be partially initialised and partially not (unless the initialisation is being done), and it cannot contain values that seems "normal" when it is indeed "uninitialised" (mathematically), giving us the wrong impression that it is initialised when it is indeed not.

nbelakovski commented 3 months ago

set result.x/result.nlconstr to 0

We should not set them to an arbitrary normal value if they are not defined yet. (why 0? Why not 1? 100?)

I wasn't clear here: result.x and result.nlconstr are pointers, so I meant setting them to NULL (which is also 0), not an arbitrary value.

As for the other points we discussed them in your office and I've now opened #180 to implement the changes.

libprima / prima

C stress test failed #149