jfowkes / pycutest

Python interface to CUTEst
https://jfowkes.github.io/pycutest/
GNU General Public License v3.0

Inconsistency in problem dimensions between description and problem #82

Closed chrhansk closed 5 months ago

chrhansk commented 5 months ago

Describe the bug

There are two ways to determine the dimensions of a CUTEst problem using pycutest: through the problem properties and through the imported problem itself. In some instances, the two disagree.

To Reproduce

The following snippet

import pycutest

print("pycutest version:", pycutest.__version__)

name = "HATFLDBNE"

props = pycutest.problem_properties(name)

print(props)

print(f"Props: n = {props['n']}, m = {props['m']}")

problem = pycutest.import_problem(name)

print(problem)
print(f"Problem: n = {problem.n}, m = {problem.m}")

yields the following output:

pycutest version: 1.7.0
{'objective': 'none', 'constraints': 'other', 'regular': True, 'degree': 2, 'origin': 'academic', 'internal': False, 'n': 4, 'm': 0}
Props: n = 4, m = 0
CUTEst problem HATFLDBNE (default params) with 4 variables and 4 constraints
Problem: n = 4, m = 4

As can be seen, there is some ambiguity as to whether the problem has constraints (is m = 0 or m = 4?). It is possible to evaluate the constraints at the provided x0, so I guess m = 4 is correct?

Information about your installation:

jfowkes commented 5 months ago

Thank you very much for the bug report. The issue here is that HATFLDBNE is a nonlinear least-squares problem (as denoted by the NE), and for these CUTEst encodes the residuals as problem constraints: https://jfowkes.github.io/pycutest/_build/html/example.html#nonlinear-least-squares

Thus, in some sense, both ways of determining the problem dimensions are correct: problem_properties uses the CUTEst classification scheme (https://www.cuter.rl.ac.uk//Problems/classification.shtml), which correctly states that the problem has no general constraints. On the other hand, problem.m correctly returns 4, since the residual vector encoded in problem.cons does have 4 entries.
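The residuals-as-constraints encoding described above can be sketched as follows. This is a minimal illustration, not part of pycutest: the helper names `least_squares_value` and `demo` are hypothetical, and the 0.5 scaling of the sum of squares is an assumed convention. It relies only on `problem.cons(x)` and `problem.x0`, and `demo` needs a working pycutest/CUTEst installation.

```python
def least_squares_value(problem, x):
    """Return 0.5 * ||r(x)||^2, reading the residuals r(x) from problem.cons(x).

    Assumption: `problem` is an NE-type problem, so its "constraints" are
    really the least-squares residuals. The 0.5 factor is a convention
    chosen for this sketch.
    """
    residuals = problem.cons(x)
    return 0.5 * sum(float(r) ** 2 for r in residuals)


def demo(name="HATFLDBNE"):
    """Print the residuals and the derived objective at the starting point."""
    import pycutest  # imported lazily; requires a CUTEst installation

    problem = pycutest.import_problem(name)
    print("m reported by the problem:", problem.m)
    print("residuals at x0:", problem.cons(problem.x0))
    print("0.5 * ||r(x0)||^2 =", least_squares_value(problem, problem.x0))
```

Calling `demo()` on a machine with CUTEst installed shows that `problem.m` counts residuals for such a problem, even though the classification reports no general constraints.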

However, I agree that this is very confusing and not at all user-friendly. Unfortunately, I'm not sure there is much we can do about this, as our hands are tied by the CUTEst implementation. @lindonroberts your thoughts on this?

chrhansk commented 5 months ago

Ok, thank you for the feedback. I was unaware of the new problem class of nonlinear least-squares problems. To make sure I understand correctly: these problems can at most incorporate variable bounds and no hard constraints (since the constraint evaluation is used for residuals), correct?

I did however write a small verification script:

import pycutest

from multiprocessing import cpu_count
from concurrent.futures import ProcessPoolExecutor


def examine_instance(name):
    """Return True if the classification dimensions agree with the decoded problem."""
    props = pycutest.problem_properties(name)

    prop_n = props['n']
    prop_m = props['m']

    problem = pycutest.import_problem(name, drop_fixed_variables=False)

    prob_n = problem.n
    prob_m = problem.m

    # Problems classified as having no objective are treated as least-squares (LSQ)
    is_lsq = (props["objective"] == "none")

    problem_name = f"LSQ Problem {name}" if is_lsq else f"Problem {name}"

    # A property of "variable" means the size is user-selectable, so it is not compared
    if prop_n != prob_n and prop_n != "variable":
        print(f"{problem_name}: prob_n = {prob_n} != {prop_n} = prop_n")
        return False

    if is_lsq:
        # LSQ problems are expected to report no general constraints
        if prop_m not in [0, "variable"]:
            print(f"{problem_name}: prop_m = {prop_m} != 0 (prob_m = {prob_m})")
            return False
    else:
        if prop_m != prob_m and prop_m != "variable":
            print(f"{problem_name}: prob_m = {prob_m} != {prop_m} = prop_m")
            return False

    return True


if __name__ == "__main__":
    # The guard stops worker processes from re-running the scan when they import this module
    names = pycutest.find_problems()

    with ProcessPoolExecutor(cpu_count()) as executor:
        results = list(executor.map(examine_instance, names))

    print(f"Problems: {len(results)}, OK: {sum(results)}")

Out of the 1530 CUTEst problems currently available, I see the following possible classification problems:

For the LSQ problems: sometimes the m property is set to zero, other times it is set to the number of residuals, and at other times it has a completely different value altogether...

There are also some problems with the normal NLP instances though. So maybe this is also an upstream problem?

jfowkes commented 5 months ago

@chrhansk I have looked more into this, and it turns out there is an issue with how your script detects least-squares problems: only CUTEst problems with NE in the problem name are encoded as having residuals in the constraints. The others have a least-squares objective but are encoded in obj as normal problems. Apologies, I should have made this clearer. The remaining few non-least-squares problems that your script flags are in fact correctly classified, because the CUTEst classification scheme ignores bound constraints:

The symbol(s) after the third hyphen indicate the number of constraints (other than fixed variables and bounds) in the problem

For example, the PORTSNQP problem has:

There is a single equality constraint of the form sum(i=1,n) x_i = 1. Finally, there are simple bounds 0 <= x_i (i=1,n)

So once again our hands are rather tied by decisions from upstream. Although perhaps we need a better way of documenting the two different types of least-squares problem?
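To make the quoted rule concrete, here is a small sketch that pulls n and m out of a CUTEst classification string. The function name `dims_from_classification` is hypothetical, and the example strings are illustrative rather than copied from any SIF file ("NOR2-AN-4-0" is constructed to match the properties printed in the original report). It assumes the usual four hyphen-separated fields, with the variable count after the second hyphen and the constraint count after the third, and it maps "V" to the string "variable" that the verification script above compares against.

```python
def dims_from_classification(classification):
    """Extract (n, m) from a classification string such as 'NOR2-AN-4-0'.

    The field after the second hyphen is the number of variables; the field
    after the third hyphen is the number of constraints, which excludes
    fixed variables and simple bounds.
    """
    fields = classification.strip().split("-")
    if len(fields) != 4:
        raise ValueError(
            f"expected four hyphen-separated fields, got {classification!r}"
        )

    def parse(field):
        # 'V' marks a user-selectable size, reported here as "variable"
        return "variable" if field.upper() == "V" else int(field)

    return parse(fields[2]), parse(fields[3])


print(dims_from_classification("NOR2-AN-4-0"))  # (4, 0)
```

Because bounds are not counted in the last field, a problem with one equality constraint plus simple bounds would still report m = 1 here.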

chrhansk commented 5 months ago

Ok, I am still trying to understand the format. I think there are still some inconsistencies regarding the NE problems:

For instance, JUDGENE has 20 constraints set in its properties (equal to the number of problem constraints), whereas the instance HATFLDBNE mentioned above has 0 (and 4 problem constraints).
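The patterns seen for NE problems can be labelled with a small helper like the one below. This is only a sketch: `describe_m_property` and its labels are hypothetical names, and the two sample pairs are the values quoted in this thread (property m first, decoded problem m second).

```python
def describe_m_property(prop_m, prob_m):
    """Label how the classification's m relates to the decoded problem's m."""
    if prop_m == "variable":
        return "variable"
    if prop_m == prob_m:
        return "matches residual count"
    if prop_m == 0:
        return "zero"
    return "other"


# (prop_m, prob_m) pairs as reported in this thread
cases = {"JUDGENE": (20, 20), "HATFLDBNE": (0, 4)}

for name, (prop_m, prob_m) in cases.items():
    print(f"{name}: {describe_m_property(prop_m, prob_m)}")
# JUDGENE: matches residual count
# HATFLDBNE: zero
```

Grouping the flagged NE problems by these labels would give a compact summary to include in an upstream report.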

jfowkes commented 5 months ago

Ah yes, you are right: there are definitely a few inconsistent NE problems. Could you create an issue upstream on either the SIFDecode or CUTEst repositories about this? The point to report is the inconsistency between the number of constraints specified for NE problems in the CUTEst classification string and the number of constraints in the decoded problem.

jfowkes commented 5 months ago

Closing as these inconsistencies are being resolved upstream.