Inconsistency in problem dimensions between description and problem

chrhansk commented 5 months ago

Describe the bug

There are two ways to determine the dimensions of a CUTEst problem using pycutest: the problem properties and the problem itself. In some instances, the two disagree.

To Reproduce

The following snippet

```python
import pycutest

print("pycutest version:", pycutest.__version__)

name = "HATFLDBNE"

# Dimensions as given by the CUTEst classification
props = pycutest.problem_properties(name)
print(props)
print(f"Props: n = {props['n']}, m = {props['m']}")

# Dimensions of the decoded problem
problem = pycutest.import_problem(name)
print(problem)
print(f"Problem: n = {problem.n}, m = {problem.m}")
```

yields the following output:

pycutest version: 1.7.0
{'objective': 'none', 'constraints': 'other', 'regular': True, 'degree': 2, 'origin': 'academic', 'internal': False, 'n': 4, 'm': 0}
Props: n = 4, m = 0
CUTEst problem HATFLDBNE (default params) with 4 variables and 4 constraints
Problem: n = 4, m = 4

As can be seen, there is some ambiguity as to whether the problem has constraints (is m = 0 or m = 4?). It is possible to evaluate the constraints at the provided x0, so I guess m = 4 is correct?

Information about your installation:

Linux
Python 3.12.3
PyCUTEst Version: 1.7.0

jfowkes commented 5 months ago

Thank you very much for the bug report. The issue here is that HATFLDBNE is a nonlinear least-squares problem (as denoted by the NE suffix), and for these problems CUTEst encodes the residuals as problem constraints: https://jfowkes.github.io/pycutest/_build/html/example.html#nonlinear-least-squares
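To make this encoding concrete: for an NE problem, the vector returned by `problem.cons` is the residual vector r(x), so a least-squares objective and its gradient can be rebuilt from it. The sketch below assumes the convention f(x) = 0.5 * ||r(x)||^2 (the 0.5 factor is an assumption; conventions differ) and assumes `cons(x, gradient=True)` returns the pair (r, J), as pycutest's `problem.cons` does. The helper `lsq_from_residuals` and the toy residual are hypothetical; with a real problem you would pass `problem.cons` and `problem.x0`.

```python
import numpy as np


def lsq_from_residuals(cons, x):
    """Return (f, g) for f(x) = 0.5 * ||r(x)||^2 with r(x) = cons(x).

    `cons` is assumed to behave like pycutest's problem.cons:
    cons(x, gradient=True) returns the residual vector r and its Jacobian J.
    """
    r, J = cons(x, gradient=True)
    f = 0.5 * float(np.dot(r, r))  # half the sum of squared residuals
    g = J.T @ r                     # gradient of f is J^T r
    return f, g


# Hypothetical toy residual r(x) = (x0 - 1, 2 * x1), standing in for problem.cons
def toy_cons(x, gradient=False):
    r = np.array([x[0] - 1.0, 2.0 * x[1]])
    if not gradient:
        return r
    J = np.array([[1.0, 0.0], [0.0, 2.0]])
    return r, J


f_demo, g_demo = lsq_from_residuals(toy_cons, np.array([3.0, 1.0]))
print(f_demo, g_demo)
```

Because only `cons` is touched, the same helper works for any NE problem regardless of what `problem_properties` reports for m.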

Thus, in some sense, both ways of determining the problem dimensions are correct: problem_properties uses the CUTEst classification scheme (https://www.cuter.rl.ac.uk//Problems/classification.shtml), which correctly states that the problem has no general constraints. On the other hand, problem.m correctly returns 4, since the residual vector encoded in problem.cons does have 4 entries.
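The classification that problem_properties relies on is a compact string code. Below is a minimal decoder, assuming the XXXr-XX-n-m layout described on the classification page, with "V" marking a variable dimension. The function name and most descriptive labels are illustrative and may not match pycutest's own strings exactly (only "none", "other", "academic" and "variable" are confirmed by output in this thread). Decoding the hypothetical string "NOR2-AN-4-0" reproduces the dictionary printed in the original report, including m = 0.

```python
# Letter codes from the CUTEst classification scheme (labels are illustrative)
OBJECTIVE = {"N": "none", "C": "constant", "L": "linear",
             "Q": "quadratic", "S": "sum of squares", "O": "other"}
CONSTRAINTS = {"U": "unconstrained", "X": "fixed", "B": "bound",
               "N": "network", "L": "linear", "Q": "quadratic", "O": "other"}
ORIGIN = {"A": "academic", "M": "modelling", "R": "real"}


def _size(field):
    """'V' means the dimension can be changed via SIF parameters."""
    return "variable" if field == "V" else int(field)


def parse_classification(code):
    """Decode a classification string such as 'SUR2-AN-2-0'.

    Assumes the XXXr-XX-n-m layout; only the last four hyphen-separated
    fields are used.
    """
    head, source, n, m = code.strip().split("-")[-4:]
    return {
        "objective": OBJECTIVE[head[0]],
        "constraints": CONSTRAINTS[head[1]],
        "regular": head[2] == "R",
        "degree": int(head[3:]),
        "origin": ORIGIN[source[0]],
        "internal": source[1] == "Y",
        "n": _size(n),
        # m counts general constraints only; bounds and fixed variables are excluded
        "m": _size(m),
    }


print(parse_classification("NOR2-AN-4-0"))
```

Nothing in this string describes how the decoded problem stores residuals, which is why it cannot agree with problem.m for NE problems.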

However, I agree that this is very confusing and not at all user friendly. Unfortunately, I'm not sure there is much we can do about this, as our hands are tied by the CUTEst implementation. @lindonroberts your thoughts on this?

chrhansk commented 5 months ago

OK, thank you for the feedback. I was unaware of the new problem class of nonlinear least-squares problems. To make sure I understand correctly: these problems can at most include variable bounds and no hard constraints (since the constraint evaluation is used for the residuals), correct?

I did, however, write a small verification script that compares both sources of dimensions for every problem:

```python
import pycutest

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import cpu_count


def examine_instance(name):
    """Compare classification dimensions with decoded dimensions for one problem."""
    props = pycutest.problem_properties(name)

    prop_n = props['n']
    prop_m = props['m']

    # Keep fixed variables in the decoded problem (pycutest drops them by default)
    problem = pycutest.import_problem(name, drop_fixed_variables=False)

    prob_n = problem.n
    prob_m = problem.m

    # Problems without an objective are treated as least-squares problems here
    is_lsq = (props["objective"] == "none")

    problem_name = f"LSQ Problem {name}" if is_lsq else f"Problem {name}"

    if prop_n != prob_n and prop_n != "variable":
        print(f"{problem_name}: prob_n = {prob_n} != {prop_n} = prop_n")
        return False

    if is_lsq:
        # Residuals are stored as constraints, so expect no general constraints
        if prop_m not in [0, "variable"]:
            print(f"{problem_name}: prop_m = {prop_m} != 0 (prob_m = {prob_m})")
            return False
    else:
        if prop_m != prob_m and prop_m != "variable":
            print(f"{problem_name}: prob_m = {prob_m} != {prop_m} = prop_m")
            return False

    return True


# The main guard is required when worker processes are spawned rather than forked
if __name__ == "__main__":
    names = pycutest.find_problems()

    with ProcessPoolExecutor(cpu_count()) as executor:
        results = list(executor.map(examine_instance, names))

    num_probs = len(results)
    num_ok = sum(results)
    print(f"Problems: {num_probs}, OK: {num_ok}")
```

Out of the 1530 CUTEst problems currently available, the script flags the following possible classification errors:

LSQ Problem SANTA: prop_m = 23 != 0 (prob_m = 23)
LSQ Problem KOWOSBNE: prop_m = 11 != 0 (prob_m = 11)
LSQ Problem JUDGENE: prop_m = 20 != 0 (prob_m = 20)
LSQ Problem EGGCRATENE: prob_n = 2 != 4 = prop_n
LSQ Problem METHANL8: prop_m = 31 != 0 (prob_m = 31)
LSQ Problem PFIT1: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem HIMMELBC: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem RAT42: prop_m = 9 != 0 (prob_m = 9)
LSQ Problem BA-L1: prop_m = 12 != 0 (prob_m = 12)
Problem ANTWERP: prob_m = 10 != 8 = prop_m
LSQ Problem DENSCHNENE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem CERI651D: prop_m = 67 != 0 (prob_m = 67)
LSQ Problem HATFLDF: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem LEVYMONE5: prop_m = 4 != 0 (prob_m = 4)
LSQ Problem GAUSS3: prop_m = 250 != 0 (prob_m = 250)
LSQ Problem LSC1: prop_m = 6 != 0 (prob_m = 6)
LSQ Problem MISRA1A: prop_m = 14 != 0 (prob_m = 14)
LSQ Problem DENSCHNBNE: prop_m = 2 != 0 (prob_m = 3)
LSQ Problem LANCZOS3: prop_m = 24 != 0 (prob_m = 24)
LSQ Problem BIGGS6NE: prop_m = 13 != 0 (prob_m = 13)
LSQ Problem TRIGGER: prop_m = 6 != 0 (prob_m = 6)
LSQ Problem BOOTH: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem BOXBOD: prop_m = 6 != 0 (prob_m = 6)
LSQ Problem CHWIRUT2: prop_m = 54 != 0 (prob_m = 54)
LSQ Problem GAUSS1: prop_m = 250 != 0 (prob_m = 250)
LSQ Problem GOTTFR: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem RES: prop_m = 14 != 0 (prob_m = 14)
LSQ Problem COOLHANS: prop_m = 9 != 0 (prob_m = 9)
LSQ Problem VESUVIOU: prop_m = 1025 != 0 (prob_m = 1025)
LSQ Problem LSC2: prop_m = 6 != 0 (prob_m = 6)
Problem MPC12: prob_n = 1530 != 1529 = prop_n
LSQ Problem ELATVIDUNE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem MGH17S: prop_m = 33 != 0 (prob_m = 33)
LSQ Problem HYDCAR6: prop_m = 29 != 0 (prob_m = 29)
LSQ Problem VIBRBEAMNE: prop_m = 30 != 0 (prob_m = 30)
LSQ Problem S308NE: prop_m = 3 != 0 (prob_m = 3)
Problem DECONVU: prob_n = 63 != 61 = prop_n
LSQ Problem VANDANIUMS: prop_m = 10 != 0 (prob_m = 10)
Problem MPC1: prob_n = 2550 != 2549 = prop_n
LSQ Problem DENSCHNDNE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem LEVYMONE6: prop_m = 6 != 0 (prob_m = 6)
LSQ Problem ARGAUSS: prop_m = 15 != 0 (prob_m = 15)
LSQ Problem DECONVNE: prob_n = 63 != 61 = prop_n
LSQ Problem MISRA1D: prop_m = 14 != 0 (prob_m = 14)
LSQ Problem CERI651E: prop_m = 64 != 0 (prob_m = 64)
LSQ Problem LEVYMONE9: prop_m = 16 != 0 (prob_m = 16)
LSQ Problem HAHN1: prop_m = 37 != 0 (prob_m = 236)
LSQ Problem NYSTROM5C: prop_m = 20 != 0 (prob_m = 20)
LSQ Problem POWELLSQ: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem EXP2NE: prop_m = 10 != 0 (prob_m = 10)
LSQ Problem DENSCHNCNE: prop_m = 2 != 0 (prob_m = 2)
Problem CMPC15: prob_n = 1530 != 1529 = prop_n
LSQ Problem HYDCAR20: prop_m = 99 != 0 (prob_m = 99)
Problem TABLE7: prob_m = 230 != 17 = prop_m
Problem DECONVC: prob_n = 63 != 61 = prop_n
Problem MODEL: prob_n = 1542 != 1557 = prop_n
LSQ Problem DEVGLA1NE: prop_m = 24 != 0 (prob_m = 24)
Problem CMPC13: prob_n = 1530 != 1529 = prop_n
LSQ Problem MGH10: prop_m = 16 != 0 (prob_m = 16)
Problem MPC9: prob_n = 1530 != 1529 = prop_n
LSQ Problem PFIT3: prop_m = 3 != 0 (prob_m = 3)
Problem DECONVB: prob_n = 63 != 61 = prop_n
LSQ Problem NELSON: prop_m = 128 != 0 (prob_m = 128)
LSQ Problem KIRBY2: prop_m = 151 != 0 (prob_m = 151)
LSQ Problem HIMMELBE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem HIMMELBFNE: prop_m = 7 != 0 (prob_m = 7)
Problem MPC2: prob_n = 1530 != 1529 = prop_n
Problem CMPC1: prob_n = 2550 != 2549 = prop_n
LSQ Problem BA-L1SP: prop_m = 12 != 0 (prob_m = 12)
Problem TWIRIMD1: prob_m = 712 != 544 = prop_m
Problem CLEUVEN7: prob_n = 360 != 300 = prop_n
Problem CMPC6: prob_n = 1530 != 1529 = prop_n
Problem CMPC16: prob_n = 1530 != 1529 = prop_n
LSQ Problem LEVYMONE10: prop_m = 20 != 0 (prob_m = 20)
Problem MPC8: prob_n = 1530 != 1529 = prop_n
LSQ Problem HATFLDG: prop_m = 25 != 0 (prob_m = 25)
LSQ Problem WAYSEA1NE: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem MGH17SLS: prop_m = 33 != 0 (prob_m = 0)
LSQ Problem ROSZMAN1: prop_m = 25 != 0 (prob_m = 25)
LSQ Problem DANWOOD: prop_m = 6 != 0 (prob_m = 6)
LSQ Problem THURBER: prop_m = 37 != 0 (prob_m = 37)
LSQ Problem LEVYMONE8: prop_m = 10 != 0 (prob_m = 10)
LSQ Problem POWELLBS: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem MGH10S: prop_m = 16 != 0 (prob_m = 16)
LSQ Problem HIMMELBA: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem NYSTROM5: prop_m = 20 != 0 (prob_m = 20)
LSQ Problem CERI651A: prop_m = 61 != 0 (prob_m = 61)
LSQ Problem HIMMELBD: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem CERI651B: prop_m = 66 != 0 (prob_m = 66)
LSQ Problem BARDNE: prop_m = 15 != 0 (prob_m = 15)
LSQ Problem VESUVIO: prop_m = 1025 != 0 (prob_m = 1025)
Problem MPC15: prob_n = 1530 != 1529 = prop_n
LSQ Problem OSBORNE2: prop_m = 65 != 0 (prob_m = 65)
LSQ Problem DECONVBNE: prob_n = 63 != 61 = prop_n
LSQ Problem HYPCIR: prop_m = 2 != 0 (prob_m = 2)
Problem MPC3: prob_n = 1530 != 1529 = prop_n
LSQ Problem BEALENE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem ZANGWIL3: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem HEART8: prop_m = 8 != 0 (prob_m = 8)
LSQ Problem SINVALNE: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem WATSONNE: prop_m = 31 != 0 (prob_m = 31)
Problem MPC13: prob_n = 1530 != 1529 = prop_n
LSQ Problem BOX3NE: prop_m = 10 != 0 (prob_m = 10)
LSQ Problem PRICE3NE: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem PFIT4: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem ENGVAL2NE: prop_m = 3 != 0 (prob_m = 5)
Problem EGGCRATE: prob_n = 2 != 4 = prop_n
LSQ Problem DENSCHNFNE: prop_m = 2 != 0 (prob_m = 2)
Problem LEUVEN7: prob_n = 360 != 300 = prop_n
LSQ Problem METHANB8: prop_m = 31 != 0 (prob_m = 31)
LSQ Problem RSNBRNE: prop_m = 2 != 0 (prob_m = 2)
Problem CMPC9: prob_n = 1530 != 1529 = prop_n
Problem MPC6: prob_n = 1530 != 1529 = prop_n
LSQ Problem PRICE4NE: prop_m = 2 != 0 (prob_m = 2)
Problem EGGCRATEB: prob_n = 2 != 4 = prop_n
LSQ Problem DANIWOOD: prop_m = 6 != 0 (prob_m = 6)
Problem SYNPOP24: prob_n = 6968 != 16080 = prop_n
LSQ Problem KTMODEL: prob_n = 726 != 720 = prop_n
LSQ Problem MISRA1B: prop_m = 14 != 0 (prob_m = 14)
LSQ Problem BROWNBSNE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem EXPFITNE: prop_m = 2 != 0 (prob_m = 10)
LSQ Problem CUBENE: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem WAYSEA2NE: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem GROWTH: prop_m = 12 != 0 (prob_m = 12)
LSQ Problem HATFLDFLNE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem AIRCRFTA: prop_m = 5 != 0 (prob_m = 5)
LSQ Problem HELIXNE: prop_m = 3 != 0 (prob_m = 3)
Problem CMPC3: prob_n = 1530 != 1529 = prop_n
LSQ Problem MGH09: prop_m = 11 != 0 (prob_m = 11)
LSQ Problem JENSMPNE: prop_m = 10 != 0 (prob_m = 10)
LSQ Problem LEVYMONE7: prop_m = 8 != 0 (prob_m = 8)
Problem NELSONLS: prob_m = 0 != None = prop_m
LSQ Problem FBRAIN: prop_m = 2211 != 0 (prob_m = 2211)
LSQ Problem MEYER3NE: prop_m = 16 != 0 (prob_m = 16)
LSQ Problem ECKERLE4: prop_m = 35 != 0 (prob_m = 35)
LSQ Problem ENSO: prop_m = 168 != 0 (prob_m = 168)
LSQ Problem HEART6: prop_m = 6 != 0 (prob_m = 6)
Problem MPC5: prob_n = 1530 != 1529 = prop_n
LSQ Problem HATFLDDNE: prop_m = 10 != 0 (prob_m = 10)
LSQ Problem MGH17: prop_m = 33 != 0 (prob_m = 33)
LSQ Problem LANCZOS2: prop_m = 24 != 0 (prob_m = 24)
LSQ Problem FBRAIN2: prop_m = 2211 != 0 (prob_m = 2211)
LSQ Problem STREGNE: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem LANCZOS1: prop_m = 24 != 0 (prob_m = 24)
LSQ Problem MUONSINE: prop_m = 512 != 0 (prob_m = 512)
LSQ Problem VESUVIA: prop_m = 1025 != 0 (prob_m = 1025)
Problem CMPC2: prob_n = 1530 != 1529 = prop_n
LSQ Problem SSINE: prop_m = 2 != 0 (prob_m = 2)
LSQ Problem CLUSTER: prop_m = 2 != 0 (prob_m = 2)
Problem CMPC8: prob_n = 1530 != 1529 = prop_n
LSQ Problem HATFLDENE: prop_m = 21 != 0 (prob_m = 21)
LSQ Problem PFIT2: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem COATINGNE: prop_m = 252 != 0 (prob_m = 252)
LSQ Problem FBRAIN2NE: prop_m = 2211 != 0 (prob_m = 2211)
Problem MPC16: prob_n = 1530 != 1529 = prop_n
LSQ Problem YFITNE: prop_m = 16 != 0 (prob_m = 17)
Problem SYNTHES3: prob_m = 23 != 19 = prop_m
LSQ Problem CERI651C: prop_m = 56 != 0 (prob_m = 56)
LSQ Problem FBRAINNE: prop_m = 2211 != 0 (prob_m = 2211)
Problem BATCH: prob_n = 48 != 46 = prop_n
LSQ Problem RAT43: prop_m = 15 != 0 (prob_m = 15)
LSQ Problem CHWIRUT1: prop_m = 214 != 0 (prob_m = 214)
LSQ Problem GAUSS2: prop_m = 250 != 0 (prob_m = 250)
Problem CMPC5: prob_n = 1530 != 1529 = prop_n
Problem CMPC12: prob_n = 1530 != 1529 = prop_n
LSQ Problem OSBORNE1: prop_m = 33 != 0 (prob_m = 33)
LSQ Problem BENNETT5: prop_m = 154 != 0 (prob_m = 154)
LSQ Problem RECIPE: prop_m = 3 != 0 (prob_m = 3)
LSQ Problem DEVGLA2NE: prop_m = 16 != 0 (prob_m = 16)
LSQ Problem MISRA1C: prop_m = 14 != 0 (prob_m = 14)
LSQ Problem BROWNDENE: prop_m = 20 != 0 (prob_m = 20)
LSQ Problem FBRAIN3: prop_m = 2211 != 0 (prob_m = 2211)
LSQ Problem GBRAIN: prop_m = 2200 != 0 (prob_m = 2200)
Problem PORTSNQP: prob_m = 2 != 1 = prop_m
LSQ Problem DIAMON2D: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem DMN15332: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem DMN37142: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem DMN15102: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem BA-L49: prop_m = 31843 != 0 (prob_m = 63686)
LSQ Problem DMN15333: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem DIAMON3D: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem DMN15103: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem DMN37143: prop_m = 4643 != 0 (prob_m = 4643)
LSQ Problem BA-L73: prop_m = 46122 != 0 (prob_m = 92244)
LSQ Problem MNISTS0: prop_m = 6000 != 0 (prob_m = 60000)
LSQ Problem MNISTS5: prop_m = 6000 != 0 (prob_m = 60000)
LSQ Problem BA-L21: prop_m = 36455 != 0 (prob_m = 72910)
LSQ Problem BA-L16: prop_m = 83718 != 0 (prob_m = 167436)
LSQ Problem BA-L52: prop_m = 347173 != 0 (prob_m = 694346)

For the LSQ problems, the m property is sometimes zero and sometimes equal to the number of residuals. At other times, it has a completely different value altogether (e.g. HAHN1: prop_m = 37, prob_m = 236).

There are also some mismatches for regular NLP instances, though, so maybe this is also an upstream issue?

jfowkes commented 5 months ago

@chrhansk I have looked into this further, and it turns out there is an issue with how your script detects least-squares problems: only CUTEst problems with NE in the problem name encode their residuals in the constraints; the others have a least-squares objective but are encoded in obj as normal problems. Apologies, I should have made this clearer. The few remaining non-least-squares problems that your script flags are in fact correctly classified, as the CUTEst classification scheme ignores bound constraints:

The symbol(s) after the third hyphen indicate the number of constraints (other than fixed variables and bounds) in the problem

For example, the PORTSNQP problem has:

There is a single equality constraint of the form sum(i=1,n) x_i = 1. Finally, there are simple bounds 0 <= x_i (i=1,n)

So once again our hands are rather tied by decisions made upstream. That said, perhaps we need a better way of documenting the two different types of least-squares problems?
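The rule from this comment can be folded back into the verification script. This is only a sketch: the helper `check_dimensions` is hypothetical, and it tests for an NE suffix, which is an assumption (the comment says "NE in the problem name"; a plain substring test would also match names such as NELSON, and a suffix test may still misfire on names that merely end in those letters). It also cannot tell when a mismatch is explained by the classification excluding bound-type constraints, as with PORTSNQP, so each hit still needs a manual look.

```python
def check_dimensions(name, props, prob_n, prob_m):
    """Return a list of discrepancy messages (empty if consistent).

    Heuristic: problems whose name ends in 'NE' store residuals as
    constraints, so their classification m (general constraints) is
    expected to be 0; for all other problems it should equal problem.m.
    Dimensions reported as 'variable' are not checked.
    """
    issues = []
    prop_n, prop_m = props["n"], props["m"]

    if prop_n != "variable" and prop_n != prob_n:
        issues.append(f"{name}: problem.n = {prob_n}, classification n = {prop_n}")

    expected_m = 0 if name.endswith("NE") else prob_m
    if prop_m != "variable" and prop_m != expected_m:
        issues.append(f"{name}: classification m = {prop_m}, expected {expected_m}")

    return issues


# Values for HATFLDBNE from the original report: props n = 4, m = 0; problem n = 4, m = 4
print(check_dimensions("HATFLDBNE", {"n": 4, "m": 0}, 4, 4))
```

In the script above, `examine_instance` could call this helper with `props`, `problem.n` and `problem.m` instead of branching on `props["objective"]`.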

chrhansk commented 5 months ago

OK, I am still trying to understand the format. I think there are still some inconsistencies regarding the NE problems:

For instance, the properties of JUDGENE report m = 20 (equal to problem.m), whereas those of HATFLDBNE, mentioned above, report m = 0 (with problem.m = 4).
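One way to make this inconsistency visible is to bucket the NE problems by how their classification m compares with problem.m. The sketch below uses a hypothetical helper `group_ne_problems` with the same NE-suffix heuristic as before, applied to (prop_m, prob_m) pairs copied from the script output in this thread; with pycutest installed, the pairs would come from `problem_properties(name)['m']` and `import_problem(name).m`.

```python
def group_ne_problems(records):
    """Group NE problems by how the classification m relates to problem.m.

    `records` maps a problem name to a (prop_m, prob_m) pair. The result has
    the keys 'zero' (classification reports no general constraints),
    'residuals' (classification m equals the residual count) and 'other'.
    """
    groups = {"zero": [], "residuals": [], "other": []}
    for name, (prop_m, prob_m) in sorted(records.items()):
        if not name.endswith("NE"):
            continue  # only NE problems store residuals as constraints
        if prop_m == 0:
            groups["zero"].append(name)
        elif prop_m == prob_m:
            groups["residuals"].append(name)
        else:
            groups["other"].append(name)
    return groups


# (prop_m, prob_m) pairs taken from this thread
records = {
    "HATFLDBNE": (0, 4),
    "JUDGENE": (20, 20),
    "YFITNE": (16, 17),
    "ENGVAL2NE": (3, 5),
    "BOOTH": (2, 2),  # not an NE problem, so it is skipped
}
groups = group_ne_problems(records)
print(groups)
```

Under a consistent convention, every NE problem would land in a single bucket; here all three buckets are populated.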

jfowkes commented 5 months ago

Ah yes you are right, there are definitely a few inconsistent NE problems. Could you create an issue upstream on either the SIFDecode or CUTEst repositories about this? Namely the inconsistency between the number of constraints specified for NE problems in the CUTEst classification string and the number of constraints in the decoded problem.

jfowkes commented 5 months ago

Closing as these inconsistencies are being resolved upstream.

jfowkes/pycutest, issue #82