tBuLi / symfit

Symbolic Fitting; fitting as it should be.
http://symfit.readthedocs.org
MIT License

Improve test_finite_difference and test_fit_result #289

Closed antonykamp closed 1 year ago

antonykamp commented 4 years ago

Hey guys,

I worked on test_finite_difference and test_fit_result so that they correspond more closely to the pytest style.

Some words about 'test_fit_result.py': it's not possible to use a fixture inside a parametrization, which is why I used a dict to store and call the results. I didn't touch the last test, because the converted version would be more confusing than helpful.

Merry Christmas and a Happy New Year

antonykamp commented 4 years ago

The test test_constrained/test_constrained_dependent_on_matrixmodel doesn't fail because of this PR: if I clone the master branch, the test doesn't work either.

antonykamp commented 4 years ago

I could compress the fixtures to three: the parameters a and b, and the result_dict. In this case, all the results would be built in a single fixture, which returns a dict with all of them. The parameters a and b are used in the test method test_fitting.

By the way, I'm looking into fixtures with several return items.

antonykamp commented 4 years ago

After long research, I'll stick to the solution with the dictionary :D

pckroon commented 4 years ago

After long research, I'll stick to the solution with the dictionary :D

Could you share some of your considerations? That way we can have a short discussion about it here, without me also having to do the research.

antonykamp commented 4 years ago

Just for clarity: you want to create a parametrized fixture. The params of the fixture are the different results of the models. Further, the fixture will be applied to different tests. But then it becomes difficult to create tests in which each model has a different comparison partner; in that case there would have to be a new, parallel parametrization. Besides, models often have to be left out (or only one model is tested), and in my eyes you quickly lose the overview. The tests are not only made for TravisCI, but also for people who need to understand them :D

In my eyes, the dictionary allows models to be added quickly, tests to be implemented easily and clearly, and bugs to be spotted quickly.

pckroon commented 4 years ago

Very clear, thanks for the write-up. To sum up: the tests are too diverse? Feel free to reorganise/change the tests so that they become more uniform, if you think that makes sense.

As for the tests in finite_difference, we'll need to find a way to use models in pytest.mark.parametrize without making a million different fixtures. (e.g. @pytest.mark.parametrize('model', [modelA, modelB, modelC, ...])). I don't quite know how to do this yet, and I'm hoping you have a good idea :) We'll need it for the rest of the tests anyway.

antonykamp commented 4 years ago

On the one hand, it's difficult to make the tests uniform. For example, only two results use constraints, so we have to separate them. Also, chained_result often steps out of line.

On the other hand, the tests can be made more uniform than I initially thought. In some cases a model wasn't mentioned even though it would have passed.

EDIT: For a long time I thought that it's not possible to put fixtures into a parametrization, because it was a feature request some years ago, but I didn't realise that an offshoot of pytest can do it. With pytest-cases I would have some new possibilities, for example to replace the dict. However, it is not possible with pure pytest...

pckroon commented 4 years ago

I never said it would be easy :) For constraints: you might be able to parametrize them as [a==b] and use [] for no constraints. Or just None. ChainedMinimizers and their results are a bit magic, so I can see those becoming a separate case. And let's stick to pure pytest for now and see what we can cook up. It may be an idea to create an alphabet of parameters (a-e) / variables (x-z) in the global namespace, and use those to make models for parametrized tests?
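
Something like this, for example (just a sketch; whether it lives in a conftest.py or a small shared module is an open choice):

# shared "alphabet" of symfit parameters and variables for parametrized tests;
# tests would import these names (or wrap them in fixtures) to build their models
from symfit import parameters, variables

a, b, c, d, e = parameters('a, b, c, d, e')
x, y, z = variables('x, y, z')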

antonykamp commented 4 years ago

It's never easy ;-) Some ideas...

assert isinstance(result_name.minimizer, result_minimizer)
if isinstance(result_name.minimizer, ChainedMinimizer):
    for minimizer, cls in zip(result_name.minimizer.minimizers, [BFGS, NelderMead]):
        assert isinstance(minimizer, cls)

(Maybe we could connect the assert statement with the if statement, e.g. as in the sketch below.)
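
For example, folded into a single assert (same names as in the snippet above):

# one combined check: the minimizer type is right, and if it is a ChainedMinimizer
# its sub-minimizers match the expected classes in order
assert isinstance(result_name.minimizer, result_minimizer) and (
    not isinstance(result_name.minimizer, ChainedMinimizer)
    or all(isinstance(minimizer, cls)
           for minimizer, cls in zip(result_name.minimizer.minimizers,
                                     [BFGS, NelderMead]))
)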

pckroon commented 4 years ago

  • I saw that if a model is initiated with None constraints then it gets an empty list. We could use that, because the test which checks the constraints uses a for-each loop.

Sure. You could even just do if constraints: ..., which will be falsy for both None and an empty list.

  • I just want to know if I understood it right: models with ChainedMinimizer use LeastSquares as objective, right?

Not necessarily. Maybe in the current tests, but it's not generally true.

  • It's not beautiful, but we could test the minimizers like this without leaving chained_result out.

I then prefer kicking all chained minimizer tests to either a separate test, or even a separate file. As you said earlier: making the tests understandable is at least as important.

antonykamp commented 4 years ago

Alright... Then I have an idea for the new test_fit_result file :)

But I won't work on it until next year :P

Guten Rutsch (Happy New Year) :)

antonykamp commented 4 years ago

Hey, I'm so sorry I was away for so long. Papers, courses and exams stood between the pull request and me. Now I have no such big projects until October. Perhaps this conversion will be done by then (at least that's my goal).

Does it make sense to continue working on this PR, or have there been many new tests in the meantime?

pckroon commented 4 years ago

Heyhey,

it's been quiet here as well. If you want to pick this up again then by all means go ahead. Just to make sure, merge master first, but I don't expect any changes.

antonykamp commented 4 years ago

Before I start, I want to sum up the requested changes:

test_finite_difference: The parametrizing fixtures should be applied to the test_model method. The problem was the number of fixtures required for the variables and parameters. A solution could be an alphabet in conf.py.

test_fit_result: The goal is one big method in which the six models are tested. The problems were the missing feature in pytest and the diversity of the models.

So far correct?

pckroon commented 4 years ago

I think so, yes. Start with the simpler one (test_finite_difference) before diving into the other. Maybe something like this would work:

@pytest.mark.parametrize("model, expected_values, initial_values, xdata, noise", (
    [a*x**2 + b*x + c, {'a': 1, 'b': 2, 'c': 3}, {'a': 0, 'b': 0, 'c': 0}, {'x': np.arange(10)}, 0.05],
))
def test_something(model, expected_values, initial_values, xdata, noise):
    if not initial_values:
        initial_values = expected_values.copy()
    y_data = model(**xdata, **expected_values)
    y_data *= np.random.normal(loc=1, scale=noise, size=y_data.shape)  # rough sketch of adding noise, but you get the idea
    ... # more things here
    assert result == expected_values

antonykamp commented 4 years ago

Alright! The next commit will contain the mentioned collection conf.py of variables and parameters + the integration into the individual test files.

pckroon commented 4 years ago

I'm still contemplating whether we can condense the rest of the tests in finite_difference into the one parameterised case as well, and whether or not that's a good idea. It'll result in a test method with a lot of arguments...

antonykamp commented 4 years ago

In my opinion, test_ODE_stdev could be built into the big method. But the other methods test many more aspects than we could build in.

Edit: Question: Why is the test that checks whether ODE models generate a standard deviation in test_finite_difference and not in test_ode? Otherwise, we could move this part to test_ode and additionally add an ODEModel to the big parametrization 🙆‍♂️

antonykamp commented 4 years ago

By the way, I found an unconverted test. Should I convert it in this PR (after fit_result and finite_difference) or later?

pckroon commented 4 years ago

Edit: Question: Why is the test that checks whether ODE models generate a standard deviation in test_finite_difference and not in test_ode? Otherwise, we could move this part to test_ode and additionally add an ODEModel to the big parametrization 🙆‍♂️

The original reason is that ODE models require the finite difference methods to be able to calculate a std deviation (because they're not analytically differentiable). I'm ok with reorganising this though, but it would be good to test that we can calculate the Jacobian of an ODEModel, and possibly the stddev.

By the way, I found an unconverted test. Should I convert it in this PR (after fit_result and finite_difference) or later?

Leave it for later. Keep PRs small.

antonykamp commented 4 years ago

Leave it for later. Keep PRs small.

👍

it would be good to test that we can calculate the Jacobian of an ODEModel, and possibly the stddev.

I added an ODEModel to the parametrization. Checking the stddev would require evaluating the fit_result of every model. Then we again have the problem that we can hardly separate the data arrays from the parameters. As long as we don't have a nicer variant for the arrays, we can hardly test the stddev.

antonykamp commented 4 years ago

I'm struggling with the failed check by TravisCI. The test passes locally, but not in CI. In the case of C++ I would have blamed the compiler, but in Python?

pckroon commented 4 years ago

(sf.Model({y: 3 * a * x**2 + b * x * w - c, z: sf.exp(a*x - b) + c*w}), {'a': 3.5, 'b': 2, 'c': 5, 'w': np.arange(10)/10}) is failing. It looks like a numerical issue: https://travis-ci.org/github/tBuLi/symfit/jobs/702160446#L380 You can either change the values of the parameters/variables to be in a region that is numerically more stable, or relax the tolerance a bit.

antonykamp commented 4 years ago

I am currently working on test_fit_result. I noticed some small things:

  1. The results (parameters a, b + r²) differ strongly if I calculate them with a normal Fit, a Fit with likelihood, or constraints (surprise). I could calculate them with WolframAlpha (or another tool) and implement those solutions (?)

  2. likelihood understandably has no r^2. I would check that with hasattr in the test:

    if hasattr(fit_result, 'r_squared'):
        assert isinstance(fit_result.r_squared, float)
        ...
  3. I had asked before whether ChainedMinimizer always uses LeastSquares as an objective. Even if it is not by default, should I still include it in the test?

  4. Currently we only test fit_result, minpack_result and likelihood for the existence of r^2, objective_result,... I would add the remaining Fits so that everything fits into one test.

pckroon commented 4 years ago

I expect test_fit_results to require much more invasive/drastic changes than test_finite_difference.

  1. I think this is expected (since they're different problems), but which test are you referring to exactly?
  2. See also 4.
  3. ChainedMinimizer is just another minimizer, and can be used in combination with any other Objective (such as LeastSquares). ChainedMinimizers are a little funny though, since it runs multiple minimizers in order which affects the produced fit_result.
  4. Great idea. Combined with 2, maybe you can parametrize the tests with a dict of expected attribute/value pairs and use getattr (for attr_name, value in attributes.items(): assert getattr(fit_result, attr_name) == pytest.approx(value)); see the sketch below.
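
For instance, something like this (FakeResult is only a stand-in so the sketch is self-contained; the real parametrization would supply actual fit results and their expected attributes):

from collections import namedtuple

import pytest

# FakeResult only stands in for a real FitResult in this sketch
FakeResult = namedtuple('FakeResult', ['r_squared', 'objective_value'])

@pytest.mark.parametrize('fit_result, expected_attrs', [
    (FakeResult(r_squared=0.99, objective_value=1.2),
     {'r_squared': 0.99, 'objective_value': 1.2}),
])
def test_expected_attributes(fit_result, expected_attrs):
    # compare every expected attribute/value pair via getattr
    for attr_name, value in expected_attrs.items():
        assert getattr(fit_result, attr_name) == pytest.approx(value)
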
antonykamp commented 4 years ago

I uploaded the first draft so we can talk with examples (with described failures and without comments) :D

  1. Because we use the constraint Eq(a, b) in two Fits the results don't fit perfectly to the datasets. And likelihood is likelihood :P
  2. & 4. A dict for the fit results would be helpful too I guess. But I like the idea of getattr!
  3. So ChainedMinimizer is the special kid in the family of Minimizers ;)

In any case, we should calculate the results with WolframAlpha to get a, b, r^2, ...

tBuLi commented 4 years ago

Hi guys, thanks for keeping up the good work, I've been a bit swamped with work regarding the last stages of my PhD so I've struggled to find time to look into this. I trust @pckroon's verdict on this.

Be sure to remove the German comments from the code before merging though, das ist ja wahnsinn! :P

p.s. I'm loving the reduction in lines of code, is this achieved without loss of functionality?

pckroon commented 4 years ago

So ChainedMinimizer is the special kid in the family of Minimizers ;)

Definitely.

In any case, we should calculate the results with WolframAlpha to get a, b, r^2, ...

The values as they are in the tests now should be correct/validated somehow. But feel free to double check them.

pckroon commented 4 years ago

p.s. I'm loving the reduction in lines of code, is this achieved without loss of functionality?

Yup. That's the magic of pytest parametrization and one of the reasons why I've been pushing for this. In addition, it helps check that you cover all combinations.

antonykamp commented 4 years ago

Be sure to remove the German comments from the code before merging though, das ist ja wahnsinn! :P

Of course :D I pushed a draft including my todos :)

antonykamp commented 4 years ago

The values as they are in the tests now should be correct/validated somehow. But feel free to double check them.

I'll add the "getattr idea" together with the double-checked results 🦾

antonykamp commented 3 years ago

Thank you! I hope you arrived safely in 2021 :) Because you are super busy (🦸 ) I'll get straight to the talking points. But I don't expect you to answer quickly (I'm not in a hurry ;)

test_no_constraints

  1. You were right about LBFGSB. It only has a byte string as status_message. I would recommend inserting an if-else block here that checks whether the minimizer is an instance of LBFGSB (+ comments).
  2. We get pretty wrong results with TrustConstr & COBYLA as minimizers, probably because both were developed for constrained fitting. Removing both from this test could be an idea(?)
  3. We get wrong results with LogLikelihood as the objective. In this case, we should wait until the even busier @tBuLi can help us :)

test_no_constraints

  1. The fitting with TrustConstr as the minimizer fails because TrustConstr doesn't have a 'hess' object. This object is necessary if we want to use a CallableNumericalModel as a constraint (#305). We could remove the CallableNumericalModel from the constraints or test TrustConstr separately without it.
  2. Same problem with LogLikelihood as the objective as in test_no_constraints.

greetz

pckroon commented 3 years ago

  1. You were right about LBFGSB. It only has a byte string as status_message. I would recommend inserting an if-else block here that checks whether the minimizer is an instance of LBFGSB (+ comments).

I would prefer fixing it in FitResult itself (https://github.com/DuckNrOne/symfit/blob/master/symfit/core/fit_results.py#L42)
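
Something along these lines (not the actual fit_results.py code, just the kind of normalisation I mean, assuming the L-BFGS-B message arrives as bytes):

# hypothetical sketch: normalise the scipy status message to str once,
# so the tests need no LBFGSB special case
def _message_as_text(message):
    if isinstance(message, bytes):
        return message.decode('utf-8')
    return message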

  1. We get pretty wrong results with TrustConstr & COBYLA as minimizers, probably because both were developed for constrained fitting. Removing both from this test could be an idea(?)

This is worrying/strange/weird to me. TrustConstr is advertised in scipy as the most flexible and powerful method available. Personally I've never had good results with COBYLA. Maybe they should both be moved to constrained problems only. I'll leave that up to you. Unrelated to this PR, it would also be good if I/we/you/someone would find time to wrap the remaining scipy methods.

  1. We get wrong results with LogLikelihood as the objective. In this case, we should wait until the even busier @tBuLi can help us :)

I don't really understand what LL fitting does. As far as I know, it answers the question: given this collection of points, what distribution (with these free parameters) were they most likely drawn from? So draw a bunch of points from one of the np random distributions: https://numpy.org/doc/stable/reference/random/generator.html#distributions with set parameters, and use LL to try and recover the parameters.
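
Roughly that experiment, adapted from the likelihood example in the symfit docs (the exponential distribution and the numbers here are just illustrative):

import numpy as np
from symfit import Parameter, Variable, Fit, exp
from symfit.core.objectives import LogLikelihood

np.random.seed(0)
xdata = np.random.exponential(3.0, 1000)   # samples with true scale beta = 3

beta = Parameter('beta', value=1.0, min=1e-6)
x = Variable('x')
pdf = (1 / beta) * exp(-x / beta)          # exponential pdf as the model

fit = Fit(pdf, xdata, objective=LogLikelihood)
fit_result = fit.execute()
print(fit_result.value(beta))              # should come out close to 3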

  1. The fitting with TrustConstr as the minimizer fails because TrustConstr doesn't have a 'hess' object. This object is necessary if we want to use a CallableNumericalModel as a constraint (#305). We could remove the CallableNumericalModel from the constraints or test TrustConstr separately without it.

Indeed for TC the constraints also need to have hessians, which the CallableNumericalModels don't have. So removing the combination TC + CNM is the correct answer.

tBuLi commented 3 years ago

Hey @DuckNrOne, sorry for the long silence! Thanks for continuing the good work! Can you tell me where I can find this test_no_constraints function you are referring to? I can't find it right now. Then I'll let you know if I can think of a reason why LL might behave differently :).

antonykamp commented 3 years ago

I removed TC and COBYLA from no_constraints for the moment. I will open an issue about wrapping the remaining scipy methods in the next few days.

About removing TC + CNM: pytest implements skipif to exclude parametrized test cases under defined circumstances. But we can't parametrize one variable (e.g. constraints) and exclude the combination TC + CNM, because our "circumstances" refer to another parametrized variable (e.g. minimizer). But we could parametrize a tuple of a minimizer and a constraint and exclude the tuple (TC, CNM). In this case, I would separate the definition of the constraints and the parametrization like this:

small_constraint = [Le(a, b)]
big_constraint = [
    Le(a, b),
    CallableNumericalModel.as_constraint(
        {z: ge_constraint}, connectivity_mapping={z: {a}},
        constraint_type=Ge, model=Model({y: a * x ** b})
    )
]

@pytest.mark.parametrize(
    'minimizer, constraints',
    # Added None to check the default objective
    # Removed the tuple with TrustConstr, because it cannot handle CallableNumericalModel
    [(minimizer, constr)
     for (minimizer, constr) in itertools.product(list(subclasses(ConstrainedMinimizer) | {None}),
                                                  [small_constraint, big_constraint])
     if (minimizer, constr) != (TrustConstr, big_constraint)]
)
...

Another idea is a separate test for TrustConstr with constraints :D

Concerning LL: ge_constraints equals the function in the old unittests. 🙆‍♂️

pckroon commented 3 years ago

Heyhey!

Another idea is a separate test for TrustConstr with constraints :D

I don't really like the code snippet you suggest, since it puts a lot of logic in the "wrong" place. Separating TC + constraints is almost the correct idea I think: it's not TC+constraints, or even TC+CNM that's the exception, but CallableNumericalModel constraints are the exception. In summary, test 1: all constrained minimizers with symbolic constraints; test 2: all constrained minimizers except TC with CNM constraints.

You may want to create a separate helper function that performs the actual test.
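
For example, roughly this structure (a sketch; run_constrained_fit, SYMBOLIC_CONSTRAINTS and CNM_CONSTRAINTS are placeholder names for the shared helper and the two constraint lists from the snippet above):

import pytest
from symfit.core.minimizers import ConstrainedMinimizer, TrustConstr

def _subclasses(cls):
    # local stand-in for the subclasses() helper used in the snippet above
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= _subclasses(sub)
    return found

ALL_CONSTRAINED = sorted(_subclasses(ConstrainedMinimizer), key=lambda m: m.__name__)

def run_constrained_fit(minimizer, constraints):
    # shared helper: build the Fit with this minimizer and these constraints,
    # execute it, and assert on the fit_result (elided here)
    ...

@pytest.mark.parametrize('minimizer', ALL_CONSTRAINED)
def test_symbolic_constraints(minimizer):
    run_constrained_fit(minimizer, SYMBOLIC_CONSTRAINTS)

@pytest.mark.parametrize('minimizer',
                         [m for m in ALL_CONSTRAINED if m is not TrustConstr])
def test_cnm_constraints(minimizer):
    run_constrained_fit(minimizer, CNM_CONSTRAINTS)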

antonykamp commented 3 years ago

I would prefer fixing it in FitResult itself (https://github.com/DuckNrOne/symfit/blob/master/symfit/core/fit_results.py#L42)

Should I open another issue for LBFGSB (next to the issue with TrustConstr and COBYLA)?

pckroon commented 3 years ago

I would prefer fixing it in FitResult itself (https://github.com/DuckNrOne/symfit/blob/master/symfit/core/fit_results.py#L42)

Should I open another issue for LBFGSB (next to the issue with TrustConstr and COBYLA)?

Sounds good to me.

antonykamp commented 3 years ago

For all tests you have expected_got, did you mean expected_gof (goodness of fit)?

It's probably a typo :) sorry for that! :D

Do we also want to test with objective=None? Or is that more for testing the Fit object, i.e. another test file/PR?

I think I've added None, because I wanted to select the "default" objective.

antonykamp commented 3 years ago

Experiment and pick your favourite I'd say.

I decided to add an if clause to the test methods because we did the same thing with TrustConstr. Now the only things left to fix are the LogLikelihood test cases and the status_message of the LBFGSB test cases (#325).

antonykamp commented 3 years ago

Thank you for the review (again) 🚀 Regarding the constraints, I would say that we edit CNM_CONSTRAINT only in such a way that the outcome isn't the same as without it. But edit it in which way? 🤔

pckroon commented 3 years ago

Regarding the constraints, I would say that we edit CNM_CONSTRAINT only in such a way that the outcome isn't the same as without it. But edit it in which way? 🤔

By using a different Model. For example, if you're not fitting but just minimizing a function, that function can have multiple (local) minima. The combination of initial guesses and constraints will decide which one you'll find. When fitting functions it's a bit harder to see, but I think we can achieve the desired effect by taking a model with degenerate/redundant parameters (but I'm not completely sure here). For example: {z: (a+b+c)*x}. That would make all combinations where a+b+c equals the same value have the same outcome. I think we can use this example to have constraints that cause, e.g., a < b < c. We may need to tweak the test a bit in case we need to change the initial guesses or bounds. If it gets too complicated, leave it out of this PR.
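
A rough, self-contained illustration of that idea (the values here are made up for the example, not taken from the test suite):

import numpy as np
from sympy import Le
from symfit import parameters, variables, Model, Fit

a, b, c = parameters('a, b, c')
x, z = variables('x, z')
model = Model({z: (a + b + c) * x})

xdata = np.linspace(0, 10, 50)
zdata = 6 * xdata   # any split with a + b + c == 6 fits the data equally well

# the constraints (rather than the data) decide which (a, b, c) the fit settles on
fit = Fit(model, x=xdata, z=zdata, constraints=[Le(a, b), Le(b, c)])
result = fit.execute()
print(result.value(a), result.value(b), result.value(c))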

antonykamp commented 3 years ago

Thank you for the explanation! I'll look what I can do with this new information :)

antonykamp commented 3 years ago

Alright :D Whatever I try, the results don't change :)) I'm just not deep enough into the topic :) sooorry :)

pckroon commented 3 years ago

No problem. Could you rebase your branch on master? I can then take a look at the LL testcases, and maybe the constrained ones as well.

antonykamp commented 3 years ago

Thank you very much for doing this! :) Just as a note: I've excluded TrustConstr because the LL test case didn't stop even after 1e8 iterations. :)

antonykamp commented 1 year ago

Closed because it's old :)