Parameterized CVXPY problems

phschiele commented 3 years ago

Hi @robertmartin8, this PR intends to kick off the implementation of the enhancements discussed in #302. As you can see, it is still a work in progress, but I wanted to get your feedback early on. I try to tackle both issues that were discussed, namely discouraging users to add further constraints after the problem has already been solved, and use parametrized CVXPY problems for better performance when solving a problem repeatedly.

The changes are ~intended to be backward compatible, i.e. all I do is raise a warning if one still adds constraints later on.~ not backward compatible. Adding this warning already pointed me to three tests that could be improved.

ToDos:

[x] Implement for EfficientFrontier
[x] Implement for EfficientSemivariance
[x] Implement for EfficientCVaR
[x] Write tests
[x] Make sure all library code uses public functions where possible (i.e. use add_constraint() over self._constraints.append())
[x] Consider reducing code duplication

robertmartin8 commented 3 years ago

This is great! I really like that it uses CVXPY's built-in name functionality so we don't have to maintain a mapping.

One possible concern is code duplication? Though this is a minor issue compared to the performance advantages of parameterised problems.

Speaking of performance, does this make a noticeable difference for the plots?

Looking forward to collaborating on this further once my exams are over 😭

phschiele commented 3 years ago

Code duplication for checking if the parameter is already defined? Yes, perhaps we can improve on that a bit, but I also don't think it's too bad. I was thinking about writing a custom decorator, but we would still want to run the checks defined at the beginning of many functions in the parameter update case.

As for the speed improvements, even with the default test setup, we get a speed improvement of almost 75%. I assume the time improvement would be even greater for more complex optimizations (in terms of the number of assets/constraints/optimizations). One thing that I need to do to get those numbers is adding the following if condition to _solve_cvxpy_opt_problem(), which would break backward compatibility.

if self._opt is None:
    self._opt = cp.Problem(cp.Minimize(self._objective), self._constraints)

Anyways, here are the timings:

%%timeit
ef = setup_efficient_frontier()
plot_efficient_frontier(ef)

parametrized:

>>> 455 ms ± 18.4 ms per loop

non-parametrized:

>>> 1.74 s ± 96.9 ms per loop

Running with points=1000, the improvement is even greater at 85% (3.2 vs 22s). (Note: I turned off the _max_return() check for all runs, as it is not compliant with the new "immutable" setup itself and needs to be adapted first).

robertmartin8 commented 3 years ago

This block has to be reproduced in all problems where there's a parameter right?

 update_existing_parameter = self.is_parameter_defined('target_variance')
  if update_existing_parameter:
      self.update_parameter_value('target_variance', target_volatility ** 2)
  else:
      self._objective = objective_functions.portfolio_return(
          self._w, self.expected_returns
      )
      variance = objective_functions.portfolio_variance(self._w, self.cov_matrix)

      for obj in self._additional_objectives:
          self._objective += obj

      target_variance = cvxpy.Parameter(name="target_variance", value=target_volatility ** 2)
      self.add_constraint(lambda w: variance <= target_variance)
      self._make_weight_sum_constraint(market_neutral)

Thanks for the performance stats. It's a great speedup!

phschiele commented 3 years ago

 update_existing_parameter = self.is_parameter_defined('target_variance')
  if update_existing_parameter:
      self.update_parameter_value('target_variance', target_volatility ** 2)
  else:

Right, these four lines would basically be repeated multiple times.

phschiele commented 3 years ago

I think the important decision in this PR is the following: If we want to take advantage of the speed and robustness improvements suggested in this PR, we need to prevent users from (the unsafe behavior of) modifying _constraints and _objective after the first solve. Otherwise, it would be even more confusing if they update those attributes but it is just not taken into account, as the problem is not redefined again. We could hack around this e.g. by keeping a copy of constraint_ids at every solve and check at subsequent solves if the constraints have changed and if so, do setup a new problem.

The best way would probably be to implement such a hack and issue a DeprecationWarning whenever the hack is triggered. Subsequent versions could then stop supporting this behavior.

robertmartin8 commented 3 years ago

we need to prevent users from (the unsafe behavior of) modifying _constraints and _objective after the first solve.

i.e prevent something like this?

ef = EfficientFrontier(mu, S)
ef.efficient_risk(0.15)

ef.add_constraint(lambda x: x<= 0.5)
ef.efficient_risk(0.15)

and if so, do setup a new problem.

Alternatively, we could just raise an error and ask the user to set up a new object manually?

My philosophy is that each instance of EfficientFrontier should be associated with one optimisation problem. By optimisation problem I mean input data, objectives, set of constraints (but not the parameters in these constraints). So if a user wants to change any of those things, they must define a new instance.

phschiele commented 3 years ago

i.e prevent something like this?

Exactly, but also from modifying _constraints directly, like ef._constraints.append(...), which is obviously harder to detect.

The philosophy you describe is exactly what this PR aims to achieve. So in your view, it would be fine to throw a (hopefully informative) error for such unintended behavior?

phschiele commented 3 years ago

@robertmartin8 I think this PR is getting close to being done. I would appreciate it if you could take another look.

Regarding the code duplication, I thought we could perhaps use a decorator instead (see below on how I imagined it to work). While I think this would be rather elegant, it is blocked by the different checks that are sometimes performed before recalling the solver with updated parameter values (e.g., that the new value is still less than the max return for efficient_return()).

As this PR is anyway getting quite large already, I would suggest keeping the duplications for now, and perhaps tackle them in a follow-up PR.

def parametrize_cvxpy_problem(func):
    def wrapper(*args, **kwargs):
        if isdefined(bound):
            update_parameter()
            self.problem.solve()
        else:
            func(*args, **kwargs)
    return wrapper

class CVXPYObj:
    def __init__(self):
        # set up class

    @parametrize_cvxpy_problem
    def function(self, bound):
        # set up problem
        # ....
        self.problem.solve()

robertmartin8 commented 3 years ago

Hey @phschiele, cheers for all the hard work on this. I'll have a look on Sunday!

robertmartin8 commented 3 years ago

Some misc points I'm considering changing (but please let me know if you disagree):

Typing: I'm not sure whether to prefer "best practice" vs consistency. The little angel on my shoulder is saying that I should leave the typing in and try to gradually start adding type annotations to the rest of the library; the little devil is telling me to take it out and keep this library completely un-typed – as is the python way :)
Rather than raising general Exceptions (e.g here), I'm thinking it might be better to define a new custom exception class. Haven't decided on an informative name, maybe something like InstantiationError idk.
Adding leading underscores to some of the new methods, assuming that we don't want the user to be making calls to e.g get_all_args.

Having looked over the changes, I really couldn't care less about the duplication anymore – it's a very small price to pay for the robustness.

It's not super clear to me why backwards compatibility is violated? Seems like it'd be an edge case right

phschiele commented 3 years ago

Hi @robertmartin8, thanks for the review!

Some misc points I'm considering changing (but please let me know if you disagree):

Typing: I'm not sure whether to prefer "best practice" vs consistency. The little angel on my shoulder is saying that I should leave the typing in and try to gradually start adding type annotations to the rest of the library; the little devil is telling me to take it out and keep this library completely un-typed – as is the python way :)

After some getting used to it, typing has saved me a lot of headaches on larger projects. I'd be happy to contribute to the gradual addition of typing, but feel free to remove it for consistency reasons.

Rather than raising general Exceptions (e.g here), I'm thinking it might be better to define a new custom exception class. Haven't decided on an informative name, maybe something like InstantiationError idk.

Good point, I usually hope that the message is clear enough, but custom exceptions are even better, of course.

Adding leading underscores to some of the new methods, assuming that we don't want the user to be making calls to e.g get_all_args.

Yes, I think that makes total sense.

Having looked over the changes, I really couldn't care less about the duplication anymore – it's a very small price to pay for the robustness.

It's not super clear to me why backwards compatibility is violated? Seems like it'd be an edge case right

You're right, it basically only breaks the invalid use cases.

robertmartin8 commented 3 years ago

After some getting used to it, typing has saved me a lot of headaches on larger projects.

Any aspects in particular? It seems like it might make the library more "self-documenting" (since we could remove type annotations from the docstrings presumably) but I'm worried about the following:

This library uses a lot of ambiguous types, e.g you can provide either a dataframe or a numpy array etc. I know you can define custom types (unions etc) but idk if it starts getting a bit unmanageable
It does make the code itself a bit more verbose (though fewer docs will be required)

Good point, I usually hope that the message is clear enough, but custom exceptions are even better, of course.

Yeah I just think it can be helpful for devs if they want to catch particular PyPortfolioOpt issues - though to be fair I can't think of a usecase where catching these InstantiationErrors would be valid (unlike OptimizationError).

In any case, I'm hoping to push v1.5.0 this weekend with all these changes (along with the other open PRs).

phschiele commented 3 years ago

After some getting used to it, typing has saved me a lot of headaches on larger projects.

Any aspects in particular? It seems like it might make the library more "self-documenting" (since we could remove type annotations from the docstrings presumably) but I'm worried about the following:

This library uses a lot of ambiguous types, e.g you can provide either a dataframe or a numpy array etc. I know you can define custom types (unions etc) but idk if it starts getting a bit unmanageable

It does make the code itself a bit more verbose (though fewer docs will be required)

Mostly small things, e.g. if the output of a function is changed, I immediately get feedback from the IDE if that breaks something. But there are other benefits as well, like when setting up a new project, one can create a skeleton and by adding type hints define "interfaces", which helps a lot when parallelizing the tasks in a team. I see your points, perhaps only a small comment:

In some cases, it makes even clearer what can be passed to a function, e.g. instead of adding types like Union[List,Set,...] adding Iterable makes clear which property of an argument is required.
Future Python versions (starting with 3.10) will support nicer Syntax for ambiguous types (e.g. np.array | pd.DataFrame), but that's of course a long way to go.

Good point, I usually hope that the message is clear enough, but custom exceptions are even better, of course.

Yeah I just think it can be helpful for devs if they want to catch particular PyPortfolioOpt issues - though to be fair I can't think of a usecase where catching these InstantiationErrors would be valid (unlike OptimizationError).

:+1:

In any case, I'm hoping to push v1.5.0 this weekend with all these changes (along with the other open PRs).

:tada:

robertmartin8 commented 3 years ago

Will leave the typing in for now!

Ugh really wanted to get this PR merged this weekend, but I made the mistake of upgrading to macOS Big Sur and am now having install issues with cvxpy lmao. Will hopefully get that fixed and have this pushed by midweek.

robertmartin8 / PyPortfolioOpt

Parameterized CVXPY problems #335